Resilient Distributed Dataset (RDD)

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:architecture gptkb:Apache_Spark_component
gptkbp:canBeActionedBy	count first reduce take collect foreach saveAsTextFile
gptkbp:canCreate	gptkb:Hadoop_Distributed_File_System_(HDFS) local file system existing RDD
gptkbp:feature	lazy evaluation resilient distributed partitioned fault-tolerant immutable lineage graph typed
gptkbp:firstReleased	2012
gptkbp:hasConcept	gptkb:Apache_Spark
gptkbp:introduced	gptkb:Apache_Spark
gptkbp:language	gptkb:Java gptkb:Python gptkb:Scala R
gptkbp:replacedBy	gptkb:Dataset DataFrame
gptkbp:supports	fault tolerance lazy evaluation in-memory computation parallel computation lineage tracking
gptkbp:transformsInto	map gptkb:Union gptkb:filter distinct sample join flatMap cartesian groupByKey reduceByKey sortBy
gptkbp:usedFor	distributed data processing
gptkbp:website	https://spark.apache.org/docs/latest/rdd-programming-guide.html
gptkbp:bfsParent	gptkb:Spark_Core
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Resilient Distributed Dataset (RDD)