Resilient Distributed Dataset

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:architecture
gptkbp:abbreviation	gptkb:RDD
gptkbp:canBeActionedBy	gptkb:Count, reduce, take, collect, foreach, saveAsTextFile
gptkbp:canCreate	parallelizing existing collections, transforming data from external storage
gptkbp:category	gptkb:Distributed_Computing, Big Data, Data Processing
gptkbp:enables	lazy evaluation, in-memory computation, lineage tracking, distributed fault recovery
gptkbp:introduced	gptkb:Matei_Zaharia
gptkbp:introducedIn	2012
gptkbp:isImmutable	true
gptkbp:isPartitioned	true
gptkbp:language	gptkb:Java, gptkb:Python, gptkb:Scala, R
gptkbp:openSource	true
gptkbp:relatedTo	gptkb:Hadoop, gptkb:Spark_SQL, gptkb:MapReduce, gptkb:Spark_Streaming
gptkbp:replacedBy	gptkb:Dataset, DataFrame
gptkbp:supports	fault tolerance, immutable data, parallel computation, distributed processing
gptkbp:transformsInto	map, gptkb:Union, gptkb:filter, distinct, sample, join, flatMap, groupByKey, reduceByKey, sortBy
gptkbp:usedIn	gptkb:Apache_Spark
gptkbp:bfsParent	gptkb:RDD gptkb:PySpark_RDD
gptkbp:bfsLayer	8
http://www.w3.org/2000/01/rdf-schema#label	Resilient Distributed Dataset
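
The properties listed above (immutable, partitioned, lazily evaluated, recoverable through lineage) can be illustrated with a small single-process sketch. This is not Spark's implementation or API: `ToyRDD`, `compute_partition`, and the `lineage` tuple are hypothetical names chosen for illustration, and only the method names `map`, `filter`, `flatMap`, `collect`, `count`, `reduce`, and `take` mirror the operations named in the table.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, compute, num_partitions, lineage):
        self._compute = compute              # partition index -> iterator of records
        self.num_partitions = num_partitions
        self.lineage = lineage               # step names, oldest first

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = tuple(data)                   # snapshot: the source can no longer change
        size = -(-len(data) // num_partitions)   # ceiling division

        def compute(i):
            return iter(data[i * size:(i + 1) * size])

        return cls(compute, num_partitions, ("parallelize",))

    def _derive(self, name, fn):
        # A transformation never mutates its parent; it returns a new ToyRDD
        # that remembers how to rebuild each partition from the parent.
        parent = self
        return ToyRDD(lambda i: fn(parent._compute(i)),
                      self.num_partitions,
                      self.lineage + (name,))

    # Transformations (lazy: nothing runs until an action is called)
    def map(self, f):
        return self._derive("map", lambda it: (f(x) for x in it))

    def filter(self, pred):
        return self._derive("filter", lambda it: (x for x in it if pred(x)))

    def flatMap(self, f):
        return self._derive("flatMap", lambda it: (y for x in it for y in f(x)))

    # Actions (force evaluation)
    def compute_partition(self, i):
        """Rebuild one partition from lineage, as a recovery step would."""
        return list(self._compute(i))

    def collect(self):
        return [x for i in range(self.num_partitions)
                for x in self.compute_partition(i)]

    def count(self):
        return len(self.collect())

    def reduce(self, f):
        return _reduce(f, self.collect())

    def take(self, n):
        return self.collect()[:n]


if __name__ == "__main__":
    seen = []

    def square(x):
        seen.append(x)                       # records when the function really runs
        return x * x

    evens = ToyRDD.parallelize(range(1, 7), 2).map(square).filter(lambda x: x % 2 == 0)
    print(seen)                              # [] because no action has run yet
    print(evens.collect())                   # [4, 16, 36]
    print(evens.lineage)                     # ('parallelize', 'map', 'filter')
    print(evens.compute_partition(1))        # [16, 36], rebuilt without partition 0
```

In this sketch a lost partition is recovered by replaying the recorded steps for that partition only, which is the idea behind lineage-based fault recovery. A real cluster engine additionally schedules partitions across machines and handles shuffles for operations such as `join` or `reduceByKey`, which this toy omits.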