Resilient Distributed Dataset

GPTKB entity

Statements (51)
Predicate Object
gptkbp:instanceOf distributed data abstraction
gptkbp:abbreviation gptkb:RDD
gptkbp:canBeActionedBy count
reduce
take
collect
foreach
saveAsTextFile
gptkbp:canCreate parallelizing existing collections
transforming data from external storage
gptkbp:category gptkb:Distributed_Computing
Big Data
Data Processing
gptkbp:enables lazy evaluation
in-memory computation
lineage tracking
distributed fault recovery
https://www.w3.org/2000/01/rdf-schema#label Resilient Distributed Dataset
gptkbp:introduced gptkb:Matei_Zaharia
gptkbp:introducedIn 2012
gptkbp:isImmutable true
gptkbp:isPartitioned true
gptkbp:language gptkb:Java
gptkb:Python
gptkb:Scala
R
gptkbp:openSource true
gptkbp:relatedTo gptkb:Hadoop
gptkb:Spark_SQL
gptkb:MapReduce
gptkb:Spark_Streaming
gptkbp:replacedBy gptkb:Dataset
DataFrame
gptkbp:supports fault tolerance
immutable data
parallel computation
distributed processing
gptkbp:transformsInto map
union
distinct
sample
filter
join
flatMap
groupByKey
reduceByKey
sortBy
gptkbp:usedIn gptkb:Apache_Spark
gptkbp:bfsParent gptkb:RDD
gptkb:PySpark_RDD
gptkbp:bfsLayer 8
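
Example (illustrative sketch)

The statements above say that an RDD is immutable and partitioned, that it is built by parallelizing a collection, that transformations such as map and filter are evaluated lazily, and that lineage tracking enables fault recovery. The sketch below is a toy, single-process model of those ideas in plain Python. It is not Spark's API: the class name MiniRDD and its compute and lineage helpers are hypothetical names chosen for illustration, and only the method names parallelize, map, filter, collect, count, and reduce mirror operations listed in the statements.

```python
import functools


class MiniRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's implementation)."""

    def __init__(self, partitions=None, parent=None, fn=None, op="parallelize"):
        self._partitions = partitions  # set only on source datasets
        self._parent = parent          # lineage: the dataset this one derives from
        self._fn = fn                  # per-partition function, applied on demand
        self.op = op                   # name of the operation that produced this node

    @staticmethod
    def parallelize(data, num_partitions=2):
        """Create a source dataset by splitting a local collection round-robin."""
        data = list(data)
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return MiniRDD(partitions=parts)

    # Transformations: return a new dataset and run nothing yet.
    def map(self, f):
        return MiniRDD(parent=self, fn=lambda part: [f(x) for x in part], op="map")

    def filter(self, pred):
        return MiniRDD(parent=self, fn=lambda part: [x for x in part if pred(x)],
                       op="filter")

    def num_partitions(self):
        node = self
        while node._parent is not None:
            node = node._parent
        return len(node._partitions)

    def compute(self, index):
        """Rebuild a single partition by replaying its lineage from the source."""
        if self._parent is None:
            return list(self._partitions[index])
        return self._fn(self._parent.compute(index))

    def lineage(self):
        """Operation names from the source dataset down to this one."""
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node._parent
        return list(reversed(ops))

    # Actions: trigger computation and return a plain Python value.
    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def count(self):
        return len(self.collect())

    def reduce(self, f):
        return functools.reduce(f, self.collect())


calls = []  # records every element the mapped function actually touches


def square(x):
    calls.append(x)
    return x * x


rdd = MiniRDD.parallelize(range(1, 7), num_partitions=2)
evens = rdd.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = list(calls)  # still empty: transformations are lazy
result = evens.collect()           # the action runs the whole pipeline
print(result)                      # [4, 16, 36]
print(evens.lineage())             # ['parallelize', 'map', 'filter']
```

Because each derived dataset keeps only a reference to its parent and a function, a lost partition can be recovered by calling compute on its index alone, which is the idea behind lineage-based fault recovery. In this toy model nothing is cached, so every action recomputes from the source.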