Resilient Distributed Dataset (RDD)

GPTKB entity

Statements (50)
Predicate Object
gptkbp:instanceOf gptkb:architecture
Apache Spark component
gptkbp:canBeActionedBy gptkb:Count
first
reduce
take
collect
foreach
saveAsTextFile
gptkbp:canCreate gptkb:Hadoop_Distributed_File_System_(HDFS)
local file system
existing RDD
gptkbp:feature lazy evaluation
resilient
distributed
partitioned
fault-tolerant
immutable
lineage graph
typed
gptkbp:firstReleased 2012
gptkbp:hasConcept gptkb:Apache_Spark
https://www.w3.org/2000/01/rdf-schema#label Resilient Distributed Dataset (RDD)
gptkbp:introduced gptkb:Apache_Spark
gptkbp:language gptkb:Java
gptkb:Python
gptkb:Scala
R
gptkbp:replacedBy gptkb:Dataset
DataFrame
gptkbp:supports fault tolerance
lazy evaluation
in-memory computation
parallel computation
lineage tracking
gptkbp:transformsInto map
union
distinct
sample
filter
join
flatMap
cartesian
groupByKey
reduceByKey
sortBy
gptkbp:usedFor distributed data processing
gptkbp:website https://spark.apache.org/docs/latest/rdd-programming-guide.html
gptkbp:bfsParent gptkb:Spark_Core
gptkbp:bfsLayer 7
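
Illustrative sketch
The statements above list lazy evaluation, immutability, partitioning, and lineage tracking as RDD features, and they separate transformations (map, filter, flatMap) from actions (collect, count, take, reduce). The Python sketch below is a toy, single-process model of how those ideas fit together. It is not Spark's implementation or API: the class name ToyRDD, its lineage attribute, and the num_partitions parameter are all hypothetical and exist only for illustration.

```python
from functools import reduce as fold
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT Spark's API or implementation."""

    def __init__(self, compute: Callable[[], List[list]], lineage: List[str]):
        self._compute = compute  # deferred work: returns a list of partitions
        self.lineage = lineage   # names of the steps that produce this dataset

    @staticmethod
    def parallelize(data: Iterable[Any], num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        # Split into contiguous chunks so element order is preserved.
        size = max(1, -(-len(items) // num_partitions))
        parts = [items[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(lambda: [list(p) for p in parts], ["parallelize"])

    def _derive(self, name: str, per_partition: Callable) -> "ToyRDD":
        # Transformations never modify self; they return a new ToyRDD whose
        # lineage is the parent's lineage plus one more step.
        parent = self._compute
        return ToyRDD(
            lambda: [list(per_partition(p)) for p in parent()],
            self.lineage + [name],
        )

    # Transformations: recorded, not executed.
    def map(self, f):
        return self._derive("map", lambda p: (f(x) for x in p))

    def filter(self, pred):
        return self._derive("filter", lambda p: (x for x in p if pred(x)))

    def flatMap(self, f):
        return self._derive("flatMap", lambda p: (y for x in p for y in f(x)))

    # Actions: walk the lineage and actually compute a result.
    def collect(self) -> list:
        return [x for part in self._compute() for x in part]

    def count(self) -> int:
        return len(self.collect())

    def take(self, n: int) -> list:
        return self.collect()[:n]

    def reduce(self, f):
        return fold(f, self.collect())


trace = []  # records every call to the mapped function


def traced_square(x):
    trace.append(x)
    return x * x


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
evens = numbers.map(traced_square).filter(lambda v: v % 2 == 0)

calls_before_action = len(trace)              # transformations ran nothing yet
result = evens.collect()                      # first action triggers the work
total = evens.reduce(lambda a, b: a + b)      # second action recomputes it
```

In this toy model, nothing runs until an action is called, and each action recomputes the dataset from its lineage because no caching is modeled. The real RDD API offers persistence and distributes partitions across a cluster, neither of which this sketch attempts.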