Resilient Distributed Dataset (RDD)
GPTKB entity
Statements (50)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:architecture, Apache Spark component |
gptkbp:canBeActionedBy | count, first, reduce, take, collect, foreach, saveAsTextFile |
gptkbp:canCreate | gptkb:Hadoop_Distributed_File_System_(HDFS), local file system, existing RDD |
gptkbp:feature | lazy evaluation, resilient, distributed, partitioned, fault-tolerant, immutable, lineage graph, typed |
gptkbp:firstReleased | 2012 |
gptkbp:hasConcept | gptkb:Apache_Spark |
https://www.w3.org/2000/01/rdf-schema#label | Resilient Distributed Dataset (RDD) |
gptkbp:introduced | gptkb:Apache_Spark |
gptkbp:language | gptkb:Java, gptkb:Python, gptkb:Scala, R |
gptkbp:replacedBy | gptkb:Dataset, DataFrame |
gptkbp:supports | fault tolerance, lazy evaluation, in-memory computation, parallel computation, lineage tracking |
gptkbp:transformsInto | map, union, distinct, sample, filter, join, flatMap, cartesian, groupByKey, reduceByKey, sortBy |
gptkbp:usedFor | distributed data processing |
gptkbp:website | https://spark.apache.org/docs/latest/rdd-programming-guide.html |
gptkbp:bfsParent | gptkb:Spark_Core |
gptkbp:bfsLayer | 7 |
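
The `feature` and `supports` rows (lazy evaluation, partitioning, immutability, lineage, fault tolerance) describe how an RDD behaves rather than how it is called. The following is a minimal single-process sketch in plain Python, assuming nothing from Spark itself: `ToyRDD`, `lose_partition` and the `computations` counter are hypothetical names for illustration, not the PySpark API. `map`, `filter`, `persist`, `collect` and `count` borrow Spark's names, but here they are simplified stand-ins. Transformations only record a parent and a per-partition function, actions walk that lineage, and a lost cached partition is rebuilt by replaying it.

```python
import math
from typing import Any, Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD (not the PySpark API).

    Transformations never modify an existing instance; they return a new one
    that stores only its parent (lineage) and a per-partition function.
    Nothing is computed until an action runs.
    """

    def __init__(self, num_partitions: int, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[Any]], List[Any]]] = None,
                 label: str = "", source: Optional[List[List[Any]]] = None):
        self.num_partitions = num_partitions
        self.parent = parent          # lineage edge to the RDD this one derives from
        self.fn = fn                  # turns a parent partition into one of ours
        self.label = label
        self.source = source          # raw partitions, only for a base RDD
        self.persisted = False
        self.cache: Dict[int, List[Any]] = {}
        self.computations = 0         # partition computations performed (for the demo)

    @classmethod
    def parallelize(cls, data: List[Any], num_partitions: int = 2) -> "ToyRDD":
        # Split the input into contiguous chunks, one per partition.
        size = max(1, math.ceil(len(data) / num_partitions))
        parts = [list(data[i * size:(i + 1) * size]) for i in range(num_partitions)]
        return cls(num_partitions, label="parallelize", source=parts)

    # Transformations: return a new ToyRDD that only records how to compute itself.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, self, lambda part: [f(x) for x in part], "map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, self,
                      lambda part: [x for x in part if pred(x)], "filter")

    def persist(self) -> "ToyRDD":
        # Keep computed partitions in memory for reuse by later actions.
        self.persisted = True
        return self

    def lineage(self) -> List[str]:
        chain, node = [], self
        while node is not None:
            chain.append(node.label)
            node = node.parent
        return list(reversed(chain))

    def _compute(self, i: int) -> List[Any]:
        if i in self.cache:
            return self.cache[i]
        self.computations += 1
        if self.parent is None:
            result = list(self.source[i])
        else:
            # Build this partition by replaying the lineage from the parent.
            result = self.fn(self.parent._compute(i))
        if self.persisted:
            self.cache[i] = result
        return result

    # Actions: force evaluation of every partition and return a value.
    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions) for x in self._compute(i)]

    def count(self) -> int:
        return len(self.collect())

    def lose_partition(self, i: int) -> None:
        # Simulate losing a cached partition, e.g. when an executor fails.
        self.cache.pop(i, None)


base = ToyRDD.parallelize([1, 2, 3, 4, 5, 6], num_partitions=2)
squares = base.map(lambda x: x * x).persist()
evens = squares.filter(lambda x: x % 2 == 0)
print(squares.computations)   # 0: building transformations computed nothing
print(evens.collect())        # [4, 16, 36]
print(evens.lineage())        # ['parallelize', 'map', 'filter']
squares.lose_partition(0)
print(evens.collect())        # [4, 16, 36]: partition 0 of squares rebuilt from base
print(squares.computations)   # 3
```

Only the lost partition is recomputed; the surviving cached partition is reused. Real Spark also tracks wide (shuffle) dependencies and schedules work across executors, which this model ignores.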
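
Two of the transformations in the table, `groupByKey` and `reduceByKey`, can produce the same per-key totals, but Spark's `reduceByKey` combines values inside each partition before the shuffle, while `groupByKey` sends every record. The plain-Python model below illustrates that difference under simplifying assumptions: partitions are ordinary lists of `(key, value)` pairs, `group_by_key` and `reduce_by_key` are hypothetical names (not the PySpark methods), and the `shuffled` counter stands in for shuffle traffic. Real Spark also hash-partitions keys across executors and can spill to disk, which is not modeled here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]


def group_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, List[int]], int]:
    """Model of groupByKey: every (key, value) record is sent across the shuffle."""
    groups: Dict[str, List[int]] = defaultdict(list)
    shuffled = 0
    for part in partitions:
        for key, value in part:
            groups[key].append(value)
            shuffled += 1
    return dict(groups), shuffled


def reduce_by_key(partitions: List[List[Pair]],
                  f: Callable[[int, int], int]) -> Tuple[Dict[str, int], int]:
    """Model of reduceByKey: combine values inside each partition first, then merge
    the partial results, so at most one record per key per partition is shuffled.
    f should be associative and commutative, as Spark requires."""
    merged: Dict[str, int] = {}
    shuffled = 0
    for part in partitions:
        local: Dict[str, int] = {}
        for key, value in part:           # map-side combine within the partition
            local[key] = f(local[key], value) if key in local else value
        for key, value in local.items():  # one record per key leaves this partition
            merged[key] = f(merged[key], value) if key in merged else value
            shuffled += 1
    return merged, shuffled


words = [["spark", "rdd", "spark"], ["rdd", "spark", "spark"]]
pairs = [[(w, 1) for w in part] for part in words]
grouped, g_shuffled = group_by_key(pairs)
counts, r_shuffled = reduce_by_key(pairs, lambda a, b: a + b)
print({k: sum(vs) for k, vs in grouped.items()})  # {'spark': 4, 'rdd': 2}
print(counts)                                     # {'spark': 4, 'rdd': 2}
print(g_shuffled, r_shuffled)                     # 6 4
```

Both paths yield the same word counts; the saving from the per-partition combine grows with the number of repeated keys in each partition.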