PySpark RDD

GPTKB entity

Statements (51)
Predicate Object
gptkbp:instanceOf PySpark API component
distributed data structure
gptkbp:action count
take
collect
foreach
saveAsTextFile
gptkbp:API distributed computing
big data processing
parallel computation
gptkbp:canBeCached true
gptkbp:canBePersisted true
gptkbp:documentation https://spark.apache.org/docs/latest/rdd-programming-guide.html
https://www.w3.org/2000/01/rdf-schema#label PySpark RDD
gptkbp:introducedIn Apache Spark 0.7
gptkbp:isFaultTolerant true
gptkbp:isImmutable true
gptkbp:isLazilyEvaluated true
gptkbp:isPartitioned true
gptkbp:isTyped false
gptkbp:language gptkb:Python
gptkbp:operator map
count
union
distinct
sample
checkpoint
filter
join
reduce
take
flatMap
cache
collect
groupByKey
reduceByKey
saveAsTextFile
persist
gptkbp:partOf gptkb:Apache_Spark
gptkb:PySpark
gptkbp:replacedBy gptkb:Dataset
DataFrame
gptkbp:standsFor gptkb:Resilient_Distributed_Dataset
gptkbp:transformsInto map
filter
flatMap
groupByKey
reduceByKey
gptkbp:usedIn gptkb:Apache_Spark
gptkbp:bfsParent gptkb:Dask_Bag
gptkbp:bfsLayer 7