gptkbp:instanceOf
|
PySpark API component
distributed data structure
|
gptkbp:action
|
count
take
collect
foreach
saveAsTextFile
|
gptkbp:API
|
distributed computing
big data processing
parallel computation
|
gptkbp:canBeCached
|
true
|
gptkbp:canBePersisted
|
true
|
gptkbp:documentation
|
https://spark.apache.org/docs/latest/rdd-programming-guide.html
|
https://www.w3.org/2000/01/rdf-schema#label
|
PySpark RDD
|
gptkbp:introducedIn
|
gptkb:Apache_Spark_1.0
|
gptkbp:isFaultTolerant
|
true
|
gptkbp:isImmutable
|
true
|
gptkbp:isLazilyEvaluated
|
true
|
gptkbp:isPartitioned
|
true
|
gptkbp:isTyped
|
false
|
gptkbp:language
|
gptkb:Python
|
gptkbp:operator
|
map
count
union
distinct
sample
checkpoint
filter
join
reduce
take
flatMap
cache
collect
groupByKey
reduceByKey
saveAsTextFile
persist
|
gptkbp:partOf
|
gptkb:Apache_Spark
gptkb:PySpark
|
gptkbp:replacedBy
|
gptkb:Dataset
DataFrame
|
gptkbp:standsFor
|
gptkb:Resilient_Distributed_Dataset
|
gptkbp:transformsInto
|
map
filter
flatMap
groupByKey
reduceByKey
|
gptkbp:usedIn
|
gptkb:Apache_Spark
|
gptkbp:bfsParent
|
gptkb:Dask_Bag
|
gptkbp:bfsLayer
|
7
|