Spark Dataset API

GPTKB entity

Statements (50)
Predicate Object
gptkbp:instanceOf programming API
gptkbp:canBeCached true
gptkbp:canBeConvertedFrom DataFrame
gptkbp:canBePersisted memory
disk
gptkbp:class org.apache.spark.sql.Dataset
gptkbp:combines gptkb:RDD_API
gptkb:DataFrame_API
gptkbp:convertedTo DataFrame
gptkbp:documentation https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes
gptkbp:enables object-oriented programming
functional programming
type-safe operations
https://www.w3.org/2000/01/rdf-schema#label Spark Dataset API
gptkbp:introducedIn gptkb:Apache_Spark_1.6
gptkbp:license gptkb:Apache_License_2.0
gptkbp:openSource true
gptkbp:partOf gptkb:Apache_Spark
gptkbp:provides compile-time type safety
query optimizations via the Catalyst optimizer
gptkbp:relatedTo gptkb:Spark_SQL
gptkb:Spark_DataFrame_API
Spark RDD API
gptkbp:serialization Encoders
gptkb:Kryo (via Encoders.kryo)
gptkb:Java_serialization (via Encoders.javaSerialization)
gptkbp:supports sorting
actions
filtering
grouping
lazy evaluation
custom data types
aggregation
encoders
transformations
joins
flatMap operations
map operations
typed transformations
untyped transformations
gptkbp:supportsLanguage gptkb:Java
gptkb:Scala
gptkbp:usedFor gptkb:machine_learning
data analysis
batch processing
ETL
stream processing
structured data processing
gptkbp:bfsParent gptkb:Tungsten_execution_engine
gptkbp:bfsLayer 7
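The canBeConvertedFrom, convertedTo, canBeCached and canBePersisted statements can be illustrated with a short Scala sketch. It assumes Spark 3.x on Scala 2.12 or 2.13 and a local master; the Person case class and the ConversionSketch object are hypothetical names used only for illustration. It converts a DataFrame to a typed Dataset with as[T], converts back with toDF(), and persists one result in memory (cache) and one on disk.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object ConversionSketch {
  // Converts a DataFrame to a typed Dataset and back, persisting along the way.
  def demo(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Untyped DataFrame (a type alias for Dataset[Row]).
    val df: DataFrame = Seq(("Ada", 36L), ("Linus", 28L)).toDF("name", "age")

    // DataFrame -> Dataset[Person]: columns are bound to case-class fields by name.
    val people: Dataset[Person] = df.as[Person]

    // Typed filter, then Dataset -> DataFrame: the static element type is dropped.
    val rows: DataFrame = people.filter(p => p.age > 30).toDF()

    // cache() uses the default storage level; persist() takes an explicit one.
    people.cache()
    rows.persist(StorageLevel.DISK_ONLY)

    // Caching is lazy as well: the first action materializes the cached data.
    val counts = (people.count(), rows.count())

    people.unpersist()
    rows.unpersist()
    counts
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine.
    val spark = SparkSession.builder()
      .appName("dataset-conversion-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(demo(spark))
    finally spark.stop()
  }
}
```

Because as[T] only attaches an encoder to the existing plan, the conversion itself is cheap; a mismatch between column names and case-class fields surfaces as an analysis error when as[T] is called, not at compile time.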
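The supports list (typed and untyped transformations, map, flatMap, filtering, grouping, aggregation, joins, sorting, actions and lazy evaluation) can be seen together in one Scala sketch. It assumes Spark 3.x and a local master; the Sale and Region case classes, the TypedOpsSketch object and its helper methods are hypothetical. Typed operations take Scala lambdas over domain objects, so field access is checked by the compiler; untyped operations use Column expressions that are resolved only when Spark analyzes the plan.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical record types, used only for this illustration.
case class Sale(region: String, item: String, amount: Double)
case class Region(region: String, manager: String)

object TypedOpsSketch {
  def sampleSales(spark: SparkSession): Dataset[Sale] = {
    import spark.implicits._
    Seq(Sale("north", "tea", 4.0), Sale("north", "coffee", 6.0), Sale("south", "tea", 3.0)).toDS()
  }

  // Typed grouping and aggregation: key extraction and summing are plain Scala code.
  def totalsByRegion(spark: SparkSession, sales: Dataset[Sale]): Dataset[(String, Double)] = {
    import spark.implicits._
    sales.groupByKey(_.region).mapGroups((region, rows) => (region, rows.map(_.amount).sum))
  }

  // Untyped aggregation: "region" and "amount" are checked at analysis time, not compile time.
  def totalsUntyped(sales: Dataset[Sale]): DataFrame =
    sales.groupBy(col("region")).agg(sum(col("amount")).as("total"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("typed-ops-sketch").master("local[*]").getOrCreate()
    import spark.implicits._
    val sales = sampleSales(spark)
    val regions = Seq(Region("north", "Kim"), Region("south", "Ola")).toDS()

    // Transformations only build a logical plan; nothing runs yet (lazy evaluation).
    val big: Dataset[Sale] = sales.filter(s => s.amount > 3.5)                 // typed filter
    val amounts: Dataset[Double] = big.map(_.amount)                            // typed map
    val words: Dataset[String] = sales.flatMap(s => Seq(s.region, s.item))      // typed flatMap
    val joined: Dataset[(Sale, Region)] =
      sales.joinWith(regions, sales("region") === regions("region"))           // typed join
    val sorted: Dataset[Sale] = sales.orderBy(col("amount").desc)               // sorting

    // Actions trigger execution of the plans built above.
    totalsByRegion(spark, sales).show()
    totalsUntyped(sales).show()
    println(amounts.collect().sum)
    println(words.distinct().count())
    joined.show(false)
    sorted.show()
    spark.stop()
  }
}
```

One design trade-off worth noting: lambdas passed to typed operations are opaque to Catalyst, so optimizations such as filter pushdown apply to Column expressions but generally not to code inside a lambda.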
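The serialization statements name Encoders alongside Kryo and Java serialization. A minimal Scala sketch of the difference, assuming Spark 3.x and a local master, with hypothetical types Point and LegacyRecord and a hypothetical EncoderSketch object: a product encoder maps each case-class field to its own column, while Encoders.kryo and Encoders.javaSerialization store each whole object in a single binary column.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical types, used only for this illustration.
case class Point(x: Int, y: Int)
class LegacyRecord(val id: Int) extends Serializable // a plain class, not a case class

object EncoderSketch {
  // Returns the column names Spark derives for each encoding strategy.
  def schemas(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    // Product encoder: each field becomes its own typed column.
    val pointEnc: Encoder[Point] = Encoders.product[Point]
    val points: Dataset[Point] = spark.createDataset(Seq(Point(1, 2), Point(3, 4)))(pointEnc)

    // Kryo encoder: each object is serialized with Kryo into one opaque binary column.
    val kryoEnc: Encoder[LegacyRecord] = Encoders.kryo[LegacyRecord]
    val viaKryo = spark.createDataset(Seq(new LegacyRecord(1), new LegacyRecord(2)))(kryoEnc)

    // Java-serialization encoder: same single-column layout, using java.io serialization.
    val javaEnc: Encoder[LegacyRecord] = Encoders.javaSerialization[LegacyRecord]
    val viaJava = spark.createDataset(Seq(new LegacyRecord(3)))(javaEnc)

    (points.schema.fieldNames.toSeq,
      viaKryo.schema.fieldNames.toSeq,
      viaJava.schema.fieldNames.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("encoder-sketch").master("local[*]").getOrCreate()
    try println(schemas(spark))
    finally spark.stop()
  }
}
```

The Kryo and Java-serialization encoders are a fallback for types that the built-in encoders cannot handle; because the object is an opaque blob, Spark cannot prune or filter on its individual fields the way it can with a product encoder.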