gptkbp:instanceOf
|
API
|
gptkbp:canBeCached
|
true
|
gptkbp:canBeConvertedFrom
|
DataFrame
|
gptkbp:canBePersisted
|
memory
disk
|
gptkbp:class
|
org.apache.spark.sql.Dataset
|
gptkbp:combines
|
gptkb:RDD_API
gptkb:DataFrame_API
|
gptkbp:convertedTo
|
DataFrame
|
gptkbp:documentation
|
https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes
|
gptkbp:enables
|
object-oriented programming
functional programming
type-safe operations
|
rdfs:label
|
Spark Dataset API
|
gptkbp:introducedIn
|
gptkb:Apache_Spark_1.6
|
gptkbp:license
|
gptkb:Apache_License_2.0
|
gptkbp:openSource
|
true
|
gptkbp:partOf
|
gptkb:Apache_Spark
|
gptkbp:provides
|
compile-time type safety
optimizations via Catalyst engine
|
gptkbp:relatedTo
|
gptkb:Spark_SQL
gptkb:Spark_DataFrame_API
Spark RDD API
|
gptkbp:serialization
|
gptkb:Kryo
gptkb:Java_serialization
Encoders
|
gptkbp:supports
|
sorting
actions
filtering
grouping
lazy evaluation
custom data types
aggregation
encoders
transformations
joins
flatMap operations
map operations
typed transformations
untyped transformations
|
gptkbp:supportsLanguage
|
gptkb:Java
gptkb:Scala
|
gptkbp:usedFor
|
gptkb:machine_learning
data analysis
batch processing
ETL
stream processing
structured data processing
|