Apache Spark 2.0+

GPTKB entity

Properties (64)
Predicate Object
gptkbp:instanceOf Software Framework
gptkbp:compatibleWith gptkb:Hadoop
gptkbp:developedBy gptkb:Apache_Software_Foundation
gptkbp:hasFeature Improved performance (whole-stage code generation)
Structured Streaming
Support for cloud computing
Support for data visualization
Support for distributed computing
Support for data warehousing
Support for batch processing
Support for edge computing
Support for big data processing
Support for Apache Hudi
Support for JDBC and ODBC
Support for data analytics
Support for data governance
Support for data lakes
Integration with TensorFlow (through third-party libraries)
Support for data science
Support for real-time processing
Support for Python 3
DataFrame API improvements (unified DataFrame and Dataset APIs)
Support for Apache ORC
Support for batch and stream processing
Support for data aggregation
Support for data enrichment
Support for data exploration
Support for data integration
Support for data lineage
Support for data profiling
Support for data quality
Support for data reporting
Support for data security
Support for data summarization
Support for data transformation
Support for graph processing
Support for interactive queries
Support for machine learning pipelines
Support for user-defined functions (UDFs)
Improved Catalyst optimizer
New SQL functions
Support for Apache Avro
Support for Apache Iceberg
Support for Apache Parquet
Support for Delta Lake
Support for SQL on streaming data
https://www.w3.org/2000/01/rdf-schema#label Apache Spark 2.0+
gptkbp:language Scala
gptkbp:provides gptkb:Spark_SQL
gptkb:Machine_Learning_Library_(MLlib)
GraphX
DataFrame API
In-memory computing
Stream processing
gptkbp:releaseDate July 2016
gptkbp:supports gptkb:Java
Python
R
gptkbp:uses gptkb:Amazon_S3
gptkb:Apache_Hive
gptkb:Apache_Cassandra
Kubernetes
Apache Mesos
Resilient Distributed Datasets (RDDs)