Apache Spark

GPTKB entity

Statements (70)
Predicate Object
gptkbp:instanceOf open-source distributed computing system
gptkbp:component gptkb:GraphX
gptkbp:component gptkb:MLlib
gptkbp:component gptkb:Spark_Core
gptkbp:component gptkb:Spark_SQL
gptkbp:component gptkb:Spark_Streaming
gptkbp:designedFor big data processing
gptkbp:developedBy gptkb:Apache_Software_Foundation
gptkbp:developer gptkb:Matei_Zaharia
gptkbp:firstPaper gptkb:Resilient_Distributed_Datasets:_A_Fault-Tolerant_Abstraction_for_In-Memory_Cluster_Computing
https://www.w3.org/2000/01/rdf-schema#label Apache Spark
gptkbp:latestReleaseVersion 2023-10-13
gptkbp:latestReleaseVersion 3.5.0
gptkbp:license gptkb:Apache_License_2.0
gptkbp:notableUser gptkb:Airbnb
gptkbp:notableUser gptkb:Netflix
gptkbp:notableUser gptkb:Uber
gptkbp:notableUser gptkb:eBay
gptkbp:notableUser gptkb:Yahoo
gptkbp:notableUser gptkb:Alibaba
gptkbp:operatingSystem Cross-platform
gptkbp:originatedIn gptkb:AMPLab,_UC_Berkeley
gptkbp:predecessor gptkb:Apache_Hadoop_MapReduce
gptkbp:releaseDate 2014
gptkbp:repository https://github.com/apache/spark
gptkbp:runsOn gptkb:Apache_Mesos
gptkbp:runsOn gptkb:Kubernetes
gptkbp:runsOn gptkb:Hadoop_YARN
gptkbp:runsOn standalone cluster mode
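gptkbp:supports gptkb:machine_learning
gptkbp:supports distributed computing
gptkbp:supports batch processing
gptkbp:supports fault tolerance
gptkbp:supports stream processing
gptkbp:supports SQL queries
gptkbp:supports graph processing
gptkbp:supports in-memory computation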
gptkbp:supportsLanguage gptkb:Java
gptkbp:supportsLanguage gptkb:Python
gptkbp:supportsLanguage gptkb:Scala
gptkbp:supportsLanguage R
gptkbp:supportsLanguage SQL
gptkbp:usedFor data analytics
gptkbp:usedFor ETL
gptkbp:usedFor real-time data processing
gptkbp:usedFor interactive queries
gptkbp:usedFor machine learning pipelines
gptkbp:website https://spark.apache.org/
gptkbp:writtenBy gptkb:Java
gptkbp:writtenBy gptkb:Python
gptkbp:writtenBy gptkb:Scala
gptkbp:writtenBy R
gptkbp:writtenBy SQL
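gptkbp:bfsParent gptkb:Avro
gptkbp:bfsParent gptkb:Deeplearning4j
gptkbp:bfsParent gptkb:Cloudera
gptkbp:bfsParent gptkb:Databricks
gptkbp:bfsParent gptkb:Databricks_product
gptkbp:bfsParent gptkb:Apache
gptkbp:bfsParent gptkb:Amazon_Elastic_MapReduce
gptkbp:bfsParent gptkb:Apache_License_2.0
gptkbp:bfsParent gptkb:Apache_Software_Foundation
gptkbp:bfsParent gptkb:Azure_Event_Hubs
gptkbp:bfsParent gptkb:Azure_HDInsight
gptkbp:bfsParent gptkb:Cloud_Bigtable
gptkbp:bfsParent gptkb:Cloud_Dataproc
gptkbp:bfsParent gptkb:Presto
gptkbp:bfsParent gptkb:Azure_Databricks
gptkbp:bfsParent gptkb:Azure_Synapse_Analytics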
gptkbp:bfsLayer 5