Apache Spark

GPTKB entity

Statements (58)
Predicate Object
gptkbp:instanceOf gptkb:open-source_distributed_computing_system
gptkbp:component gptkb:GraphX
    gptkb:MLlib
    gptkb:Spark_Core
    gptkb:Spark_SQL
    gptkb:Spark_Streaming
gptkbp:designedFor big data processing
gptkbp:developedBy gptkb:Apache_Software_Foundation
gptkbp:developer gptkb:Matei_Zaharia
gptkbp:firstPaper gptkb:Resilient_Distributed_Datasets:_A_Fault-Tolerant_Abstraction_for_In-Memory_Cluster_Computing
gptkbp:latestReleaseVersion 2023-09-13
    3.5.0
gptkbp:license gptkb:Apache_License_2.0
gptkbp:notableUser gptkb:Airbnb
    gptkb:Netflix
    gptkb:Uber
    gptkb:eBay
    gptkb:Yahoo
    gptkb:Alibaba
gptkbp:operatingSystem Cross-platform
gptkbp:originatedIn gptkb:AMPLab,_UC_Berkeley
gptkbp:predecessor gptkb:Apache_Hadoop_MapReduce
gptkbp:releaseDate 2014
gptkbp:repository https://github.com/apache/spark
gptkbp:runsOn gptkb:Apache_Mesos
    gptkb:Kubernetes
    gptkb:Hadoop_YARN
    standalone cluster mode
gptkbp:supports gptkb:machine_learning
    distributed computing
    batch processing
    fault tolerance
    stream processing
    SQL queries
    graph processing
    in-memory computation
gptkbp:supportsLanguage gptkb:Java
    gptkb:Python
    gptkb:Scala
    R
    SQL
gptkbp:usedFor data analytics
    ETL
    real-time data processing
    interactive queries
    machine learning pipelines
gptkbp:website https://spark.apache.org/
gptkbp:writtenBy gptkb:Java
    gptkb:Python
    gptkb:Scala
    R
    SQL
gptkbp:bfsParent gptkb:Cloudera
    gptkb:Databricks
    gptkb:Apache_License_2.0
    gptkb:Presto
gptkbp:bfsLayer 5
https://www.w3.org/2000/01/rdf-schema#label Apache Spark