Apache Spark

URI: https://gptkb.org/entity/Apache_Spark

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:open-source_distributed_computing_system
gptkbp:component	gptkb:GraphX gptkb:MLlib gptkb:Spark_Core gptkb:Spark_SQL gptkb:Spark_Streaming
gptkbp:designedFor	big data processing
gptkbp:developedBy	gptkb:Apache_Software_Foundation
gptkbp:developer	gptkb:Matei_Zaharia
gptkbp:firstPaper	gptkb:Resilient_Distributed_Datasets:_A_Fault-Tolerant_Abstraction_for_In-Memory_Cluster_Computing
gptkbp:latestReleaseVersion	3.5.0 (2023-09-13)
gptkbp:license	gptkb:Apache_License_2.0
gptkbp:notableUser	gptkb:Airbnb gptkb:Netflix gptkb:Uber gptkb:eBay gptkb:Yahoo gptkb:Alibaba
gptkbp:operatingSystem	Cross-platform
gptkbp:originatedIn	gptkb:AMPLab,_UC_Berkeley
gptkbp:predecessor	gptkb:Apache_Hadoop_MapReduce
gptkbp:releaseDate	2014
gptkbp:repository	https://github.com/apache/spark
gptkbp:runsOn	gptkb:Apache_Mesos, gptkb:Kubernetes, gptkb:Hadoop_YARN, standalone cluster mode
gptkbp:supports	gptkb:machine_learning, distributed computing, batch processing, fault tolerance, stream processing, SQL queries, graph processing, in-memory computation
gptkbp:supportsLanguage	gptkb:Java gptkb:Python gptkb:Scala R SQL
gptkbp:usedFor	data analytics, ETL, real-time data processing, interactive queries, machine learning pipelines
gptkbp:website	https://spark.apache.org/
gptkbp:writtenBy	gptkb:Scala gptkb:Java gptkb:Python R
gptkbp:bfsParent	gptkb:Cloudera gptkb:Databricks gptkb:Apache_License_2.0 gptkb:Presto
gptkbp:bfsLayer	5
http://www.w3.org/2000/01/rdf-schema#label	Apache Spark