Apache Spark

GPTKB entity

Statements (249)
Predicate Object
gptkbp:instance_of gptkb:servers
gptkb:open-source_software
gptkb:machine_learning
gptkbp:api Dataset API
DataFrame API
GraphX API
MLlib API
Spark SQL API
gptkbp:can_be_used_with gptkb:Kubernetes
Apache Mesos
gptkb:Apache_Flink
gptkb:Hadoop
gptkbp:community_support active user community
gptkbp:competes_with gptkb:Dask
gptkb:Storm
gptkb:Apache_Flink
gptkb:Apache_Storm
gptkb:Hadoop
gptkb:Hadoop_Map_Reduce
Flink
gptkbp:deployment gptkb:cloud_computing
on-premises
hybrid
gptkbp:developed_by gptkb:Apache_Software_Foundation
gptkbp:first_released gptkb:2010
gptkbp:founder gptkb:Patrick_Wendell
gptkb:Reynold_Xin
gptkb:Ion_Stoica
gptkb:Matei_Zaharia
gptkbp:has_apicall Dataset API
RDD API
DataFrame API
gptkbp:has_community gptkb:developers
gptkb:Author
gptkb:researchers
Active user community
conferences
meetups
online forums
user groups
active user community
user community
active developer community
active contributors
contributor community
gptkbp:has_component gptkb:Spark_Streaming
gptkb:Graph_X
gptkb:Spark_SQL
gptkb:MLlib
gptkbp:has_documentation gptkb:Tutorials
tutorials
API references
User guides
official documentation
Official documentation
extensive user guides
gptkbp:has_feature gptkb:Naive_Bayes
gptkb:Spark_Streaming
gptkb:Linear_Regression
gptkb:Support_Vector_Machines
gptkb:Graph_X
gptkb:Decision_Trees
gptkb:Logistic_Regression
gptkb:Random_Forests
gptkb:Spark_SQL
gptkb:MLlib
Data visualization
Model selection
Data preprocessing
fault tolerance
Cross-validation
Data transformation
stream processing
batch processing
Data aggregation
Data encoding
Matrix factorization
K-means clustering
graph processing
Data normalization
interactive queries
Data splitting
Principal Component Analysis (PCA)
Data sampling
Data imputation
Classification evaluation
Clustering evaluation
Data scaling
Feature transformation
Gradient-Boosted Trees
Regression evaluation
Singular Value Decomposition (SVD)
gptkbp:has_integration_with gptkb:Hadoop_ecosystem
gptkb:Amazon_S3
gptkb:Apache_HBase
gptkb:Apache_Hive
gptkb:Microsoft_Azure_Blob_Storage
gptkb:Apache_Kafka
gptkb:Apache_Jena
gptkb:cloud_storage
gptkbp:has_performance faster than Hadoop MapReduce, particularly for in-memory and iterative workloads
gptkbp:has_version 3.1.0
3.0.0
3.2.0
3.3.0
2.4.0
https://www.w3.org/2000/01/rdf-schema#label Apache Spark
gptkbp:includes gptkb:Spark_Streaming
gptkb:Graph_X
gptkb:Spark_SQL
gptkb:MLlib
gptkbp:is_adopted_by gptkb:Alibaba
gptkb:Uber
gptkb:Yahoo!
gptkb:Netflix
gptkb:e_Bay
gptkbp:is_compatible_with gptkb:Kubernetes
gptkb:Hadoop_ecosystem
Apache Mesos
gptkb:Java
gptkb:SQL
gptkb:Apache_Hive
gptkb:Python
gptkb:R
gptkb:Hadoop
gptkb:Docker
Cloud platforms
Hadoop 3.x
Hadoop 2.7+
gptkbp:is_known_for flexibility
high performance
ease of use
gptkbp:is_optimized_for real-time data processing
batch processing
interactive queries
large-scale data processing
in-memory computation
gptkbp:is_part_of Apache Software Foundation projects
big data ecosystem
gptkbp:is_scalable horizontal scaling
thousands of nodes
vertical scaling
gptkbp:is_supported_by gptkb:Documentation
gptkb:Amazon_EMR
gptkb:Google_Cloud_Dataproc
gptkb:Databricks
gptkb:Microsoft_Azure_HDInsight
tutorials
community contributions
multiple data sources
various programming languages
user forums
numerous cloud platforms
gptkbp:is_used_by gptkb:Airbnb
gptkb:Alibaba
gptkb:Linked_In
gptkb:Pinterest
gptkb:Uber
gptkb:Yahoo!
gptkb:IBM
gptkb:Spotify
gptkb:Netflix
gptkb:e_Bay
Data scientists
Data engineers
Big data analysts
gptkbp:is_used_for gptkb:machine_learning
big data processing
data analytics
big data analytics
stream processing
graph processing
gptkbp:is_used_in Data analysis
Machine learning
Stream processing
Big data processing
ETL processes
data analytics
data visualization
real-time analytics
data warehousing
machine learning workflows
data processing pipelines
gptkbp:language gptkb:Java
gptkb:Python
gptkb:R
gptkb:Scala
gptkbp:latest_version 3.3.1
gptkbp:license Apache License 2.0
gptkbp:part_of Apache Spark ecosystem
gptkbp:programming_language gptkb:Scala
gptkbp:provides gptkb:Regression
Classification
Clustering
Feature extraction
data processing capabilities
Hyperparameter tuning
in-memory data processing
machine learning capabilities
stream processing
Machine Learning algorithms
stream processing capabilities
Collaborative filtering
graph processing capabilities
Pipeline API
machine learning libraries
Evaluation metrics
Model persistence
gptkbp:released gptkb:2014
gptkbp:released_on May 2014
gptkbp:runs_through gptkb:Kubernetes
gptkb:Hadoop_YARN
gptkb:Hadoop
Apache Mesos
Standalone cluster
gptkbp:security_features authorization
encryption
authentication
gptkbp:supports gptkb:Java
gptkb:Python
gptkb:R
Distributed computing
Batch processing
Graph processing
In-memory processing
in-memory computing
Streaming data
gptkbp:tutorials online tutorials
gptkbp:use_case ETL processes
real-time analytics
data warehousing
data science workflows
log processing
gptkbp:used_by Data scientists
Data engineers
Big data analysts
Machine learning practitioners
gptkbp:uses Resilient Distributed Datasets (RDDs)
DataFrames
gptkbp:written_in gptkb:Java
gptkb:SQL
gptkb:Python
gptkb:R
gptkb:Scala
gptkbp:bfsParent gptkb:Data
gptkb:machine_learning
gptkb:weapons
gptkbp:bfsLayer 3
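
The listing above is flattened: a predicate appears only on the first line of its group, and each following line without a predicate is a further object of that same predicate. The sketch below shows one way such a listing could be regrouped into a predicate-to-objects mapping. It is illustrative only: the function name `group_statements` is hypothetical, and the rule that a line starting with `gptkbp:` or a full `http(s)://` URI opens a new predicate is an assumption inferred from this page, not a documented GPTKB format.

```python
from collections import defaultdict


def group_statements(lines):
    """Group flattened 'predicate object' lines by predicate.

    Assumption: a line starting with 'gptkbp:' or a full http(s) URI opens a
    new predicate and carries its first object; any other non-empty line is a
    further object of the most recently seen predicate.
    """
    grouped = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(("gptkbp:", "http://", "https://")):
            # Split the predicate from the rest of the line at the first space.
            current, _, obj = line.partition(" ")
        else:
            obj = line
        # Lines seen before any predicate (page headers) are skipped.
        if current is not None and obj:
            grouped[current].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    "gptkbp:has_component gptkb:Spark_Streaming",
    "gptkb:Graph_X",
    "gptkb:Spark_SQL",
    "gptkb:MLlib",
    "gptkbp:license Apache License 2.0",
]
statements = group_statements(sample)
print(statements["gptkbp:has_component"])
```

Keeping objects in a list per predicate preserves both their order and any repeated values, which matters here because several predicates on this page carry near-duplicate objects.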