Statements (249)
Predicate | Object |
---|---|
gptkbp:instance_of | gptkb:servers, gptkb:open-source_software, gptkb:machine_learning |
gptkbp:api | Dataset API, DataFrame API, GraphX API, MLlib API, Spark SQL API |
gptkbp:can_be_used_with | gptkb:Kubernetes, Apache Mesos, gptkb:Apache_Flink, gptkb:Hadoop |
gptkbp:community_support | active user community |
gptkbp:competes_with | gptkb:Dask, gptkb:Storm, gptkb:Apache_Flink, gptkb:Apache_Storm, gptkb:Hadoop, gptkb:Hadoop_Map_Reduce |
gptkbp:deployment | gptkb:cloud_computing, on-premises, hybrid |
gptkbp:developed_by | gptkb:Apache_Software_Foundation |
gptkbp:first_released | gptkb:2010 |
gptkbp:founder | gptkb:Patrick_Wendell, gptkb:Reynold_Xin, gptkb:Ion_Stoica, gptkb:Matei_Zaharia |
gptkbp:has_apicall | Dataset API, RDD API, DataFrame API |
gptkbp:has_community | gptkb:developers, gptkb:Author, gptkb:researchers, active user community, conferences, meetups, online forums, user groups, user community, active developer community, active contributors, contributor community |
gptkbp:has_component | gptkb:Spark_Streaming, gptkb:Graph_X, gptkb:Spark_SQL, gptkb:MLlib |
gptkbp:has_documentation | gptkb:Tutorials, tutorials, API references, user guides, official documentation, extensive user guides |
gptkbp:has_feature | gptkb:Naive_Bayes, gptkb:Spark_Streaming, gptkb:Linear_Regression, gptkb:Support_Vector_Machines, gptkb:Graph_X, gptkb:Decision_Trees, gptkb:Logistic_Regression, gptkb:Random_Forests, gptkb:Spark_SQL, gptkb:MLlib, Data visualization, Model selection, Data preprocessing, fault tolerance, Cross-validation, Data transformation, stream processing, batch processing, Data aggregation, Data encoding, Matrix factorization, K-means clustering, graph processing, Data normalization, interactive queries, Data splitting, Principal Component Analysis (PCA), Data sampling, Data imputation, Classification evaluation, Clustering evaluation, Data scaling, Feature transformation, Gradient-Boosted Trees, Regression evaluation, Singular Value Decomposition (SVD) |
gptkbp:has_integration_with | gptkb:Hadoop_ecosystem, gptkb:Amazon_S3, gptkb:Apache_HBase, gptkb:Apache_Hive, gptkb:Microsoft_Azure_Blob_Storage, gptkb:Apache_Kafka, gptkb:Apache_Jena, gptkb:cloud_storage |
gptkbp:has_performance | faster than Hadoop MapReduce |
gptkbp:has_version | 3.1.0, 3.0.0, 3.2.0, 3.3.0, 2.4.0 |
https://www.w3.org/2000/01/rdf-schema#label | Apache Spark |
gptkbp:includes | gptkb:Spark_Streaming, gptkb:Graph_X, gptkb:Spark_SQL, gptkb:MLlib |
gptkbp:is_adopted_by | gptkb:Alibaba, gptkb:Uber, gptkb:Yahoo!, gptkb:Netflix, gptkb:e_Bay |
gptkbp:is_compatible_with | gptkb:Kubernetes, gptkb:Hadoop_ecosystem, Apache Mesos, gptkb:Java, gptkb:SQL, gptkb:Apache_Hive, gptkb:Python, gptkb:R, gptkb:Hadoop, gptkb:Docker, Cloud platforms, Hadoop 3.x, Hadoop 2.7+ |
gptkbp:is_known_for | flexibility, high performance, ease of use |
gptkbp:is_optimized_for | real-time data processing, batch processing, interactive queries, large-scale data processing, in-memory computation |
gptkbp:is_part_of | Apache Software Foundation projects, big data ecosystem |
gptkbp:is_scalable | horizontal scaling, thousands of nodes, vertical scaling |
gptkbp:is_supported_by | gptkb:Documentation, gptkb:Amazon_EMR, gptkb:Google_Cloud_Dataproc, gptkb:Databricks, gptkb:Microsoft_Azure_HDInsight, tutorials, community contributions, multiple data sources, various programming languages, user forums, numerous cloud platforms |
gptkbp:is_used_by | gptkb:Airbnb, gptkb:Alibaba, gptkb:Linked_In, gptkb:Pinterest, gptkb:Uber, gptkb:Yahoo!, gptkb:IBM, gptkb:Spotify, gptkb:Netflix, gptkb:e_Bay, Data scientists, Data engineers, Big data analysts |
gptkbp:is_used_for | gptkb:machine_learning, big data processing, data analytics, big data analytics, stream processing, graph processing |
gptkbp:is_used_in | Data analysis, Machine learning, Stream processing, Big data processing, ETL processes, data analytics, data visualization, real-time analytics, data warehousing, machine learning workflows, data processing pipelines |
gptkbp:language | gptkb:Java, gptkb:Python, gptkb:R, gptkb:Scala |
gptkbp:latest_version | 3.3.1 |
gptkbp:license | Apache License 2.0 |
gptkbp:part_of | Apache Spark ecosystem |
gptkbp:programming_language | gptkb:Scala |
gptkbp:provides | gptkb:Regression, Classification, Clustering, Feature extraction, data processing capabilities, Hyperparameter tuning, in-memory data processing, machine learning capabilities, stream processing, Machine Learning algorithms, stream processing capabilities, Collaborative filtering, graph processing capabilities, Pipeline API, machine learning libraries, Evaluation metrics, Model persistence |
gptkbp:released | gptkb:2014 |
gptkbp:released_on | May 2014 |
gptkbp:runs_through | gptkb:Kubernetes, gptkb:Hadoop_YARN, gptkb:Hadoop, Mesos, Standalone cluster |
gptkbp:security_features | authorization, encryption, authentication |
gptkbp:supports | gptkb:Java, gptkb:Python, gptkb:R, Distributed computing, Batch processing, Graph processing, In-memory processing, in-memory computing, Streaming data |
gptkbp:tutorials | online tutorials |
gptkbp:use_case | ETL processes, real-time analytics, data warehousing, data science workflows, log processing |
gptkbp:used_by | Data scientists, Data engineers, Big data analysts, Machine learning practitioners |
gptkbp:uses | Resilient Distributed Datasets (RDDs), DataFrames |
gptkbp:written_in | gptkb:Java, gptkb:SQL, gptkb:Python, gptkb:R, gptkb:Scala |