Statements (63)
Predicate | Object |
---|---|
gptkbp:instance_of | gptkb:Data |
gptkbp:can | structured data |
gptkbp:can_be_extended_by | custom functions |
gptkbp:can_be_used_to | Spark Thrift Server, Spark SQL CLI |
gptkbp:can_create | execution plans |
gptkbp:can_handle | streaming data, batch data |
gptkbp:can_perform | joins, aggregations, filtering operations, distributed queries |
gptkbp:connects | gptkb:Apache_Parquet, gptkb:Apache_Hive, JSON data sources, JDBC data sources |
gptkbp:deployment | cloud platforms, on-premises servers |
gptkbp:developed_by | gptkb:Apache_Software_Foundation |
https://www.w3.org/2000/01/rdf-schema#label | Spark SQL engine |
gptkbp:integrates_with | gptkb:Apache_Airflow, gptkb:Apache_Flink, gptkb:Apache_Kafka, gptkb:Apache_Spark |
gptkbp:is_available_on | gptkb:Git_Hub |
gptkbp:is_compatible_with | gptkb:Java, gptkb:Python, gptkb:R, gptkb:Scala, SQL standards |
gptkbp:is_documented_in | official documentation |
gptkbp:is_effective_against | data processing tasks |
gptkbp:is_often_used_in | ETL processes, data engineering, data warehousing |
gptkbp:is_optimized_for | big data processing, in-memory computing |
gptkbp:is_part_of | big data frameworks, Apache Spark ecosystem |
gptkbp:is_scalable | large datasets |
gptkbp:is_supported_by | community contributions |
gptkbp:is_used_for | data visualization, real-time analytics, reporting |
gptkbp:is_used_in | gptkb:machine_learning, business intelligence, data analytics, data science |
gptkbp:provides | Data Frame API, SQL interface, Spark Session |
gptkbp:supports | SQL queries, subqueries, window functions, user-defined functions (UDFs), schema inference, data manipulation language (DML), Hive QL, data definition language (DDL), data source API |
gptkbp:uses | gptkb:Catalyst_optimizer |
gptkbp:bfsParent | gptkb:Catalyst_optimizer |
gptkbp:bfsLayer | 6 |