Apache Spark workloads

GPTKB entity

Statements (49)
Predicate Object
gptkbp:instanceOf computing workload
gptkbp:canBe gptkb:machine_learning
gptkb:GraphX
gptkb:MLlib
gptkb:Spark_SQL
gptkb:Spark_Streaming
batch processing
stream processing
graph processing
gptkbp:canBeScheduledAs stage
task
job
gptkbp:executedBy Spark cluster
gptkbp:failureMode network issues
data skew
out of memory
https://www.w3.org/2000/01/rdf-schema#label Apache Spark workloads
gptkbp:input gptkb:HDFS
S3
local file system
gptkbp:monitors Spark History Server
Spark UI
gptkbp:optimizedFor caching
data serialization
partitioning
broadcast joins
gptkbp:output gptkb:HDFS
S3
local file system
gptkbp:relatedTo gptkb:Apache_Spark
gptkbp:runsOn gptkb:Kubernetes
gptkb:YARN
gptkb:Mesos
standalone cluster
gptkbp:scheduledBy Spark scheduler
gptkbp:tuning compression
parallelism
serialization format
dynamic allocation
driver memory
executor memory
number of executors
shuffle partitions
gptkbp:writtenBy gptkb:Java
gptkb:Python
gptkb:Scala
R
gptkbp:bfsParent gptkb:Apache_Spark_pools
gptkbp:bfsLayer 7
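
Example (illustrative only): the gptkbp:tuning statements above name knobs without saying how they are set. As a minimal sketch, the block below pairs each listed knob with the standard Spark configuration property that most plausibly corresponds to it and renders the result as spark-submit arguments. The pairing is an assumption, not something stated in this entry. The values are placeholders rather than recommendations, and the TUNING dictionary, the to_spark_submit_args helper, and the app.py script name are hypothetical.

```python
# Assumed mapping from the tuning statements to Spark configuration properties.
# All values are illustrative placeholders, not recommended settings.
TUNING = {
    "spark.io.compression.codec": "zstd",        # compression
    "spark.default.parallelism": "200",          # parallelism
    "spark.serializer":                          # serialization format
        "org.apache.spark.serializer.KryoSerializer",
    "spark.dynamicAllocation.enabled": "false",  # dynamic allocation
    "spark.driver.memory": "4g",                 # driver memory
    "spark.executor.memory": "8g",               # executor memory
    "spark.executor.instances": "10",            # number of executors
    "spark.sql.shuffle.partitions": "200",       # shuffle partitions
}


def to_spark_submit_args(conf):
    """Turn a {property: value} dict into a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # app.py is a hypothetical application script.
    command = ["spark-submit", *to_spark_submit_args(TUNING), "app.py"]
    print(" ".join(command))
```

Keeping the settings in one dictionary makes it easy to vary a single knob between runs and compare the effect in the Spark UI or Spark History Server.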