gptkbp:instanceOf
|
computing workload
|
gptkbp:canBe
|
gptkb:machine_learning
gptkb:GraphX
gptkb:MLlib
gptkb:Spark_SQL
gptkb:Spark_Streaming
batch processing
stream processing
graph processing
|
gptkbp:canBeScheduledAs
|
stage
task
job
|
gptkbp:executedBy
|
Spark cluster
|
gptkbp:failureMode
|
network issues
data skew
out of memory
|
https://www.w3.org/2000/01/rdf-schema#label
|
Apache Spark workloads
|
gptkbp:input
|
gptkb:HDFS
S3
local file system
|
gptkbp:monitors
|
Spark History Server
Spark UI
|
gptkbp:optimizedFor
|
caching
data serialization
partitioning
broadcast joins
|
gptkbp:output
|
gptkb:HDFS
S3
local file system
|
gptkbp:relatedTo
|
gptkb:Apache_Spark
|
gptkbp:runsOn
|
gptkb:Kubernetes
gptkb:YARN
gptkb:Mesos
standalone cluster
|
gptkbp:scheduledBy
|
Spark scheduler
|
gptkbp:tuning
|
compression
parallelism
serialization format
dynamic allocation
driver memory
executor memory
number of executors
shuffle partitions
|
gptkbp:writtenBy
|
gptkb:Java
gptkb:Python
gptkb:Scala
R
|