Apache Spark workloads

URI: https://gptkb.org/entity/Apache_Spark_workloads

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:computing_workload
gptkbp:canBe	gptkb:machine_learning, gptkb:GraphX, gptkb:MLlib, gptkb:Spark_SQL, gptkb:Spark_Streaming, batch processing, stream processing, graph processing
gptkbp:canBeScheduledAs	gptkb:job, stage, task
gptkbp:executedBy	Spark cluster
gptkbp:failureMode	network issues, data skew, out-of-memory errors
gptkbp:input	gptkb:HDFS, S3, local file system
gptkbp:monitors	Spark History Server, Spark UI
gptkbp:optimizedFor	caching, data serialization, partitioning, broadcast joins
gptkbp:output	gptkb:HDFS, S3, local file system
gptkbp:relatedTo	gptkb:Apache_Spark
gptkbp:runsOn	gptkb:Kubernetes, gptkb:YARN, gptkb:Mesos, standalone cluster
gptkbp:scheduledBy	Spark scheduler
gptkbp:tuning	gptkb:serialization_format, compression, parallelism, dynamic allocation, driver memory, executor memory, number of executors, shuffle partitions
gptkbp:writtenBy	gptkb:Java, gptkb:Python, gptkb:Scala, R
gptkbp:bfsParent	gptkb:Apache_Spark_pools
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Apache Spark workloads
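The knobs listed under gptkbp:tuning correspond to ordinary Spark configuration properties. The sketch below is a minimal illustration, assuming a Python launcher that assembles `spark-submit --conf` arguments. The property names are standard Spark settings, but the values and the application file `app.py` are hypothetical placeholders, not recommended settings.

```python
from typing import Dict, List

# Standard Spark property names; the values are illustrative placeholders only.
TUNING: Dict[str, str] = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",  # serialization format
    "spark.io.compression.codec": "lz4",        # codec for internal data such as shuffle output
    "spark.default.parallelism": "200",         # default partition count for RDD operations
    "spark.dynamicAllocation.enabled": "true",  # dynamic allocation (also needs shuffle tracking or an external shuffle service)
    "spark.driver.memory": "4g",                # driver memory
    "spark.executor.memory": "8g",              # executor memory
    "spark.executor.instances": "10",           # number of executors (treated as an initial count with dynamic allocation)
    "spark.sql.shuffle.partitions": "400",      # shuffle partitions for DataFrame/SQL shuffles
}

def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a property dict into ['--conf', 'key=value', ...], sorted by key."""
    args: List[str] = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args

if __name__ == "__main__":
    # "app.py" is a hypothetical application script.
    print(" ".join(["spark-submit", *to_submit_args(TUNING), "app.py"]))
```

Sorting the keys keeps the generated command line stable between runs, which makes launcher scripts easier to diff and review.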
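The cluster managers under gptkbp:runsOn are selected through Spark's master URL (the `--master` option or `spark.master` property). A minimal sketch, assuming hypothetical host names: the URL forms and the default ports for the standalone master (7077) and Mesos (5050) follow the Spark documentation. No default is assumed for the Kubernetes API server port. Mesos support has been deprecated in recent Spark releases.

```python
from typing import Optional

# Documented default ports for the standalone master and the Mesos master.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}
# URL prefix used by each cluster manager in a Spark master URL.
SCHEMES = {"kubernetes": "k8s://https://", "mesos": "mesos://", "standalone": "spark://"}

def master_url(manager: str, host: str = "", port: Optional[int] = None) -> str:
    """Build a --master value for the given cluster manager."""
    m = manager.lower()
    if m == "yarn":
        # YARN locates the ResourceManager via HADOOP_CONF_DIR / YARN_CONF_DIR.
        return "yarn"
    scheme = SCHEMES.get(m)
    if scheme is None:
        raise ValueError(f"unknown cluster manager: {manager}")
    if not host:
        raise ValueError(f"{m} needs a host")
    if port is None:
        port = DEFAULT_PORTS.get(m)
    if port is None:
        # No default is assumed for the Kubernetes API server port.
        raise ValueError(f"{m} needs an explicit port")
    return f"{scheme}{host}:{port}"
```

For example, `master_url("standalone", "spark-master")` returns `spark://spark-master:7077`, where `spark-master` is a hypothetical host name.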
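The gptkbp:failureMode entry names data skew, which typically shows up as one partition holding far more rows than the rest after a shuffle. The sketch below is a plain-Python illustration that does not use Spark. Python's `hash` stands in for Spark's partitioner, and the max-to-median ratio is an ad hoc skew metric. Round-robin key salting is shown as one common mitigation, not the only one.

```python
from collections import Counter
from statistics import median
from typing import Hashable, Iterable, List, Set, Tuple

def partition_sizes(keys: Iterable[Hashable], num_partitions: int) -> List[int]:
    # Mimics hash partitioning: each row goes to hash(key) mod num_partitions.
    # Python's hash stands in for Spark's partitioner hash (an assumption).
    sizes = [0] * num_partitions
    for key in keys:
        sizes[hash(key) % num_partitions] += 1
    return sizes

def skew_ratio(sizes: List[int]) -> float:
    # Largest partition relative to the median; values far above 1 mean a few
    # tasks do most of the work, the straggler pattern behind data skew.
    mid = median(sizes)
    return max(sizes) / mid if mid else float("inf")

def salt_keys(keys: Iterable[Hashable], hot_keys: Set[Hashable],
              buckets: int) -> List[Tuple[Hashable, int]]:
    # Key salting: pair each row of a hot key with a salt in [0, buckets),
    # assigned round-robin, so that key's rows can spread over several partitions.
    # Rows of other keys keep salt 0.
    seen: Counter = Counter()
    out: List[Tuple[Hashable, int]] = []
    for key in keys:
        if key in hot_keys:
            out.append((key, seen[key] % buckets))
            seen[key] += 1
        else:
            out.append((key, 0))
    return out
```

When the salted key is used in a join, the other side must be replicated once per salt value so that matching rows still meet.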
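Broadcast joins, listed under gptkbp:optimizedFor, avoid shuffling the large side of a join by copying the small side to every executor. In Spark SQL, this happens automatically when a table falls below `spark.sql.autoBroadcastJoinThreshold`, or explicitly through the `broadcast()` hint. The plain-Python sketch below only mimics the build-and-probe logic of a broadcast hash inner join. It makes no claim about Spark's internal implementation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

def broadcast_hash_join(
    large: Iterable[Tuple[Hashable, object]],
    small: Iterable[Tuple[Hashable, object]],
) -> List[Tuple[Hashable, object, object]]:
    # Build phase: materialize the small side once as a hash table. In Spark this
    # table would be shipped to every executor, so the large side is not shuffled.
    table: Dict[Hashable, List[object]] = {}
    for key, value in small:
        table.setdefault(key, []).append(value)
    # Probe phase: stream the large side and look up each key locally (inner join).
    out: List[Tuple[Hashable, object, object]] = []
    for key, value in large:
        for match in table.get(key, []):
            out.append((key, value, match))
    return out
```

The design only pays off when the small side fits comfortably in each executor's memory. Otherwise, a shuffle-based join is the safer choice.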