gptkbp:instance_of
|
gptkb:cloud_services
|
gptkbp:allows
|
customization of clusters
|
gptkbp:can_be_used_with
|
gptkb:Jupyter_Notebooks
|
gptkbp:developed_by
|
gptkb:Google
|
gptkbp:enables
|
real-time processing
|
https://www.w3.org/2000/01/rdf-schema#label
|
Google Cloud Dataproc
|
gptkbp:integrates_with
|
gptkb:Google
gptkb:Google_Cloud_Platform
gptkb:cloud_storage
|
gptkbp:is_available_for
|
enterprise use
|
gptkbp:is_available_in
|
multiple languages
multiple regions
|
gptkbp:is_compatible_with
|
gptkb:Apache_Airflow
third-party tools
|
gptkbp:is_effective_against
|
large datasets
|
gptkbp:is_integrated_with
|
gptkb:Google_Cloud_IAM
|
gptkbp:is_optimized_for
|
performance and cost
Google Cloud environment
|
gptkbp:is_part_of
|
gptkb:Google_Cloud_Platform
big data ecosystem
|
gptkbp:is_scalable
|
thousands of nodes
|
gptkbp:is_used_by
|
business analysts
data scientists
data engineers
|
gptkbp:is_used_for
|
gptkb:machine_learning
data analysis
ETL processes
data migration
data preparation
|
gptkbp:offers
|
data visualization tools
job scheduling
auto-scaling
data import/export capabilities
data processing pipelines
preemptible VMs
scalable clusters
data exploration capabilities
|
gptkbp:provides
|
gptkb:Command_Line_Interface
API access
monitoring and logging
security features
user-friendly interface
data transformation tools
job management tools
data lake integration
managed Apache Spark and Apache Hadoop clusters
|
gptkbp:released
|
gptkb:2016
|
gptkbp:supports
|
gptkb:Apache_Pig
gptkb:Spark_Streaming
gptkb:Apache_Hive
gptkb:Apache_Flink
gptkb:Spark_SQL
gptkb:Apache_Zeppelin
Docker containers
big data processing
real-time analytics
data warehousing
batch processing
data science workflows
Python, Java, Scala
|