Hadoop

GPTKB entity

Statements (142)
Predicate Object
gptkbp:instance_of gptkb:software_framework
gptkbp:api REST APIs
Java APIs
Python APIs
C++ APIs
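The REST entry above covers HTTP interfaces such as WebHDFS, which exposes HDFS file operations under the URL scheme `/webhdfs/v1/<path>?op=...`. The following sketch only builds such URLs and sends no requests. The host name is hypothetical. Port 9870 assumes a Hadoop 3.x NameNode on its default HTTP address (Hadoop 2.x used 50070). The helper name `webhdfs_url` is illustrative and is not part of any Hadoop library.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical NameNode host; 9870 is the default NameNode HTTP port in Hadoop 3.x.
NAMENODE_HOST = "namenode.example.com"
NAMENODE_HTTP_PORT = 9870


def webhdfs_url(path: str, op: str, user: Optional[str] = None, **params: str) -> str:
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<OP>[&...] without sending a request."""
    if not path.startswith("/"):
        raise ValueError("HDFS paths must be absolute, e.g. /user/alice")
    query = {"op": op.upper()}
    if user is not None:
        # With simple (non-Kerberos) authentication the caller is named via user.name.
        query["user.name"] = user
    query.update(params)  # extra operation parameters, e.g. offset="0" for OPEN
    return (f"http://{NAMENODE_HOST}:{NAMENODE_HTTP_PORT}"
            f"/webhdfs/v1{quote(path)}?{urlencode(query)}")


if __name__ == "__main__":
    # List a directory, then read a file starting at byte 0.
    print(webhdfs_url("/user/alice/logs", "liststatus", user="alice"))
    print(webhdfs_url("/user/alice/logs/app.log", "open", user="alice", offset="0"))
```

Any HTTP client could then issue a GET against the generated URL. The helper keeps URL construction separate from transport so it can be checked without a running cluster.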
gptkbp:architecture master-slave architecture
gptkbp:community_support active community
gptkbp:components Resource Manager
Application Master
Node Manager
gptkbp:configuration gptkb:YARN_Resource_Manager
gptkb:YARN_Application_Master
gptkb:YARN_Node_Manager
gptkbp:created_by gptkb:2005
gptkbp:developed_by gptkb:Apache_Software_Foundation
gptkbp:enables scalable applications
gptkbp:has_component gptkb:Apache_Pig
gptkb:Hadoop_HDFS
gptkb:Apache_HBase
gptkb:Apache_Hive
gptkb:Apache_Flink
gptkb:Apache_Oozie
gptkb:Apache_Storm
gptkb:Apache_Spark
gptkb:Hadoop_YARN
gptkb:Apache_Mahout
gptkb:Hadoop_Common
gptkb:Hadoop_Map_Reduce
gptkb:Apache_Zoo_Keeper
gptkb:Apache_Sqoop
gptkbp:has_documentation tutorials
API references
user guides
official documentation
gptkbp:has_version 3.3.1
2.7.7
2.10.1
1.2.1
3.2.2
https://www.w3.org/2000/01/rdf-schema#label Hadoop
gptkbp:includes gptkb:Hadoop_YARN
gptkb:Map_Reduce
gptkb:Hadoop_Common
gptkb:Hadoop_Map_Reduce
gptkb:YARN
gptkb:Hadoop_Distributed_File_System_(HDFS)
gptkbp:integrates_with gptkb:Apache_Tez
gptkb:Apache_Flink
gptkb:Apache_Storm
gptkb:Apache_Spark
gptkbp:is_adopted_by government agencies
research institutions
startups
enterprises
gptkbp:is_compatible_with gptkb:Kubernetes
gptkb:Linux
gptkb:Apache_Hive
gptkb:Mac_OS
gptkb:Apache_Pig
gptkb:Spark
gptkb:Hadoop_Distributed_File_System_(HDFS)
gptkb:Docker
gptkb:Windows
gptkb:HBase
gptkbp:is_designed_for data-intensive applications
gptkbp:is_documented_in Hadoop Documentation
gptkbp:is_integrated_with gptkb:Apache_Airflow
gptkb:Apache_Kafka
gptkb:Apache_Jena
gptkb:Apache_Ni_Fi
gptkb:cloud_services
data visualization tools
data lakes
NoSQL databases
gptkbp:is_often_used_in gptkb:cloud_computing
gptkb:machine_learning
data science
big data analytics
data warehousing
gptkbp:is_open_source gptkb:true
gptkbp:is_part_of Apache Software Foundation projects
big data ecosystem
gptkbp:is_popular_in business intelligence
data science
analytics
data engineering
gptkbp:is_scalable gptkb:true
thousands of nodes
gptkbp:is_supported_by community contributions
various cloud providers
gptkbp:is_used_by gptkb:Twitter
gptkb:Linked_In
gptkb:Spotify
gptkb:Yahoo
gptkb:Netflix
gptkb:e_Bay
gptkb:Facebook
large-scale data processing applications
gptkbp:is_used_for gptkb:machine_learning
gptkb:cloud_storage
data analysis
ETL processes
data processing
data transformation
real-time data processing
data mining
data warehousing
batch processing
data archiving
log processing
job scheduling and resource management
gptkbp:is_used_in gptkb:cloud_storage
data processing
gptkbp:license Apache License 2.0
gptkbp:part_of gptkb:Apache_Hadoop_ecosystem
gptkbp:provides fault tolerance
job scheduling
resource allocation
scalability
scheduling services
gptkbp:release_date gptkb:2006
gptkb:2011
gptkbp:storage petabytes of data
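HDFS achieves both its petabyte-scale storage and its fault tolerance by splitting each file into fixed-size blocks. It then stores several replicas of every block on different DataNodes. The sketch below estimates the footprint of a single file under the stock defaults `dfs.blocksize` = 128 MiB and `dfs.replication` = 3. Real clusters often override both values in `hdfs-site.xml`. The function name `hdfs_footprint` is hypothetical.

```python
from typing import Tuple

# Assumed stock defaults (Hadoop 2.x/3.x): dfs.blocksize = 128 MiB, dfs.replication = 3.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> Tuple[int, int, int]:
    """Return (blocks, block_replicas, raw_bytes) for one file of file_size bytes."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file_size must be >= 0; block_size and replication must be > 0")
    blocks = -(-file_size // block_size)  # integer ceiling division
    block_replicas = blocks * replication  # replicas of a block go to different DataNodes
    # A partial last block only stores the bytes it holds, so raw usage is
    # the logical size times the replication factor.
    raw_bytes = file_size * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(hdfs_footprint(1 * GiB))          # 1 GiB file
    print(hdfs_footprint(200 * 1024 ** 2))  # 200 MiB file: one full block plus a partial one
```

For a 1 GiB file, the defaults give 8 blocks, 24 block replicas, and 3 GiB of raw disk usage. Raw usage is computed from the logical size, not from blocks times block size. The reason is that the final block only occupies the bytes it actually holds.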
gptkbp:strategic_goals gptkb:true
gptkbp:supports distributed computing
large data sets
multi-tenancy
big data applications
data-intensive tasks
data processing frameworks
various programming models
gptkbp:used_for resource management
gptkbp:uses containers
distributed processing
distributed storage
MapReduce programming model
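The MapReduce programming model listed above has three stages:

1. A map phase emits key/value pairs.
2. A shuffle sorts the pairs and groups them by key.
3. A reduce phase aggregates each group.

The following is a minimal single-process word count in the style of Hadoop Streaming, which reads and writes tab-separated `key\tvalue` lines on stdin/stdout. It uses no Hadoop library, and Python's `sorted` stands in for the framework's shuffle. The function names are illustrative, and lowercasing words is an arbitrary choice for this example.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: records arrive sorted by key, so equal words are adjacent."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Map -> shuffle/sort -> reduce in one process; sorted() stands in for the shuffle."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Example: printf 'the cat\nthe dog\n' | python wordcount.py
    for record in run_locally(sys.stdin):
        print(record)
```

On a cluster, `mapper` and `reducer` would typically live in separate scripts passed to the hadoop-streaming JAR through its `-mapper` and `-reducer` options. Hadoop itself would then perform the sort between the two phases.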
gptkbp:written_in gptkb:Java
gptkbp:bfsParent gptkb:jet_engine
gptkb:Big_Data
gptkbp:bfsLayer 3