Hadoop Distributed File System

GPTKB entity

Statements (63)
Predicate Object
gptkbp:instance_of distributed file system
gptkbp:bfsLayer 4
gptkbp:bfsParent gptkb:Hadoop_Common
gptkbp:deployment commodity hardware
gptkbp:developed_by Apache Software Foundation
gptkbp:enables distributed computing
gptkbp:features fault tolerance
https://www.w3.org/2000/01/rdf-schema#label Hadoop Distributed File System
gptkbp:introduced gptkb:2006
gptkbp:is_accessible_by gptkb:Web_HDFS
command line interface
HDFS API
gptkbp:is_compatible_with gptkb:Map_Reduce
Apache Spark
NoSQL databases
Hadoop Streaming
gptkbp:is_designed_for high availability
high throughput
storing large data sets
gptkbp:is_designed_to scale horizontally
handle hardware failures
gptkbp:is_integrated_with gptkb:Apache_Pig
gptkb:Sultan
gptkb:Apache_Flume
gptkb:Apache_Sqoop
gptkbp:is_monitored_by Hadoop Metrics
gptkbp:is_open_source true
gptkbp:is_optimized_for large files
large streaming reads
gptkbp:is_part_of gptkb:Hadoop_ecosystem
data engineering workflows
data lakes
cloud storage solutions
big data solutions
gptkbp:is_scalable thousands of nodes
to petabytes of data
gptkbp:is_supported_by gptkb:Hadoop_Common
Hadoop ecosystem tools
gptkbp:is_used_by big data applications
gptkbp:is_used_for gptkb:computer
backup and recovery
data analytics
log processing
gptkbp:is_used_in gptkb:software_framework
data processing
real-time data processing
data warehousing
gptkbp:is_utilized_in ETL processes
data science workflows
gptkbp:managed_by gptkb:YARN
gptkbp:notable_products distributed data storage
gptkbp:provides data locality
high throughput access to application data
gptkbp:security_features gptkb:Kerberos
gptkbp:setting XML files
gptkbp:suitable_for large files (poorly suited to many small files)
gptkbp:supports multiple clients
data replication
data integrity checks
write-once, read-many access model
gptkbp:uses block storage
NameNode and DataNode architecture
gptkbp:written_in gptkb:Java
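
Example (illustrative sketch, not part of the GPTKB statements)

The statements above say that HDFS uses block storage, supports data replication, and is optimized for large files. The following Python sketch works through the arithmetic behind those three statements. It assumes the common defaults of a 128 MiB block size and a replication factor of 3, both of which are configurable per cluster. The function names are hypothetical helpers written for this illustration, not part of any Hadoop API.

```python
# Assumed defaults: 128 MiB blocks and 3 replicas (both configurable).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def block_count(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return how many blocks a file of `file_size` bytes is split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer ceiling division: a partially filled last block still counts.
    return (file_size + block_size - 1) // block_size


def raw_storage_bytes(file_size, replication=DEFAULT_REPLICATION):
    """Return the bytes stored across the cluster once every block is replicated.

    The last block only occupies as many bytes as it holds, so the total is
    simply the file size multiplied by the replication factor.
    """
    if file_size < 0 or replication < 1:
        raise ValueError("file_size must be >= 0 and replication must be >= 1")
    return file_size * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    one_mib = 1024 ** 2

    # One large file: few blocks for the NameNode to track.
    print(block_count(one_gib))
    print(raw_storage_bytes(one_gib))

    # The same volume of data as 1024 small files: one block per file.
    print(sum(block_count(one_mib) for _ in range(1024)))
```

Under these assumptions a single 1 GiB file occupies 8 blocks, while the same data split into 1024 files of 1 MiB each occupies 1024 blocks. Each block is a separate metadata entry for the NameNode to track, which is one reason the file system favors large files.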