Hadoop Distributed File System

GPTKB entity

Statements (64)
Predicate Object
gptkbp:instance_of gptkb:distributed_file_system
gptkbp:can_be_configured_via XML files
gptkbp:deployment commodity hardware
gptkbp:designed_for storing large data sets
gptkbp:developed_by gptkb:Apache_Software_Foundation
gptkbp:enables distributed computing
gptkbp:features fault tolerance
https://www.w3.org/2000/01/rdf-schema#label Hadoop Distributed File System
gptkbp:introduced_in gptkb:2006
gptkbp:is_accessible_by gptkb:Web_HDFS
command line interface
HDFS API
gptkbp:is_compatible_with gptkb:Apache_Spark
gptkb:Map_Reduce
NoSQL databases
Hadoop Streaming
gptkbp:is_designed_for high availability
high throughput
gptkbp:is_designed_to scale horizontally
handle hardware failures
gptkbp:is_integrated_with gptkb:Apache_Pig
gptkb:Apache_Hive
gptkb:Apache_Flume
gptkb:Apache_Sqoop
gptkbp:is_managed_by NameNode and DataNode daemons
gptkbp:is_monitored_by Hadoop Metrics
gptkbp:is_open_source gptkb:true
gptkbp:is_optimized_for large files
large streaming reads
gptkbp:is_part_of gptkb:Hadoop_ecosystem
data engineering workflows
data lakes
cloud storage solutions
big data solutions
gptkbp:is_scalable thousands of nodes
petabytes of data
gptkbp:is_supported_by gptkb:Hadoop_Common
Hadoop ecosystem tools
gptkbp:is_used_by big data applications
gptkbp:is_used_for gptkb:cloud_storage
backup and recovery
data analytics
log processing
gptkbp:is_used_in gptkb:machine_learning
data processing
real-time data processing
data warehousing
gptkbp:is_utilized_by data scientists
gptkbp:is_utilized_for ETL processes
gptkbp:provides data locality
high throughput access to application data
gptkbp:stores data in a distributed manner
gptkbp:security gptkb:Kerberos
gptkbp:not_suitable_for large numbers of small files
gptkbp:supports multiple clients
data replication
data integrity checks
write-once, read-many access model
gptkbp:uses block storage
NameNode and DataNode architecture
gptkbp:written_in gptkb:Java
gptkbp:bfsParent gptkb:The_Google_File_System
gptkb:Hadoop_Common
gptkbp:bfsLayer 5
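
Code examples

The statements above say that HDFS is accessible through the HDFS API, follows a write-once, read-many access model, and supports data replication. The sketch below is a minimal, non-authoritative illustration of those points using the Java FileSystem client (HDFS itself is written in Java); the NameNode URI hdfs://namenode:9000, the file path, and the replication factor of 3 are assumptions for the example, not values taken from this entry.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.nio.charset.StandardCharsets;

    public class HdfsApiExample {
        public static void main(String[] args) throws Exception {
            // Assumed NameNode address and replication factor for this sketch.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000");
            conf.set("dfs.replication", "3"); // each block is replicated across DataNodes

            try (FileSystem fs = FileSystem.get(conf)) {
                Path path = new Path("/user/example/events.log");

                // Write once: the file is created, written sequentially, then closed.
                try (FSDataOutputStream out = fs.create(path, /* overwrite */ true)) {
                    out.write("first record\n".getBytes(StandardCharsets.UTF_8));
                }

                // Read many: readers stream the now-immutable contents.
                try (FSDataInputStream in = fs.open(path)) {
                    byte[] buf = new byte[4096];
                    int n = in.read(buf);
                    System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
                }
            }
        }
    }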
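
The entry also lists block storage, a NameNode and DataNode architecture, and data locality. A file is split into blocks, each block is replicated on several DataNodes, and the NameNode keeps the metadata that records where every replica lives; frameworks such as MapReduce use that placement information to schedule work close to the data. The sketch below (same assumed NameNode address, hypothetical file path) asks the NameNode for a file's block locations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed address

            try (FileSystem fs = FileSystem.get(conf)) {
                FileStatus status = fs.getFileStatus(new Path("/user/example/events.log"));

                // The NameNode answers with the blocks that make up the file
                // and the DataNode hosts that hold each replica.
                BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
                for (BlockLocation block : blocks) {
                    System.out.printf("offset=%d length=%d hosts=%s%n",
                            block.getOffset(), block.getLength(),
                            String.join(",", block.getHosts()));
                }
            }
        }
    }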
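
WebHDFS, listed under is_accessible_by, exposes the same namespace over plain HTTP REST calls, so clients that cannot link the Java libraries can still reach HDFS. A minimal sketch, assuming WebHDFS is enabled and the NameNode HTTP port is 9870 (the Hadoop 3 default; Hadoop 2 used 50070):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class WebHdfsExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical host and path; LISTSTATUS returns a JSON directory listing.
            URL url = new URL("http://namenode:9870/webhdfs/v1/user/example?op=LISTSTATUS");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                conn.disconnect();
            }
        }
    }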
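
Finally, the security statement points to Kerberos. On a secured cluster a client normally authenticates before making any of the calls above, typically by logging in from a keytab; the principal and keytab path here are placeholders, not values from this entry:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginExample {
        public static void main(String[] args) throws Exception {
            // Placeholder principal and keytab path for this sketch.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");

            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab(
                    "hdfs-user@EXAMPLE.COM", "/etc/security/keytabs/hdfs-user.keytab");

            System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
        }
    }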