Python (via Hadoop Streaming)
GPTKB entity
Statements (32)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:software |
gptkbp:advantage | integration with Hadoop ecosystem; easy prototyping; flexibility in language choice |
gptkbp:canBe | gptkb:transformation; batch processing; log analysis; big data processing; ETL tasks |
gptkbp:compatibleWith | Hadoop Streaming API |
gptkbp:documentation | https://hadoop.apache.org/docs/stable/hadoop-streaming/HadoopStreaming.html |
gptkbp:enables | writing MapReduce jobs in Python |
gptkbp:example | data aggregation; word count; log parsing |
gptkbp:format | standard input; standard output |
https://www.w3.org/2000/01/rdf-schema#label | Python (via Hadoop Streaming) |
gptkbp:limitation | less efficient for very large jobs; performance overhead compared to Java MapReduce; serialization cost |
gptkbp:programmingLanguage | gptkb:Python |
gptkbp:requires | interpreter; Hadoop installation |
gptkbp:runsOn | Hadoop cluster |
gptkbp:supports | custom mapper scripts; custom reducer scripts |
gptkbp:usedBy | data scientists; data engineers |
gptkbp:usedFor | processing data with Hadoop |
gptkbp:bfsParent | gptkb:Hadoop_MapReduce |
gptkbp:bfsLayer | 7 |
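
The statements above list word count as an example and standard input/output as the data format. A minimal sketch of what such mapper and reducer logic might look like follows. It assumes the usual Hadoop Streaming convention of tab-separated `key<TAB>value` lines, with reducer input already sorted by key. The function names (`map_words`, `reduce_counts`, `run_locally`) and the `map`/`reduce` command-line argument are hypothetical, chosen only for illustration.

```python
import sys
from itertools import groupby


def map_words(lines):
    """Mapper: emit one 'word<TAB>1' line per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(lines):
    """Reducer: sum the counts for each key.

    Assumes the input lines arrive sorted by key, as the framework's
    shuffle/sort phase is expected to provide.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, for quick testing."""
    return list(reduce_counts(sorted(map_words(lines))))


# Hypothetical command-line use: `python wordcount.py map` or
# `python wordcount.py reduce`, reading stdin and writing stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_words if sys.argv[1] == "map" else reduce_counts
    for out_line in step(sys.stdin):
        print(out_line)
```

Calling `run_locally(["the cat the dog", "The end"])` exercises both stages without a cluster. On a real cluster the same script would be passed to the streaming jar through its `-mapper` and `-reducer` options, along with `-input` and `-output`; the jar location and the way the script is shipped to the nodes depend on the installation, so consult the documentation linked in the table.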