Python (via Hadoop Streaming)
GPTKB entity
Statements (32)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:software |
| gptkbp:advantage | integration with Hadoop ecosystem, easy prototyping, flexibility in language choice |
| gptkbp:canBe | gptkb:transformation, batch processing, log analysis, big data processing, ETL tasks |
| gptkbp:compatibleWith | Hadoop Streaming API |
| gptkbp:documentation | https://hadoop.apache.org/docs/stable/hadoop-streaming/HadoopStreaming.html |
| gptkbp:enables | writing MapReduce jobs in Python |
| gptkbp:example | data aggregation, word count, log parsing |
| gptkbp:format | standard input, standard output |
| gptkbp:limitation | less efficient for very large jobs, performance overhead compared to Java MapReduce, serialization cost |
| gptkbp:programmingLanguage | gptkb:Python |
| gptkbp:requires | gptkb:interpreter, Hadoop installation |
| gptkbp:runsOn | Hadoop cluster |
| gptkbp:supports | custom mapper scripts, custom reducer scripts |
| gptkbp:usedBy | data scientists, data engineers |
| gptkbp:usedFor | processing data with Hadoop |
| gptkbp:bfsParent | gptkb:Hadoop_MapReduce |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Python (via Hadoop Streaming) |
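
The statements above list word count as an example, custom mapper and reducer scripts as supported, and standard input and standard output as the exchange format. A minimal sketch of how these could fit together is shown below. It is not taken from the Hadoop documentation: the function names `map_lines` and `reduce_lines`, the `map`/`reduce` command-line argument, and the choice to keep both roles in one file are all assumptions made for illustration. It assumes tab-separated key/value records and reducer input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts for each word.

    Assumes the input records are already sorted by key, so that all
    records for a given word arrive consecutively.
    """
    records = (
        line.rstrip("\n").split("\t", 1) for line in lines if line.strip()
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: the same file acts as mapper or reducer,
# selected by an explicit "map" or "reduce" argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    steps = {"map": map_lines, "reduce": reduce_lines}
    if sys.argv[1] in steps:
        for record in steps[sys.argv[1]](sys.stdin):
            print(record)
```

Because the script only reads standard input and writes standard output, it can be tried locally without a cluster, for example with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `wordcount.py` and `input.txt` are placeholder names and `sort` stands in for the shuffle phase.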