Python (via Hadoop Streaming)

GPTKB entity

Predicate	Object
gptkbp:instanceOf	gptkb:software
gptkbp:advantage	integration with Hadoop ecosystem, easy prototyping, flexibility in language choice
gptkbp:canBe	gptkb:transformation, batch processing, log analysis, big data processing, ETL tasks
gptkbp:compatibleWith	Hadoop Streaming API
gptkbp:documentation	https://hadoop.apache.org/docs/stable/hadoop-streaming/HadoopStreaming.html
gptkbp:enables	writing MapReduce jobs in Python
gptkbp:example	data aggregation, word count, log parsing
gptkbp:format	standard input, standard output
gptkbp:limitation	less efficient for very large jobs, performance overhead compared to Java MapReduce, serialization cost
gptkbp:programmingLanguage	gptkb:Python
gptkbp:requires	gptkb:interpreter, Hadoop installation
gptkbp:runsOn	Hadoop cluster
gptkbp:supports	custom mapper scripts, custom reducer scripts
gptkbp:usedBy	data scientists, data engineers
gptkbp:usedFor	processing data with Hadoop
gptkbp:bfsParent	gptkb:Hadoop_MapReduce
gptkbp:bfsLayer	7
http://www.w3.org/2000/01/rdf-schema#label	Python (via Hadoop Streaming)
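
The word count example and the standard input / standard output format listed above can be illustrated with a minimal sketch. It assumes the usual Hadoop Streaming convention of one tab-separated key/value record per line, and that the reducer receives its input already sorted by key. The function names, the `run_locally` helper, and the "map"/"reduce" command-line argument are hypothetical choices made for this illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of each run of identical keys.

    Assumes the records arrive sorted by key, which is how the
    shuffle/sort phase hands them to a streaming reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (for local testing only)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: the first argument selects the role.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

If this were saved as a script (say, a hypothetical `wordcount.py`), it could be given to Hadoop Streaming through the `-mapper` and `-reducer` options with the matching role argument. Calling `run_locally` on a few sample lines checks the logic without a cluster.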