Apache Nutch

GPTKB entity

Statements (59)
Predicate Object
gptkbp:instance_of Web crawler
gptkbp:can_be_configured_for Web UI
XML configuration files
Command line options
gptkbp:can_be_extended_by Plugins
gptkbp:dependency gptkb:Maven
gptkb:Hadoop_ecosystem
gptkb:Java_Runtime_Environment
gptkbp:developed_by gptkb:Apache_Software_Foundation
gptkbp:has_community gptkb:Author
Forums
Active user community
Mailing lists
gptkbp:has_documentation API reference
User guide
Developer guide
gptkbp:has_feature Scalability
Robustness
Extensible architecture
Data storage options
Support for multiple protocols
Customizable parsing
Distributed crawling
https://www.w3.org/2000/01/rdf-schema#label Apache Nutch
gptkbp:integrates_with gptkb:Apache_Solr
gptkbp:is_available_on gptkb:Git_Hub
Apache website
gptkbp:is_compatible_with gptkb:Apache_Tika
gptkb:Apache_HBase
gptkb:Apache_Mahout
gptkb:Apache_Jena
gptkbp:is_optimized_for gptkb:performance
Resource efficiency
Data throughput
gptkbp:is_part_of Apache Software Foundation projects
gptkbp:is_scalable Large datasets
Cloud environments
Multiple nodes
gptkbp:is_used_by Search engines
Research projects
SEO tools
Content aggregators
Data mining applications
gptkbp:is_used_for Indexing web content
gptkbp:is_used_in Business intelligence
Academic research
Market analysis
Competitive analysis
Content discovery
gptkbp:latest_version 1.19
gptkbp:license Apache License 2.0
gptkbp:provides Crawling capabilities
gptkbp:release_date gptkb:2003
gptkbp:released Regular updates
gptkbp:supports Web scraping
gptkbp:uses gptkb:Hadoop
gptkbp:written_in gptkb:Java
gptkbp:bfsParent gptkb:Apache_Software_Foundation
gptkbp:bfsLayer 4