Apache Nutch

GPTKB entity

Properties (50)
Predicate Object
gptkbp:instanceOf Web crawler
gptkbp:associatedWith Regular updates
gptkbp:developedBy gptkb:Apache_Software_Foundation
gptkbp:hasAmenities API reference
Installation instructions
User guide
Configuration guide
Developer guide
gptkbp:hasFeature Scalability
Robustness
Support for various data formats
Data storage options
Support for multiple protocols
Customizable parsing
Distributed crawling
gptkbp:hasOccupation Wiki
Contributors
Developer mailing list
Issue tracker
User mailing list
gptkbp:hasPersonnel Apache License 2.0
gptkbp:hasVersion 1.19
https://www.w3.org/2000/01/rdf-schema#label Apache Nutch
gptkbp:integratesWith gptkb:Apache_Solr
gptkbp:isAvailableIn GitHub
Apache website
gptkbp:isCompatibleWith gptkb:Apache_Tika
gptkb:Apache_HBase
gptkb:Apache_Cassandra
gptkb:PostgreSQL
gptkb:Apache_Mahout
MySQL
gptkbp:isPartOf Apache Software Foundation projects
gptkbp:isSupportedBy Forums
Tutorials
Documentation
Community contributions
Online resources
gptkbp:isUsedBy Search engines
Research projects
Content aggregators
Data mining applications
Web archiving projects
gptkbp:isUsedFor Plugins
Indexing web content
gptkbp:provides Crawling capabilities
gptkbp:publishedIn gptkb:Java
gptkbp:releaseDate 2003
gptkbp:supports Web scraping
gptkbp:uses gptkb:Apache_Hadoop