Apache Tika

GPTKB entity

Statements (131)
Predicate Object
gptkbp:instance_of gptkb:software
gptkb:Library
gptkb:project
gptkbp:available_at Apache License 2.0
gptkbp:can audio files
XML files
video files
PDF files
image files
HTML files
Microsoft Office files
gptkbp:can_be_extended_by yes
gptkbp:community open source community
gptkbp:community_support open source community
gptkbp:contribution volunteer contributions
gptkbp:dependency gptkb:Tika_Core
gptkb:Apache_Lucene
gptkb:Apache_POI
gptkb:Apache_PDFBox
gptkb:Tika_Server
gptkb:Apache_Commons_IO
Tika Parser
gptkbp:developed_by gptkb:metadata
gptkb:Apache_Software_Foundation
audio files
XML files
ZIP files
video files
PDF files
image files
text content
HTML files
Microsoft Office files
gptkbp:features language detection
metadata extraction
text extraction
document type detection
gptkbp:first_released gptkb:2009
gptkbp:has_community gptkb:Performance_Monitoring
active user community
mailing lists
contribution guidelines
gptkbp:has_documentation API documentation
official website
tutorials
API reference
release notes
user guide
developer guide
gptkbp:has_feature yes
language detection
content type detection
metadata extraction
text extraction
metadata extraction from documents
text extraction from images
gptkbp:has_integration_with gptkb:Apache_Airflow
gptkb:Apache_Camel
gptkb:Spring_Framework
gptkbp:has_restapi gptkb:Tika_Server
https://www.w3.org/2000/01/rdf-schema#label Apache Tika
gptkbp:integration gptkb:Apache_Nutch
gptkb:Google
gptkb:Apache_Solr
gptkb:Hadoop
Content Management Systems
gptkbp:interface Tika CLI
gptkbp:is_available_on gptkb:Maven_Central
gptkb:Git_Hub
gptkbp:is_compatible_with gptkb:Java_SE
gptkb:Apache_Nutch
gptkb:Java_EE
gptkb:Apache_Solr
gptkb:Hadoop
gptkbp:is_part_of gptkb:organ
Apache Software Foundation projects
gptkbp:is_scalable yes
gptkbp:is_used_by gptkb:developers
gptkb:researchers
data analysts
data scientists
gptkbp:is_used_for data mining
content management
digital forensics
document indexing
gptkbp:is_used_in gptkb:cloud_services
enterprise applications
content management systems
data processing
search engines
web applications
data mining
big data applications
digital forensics
gptkbp:latest_version 2.7.0
gptkbp:license Apache License 2.0
gptkbp:platform yes
gptkbp:production_status active
gptkbp:programming_language gptkb:Java
gptkbp:project gptkb:open-source_software
text analysis
metadata extraction
file format support
content extraction
content detection
gptkbp:provides REST API
content analysis
command line interface
gptkbp:release_date gptkb:2007
2009-03-19
gptkbp:supports multiple file formats
gptkbp:use_case big data processing
data integration
metadata management
search engines
data mining
content management
information retrieval
document management
digital forensics
text analytics
automated content classification
gptkbp:used_for content analysis
metadata extraction
text extraction
gptkbp:uses gptkb:Apache_Lucene
gptkbp:website https://tika.apache.org
gptkbp:written_in gptkb:Java