Wikipedia corpus

GPTKB entity

Statements (91)
Predicate Object
gptkbp:instanceOf gptkb:text
gptkbp:availableOn gptkb:Wikimedia_Foundation
gptkbp:basedOn gptkb:Wikipedia
gptkbp:contains Wikipedia articles
gptkbp:excludes files
modules
templates
mediawiki namespaces
non-free content
talk pages
user pages
gptkbp:format gptkb:text
XML
https://www.w3.org/2000/01/rdf-schema#label Wikipedia corpus
gptkbp:includes categories
links
redirects
references
infoboxes
gptkbp:language multiple languages
gptkbp:license gptkb:Creative_Commons_Attribution-ShareAlike_License
gptkbp:size hundreds of gigabytes
gptkbp:updated regularly
gptkbp:usedBy gptkb:Google
gptkb:OpenAI
gptkb:industry
gptkb:Meta
academic researchers
gptkbp:usedFor gptkb:machine_learning
gptkb:dialogue_systems
information retrieval
natural language processing
translation
data augmentation
transfer learning
question answering
summarization
text generation
topic modeling
sentiment analysis
semantic search
sentence similarity
text mining
text classification
language modeling
coreference resolution
few-shot learning
knowledge base construction
text summarization
zero-shot learning
named entity recognition
information extraction
document classification
entity linking
paraphrase detection
text normalization
text alignment
text annotation
text clustering
text ranking
text segmentation
text simplification
word sense disambiguation
word embedding
entity disambiguation
knowledge graph construction
word similarity
relation extraction
semantic parsing
pretraining language models
text tokenization
multilingual NLP
zero-shot classification
cross-lingual tasks
fact verification
text entailment
text tagging
text vectorization
word segmentation
word translation
zero-shot entity linking
zero-shot question answering
zero-shot relation extraction
zero-shot summarization
zero-shot text classification
zero-shot transfer
zero-shot translation
gptkbp:usedIn research
benchmark datasets
gptkbp:bfsParent gptkb:Word2Vec
gptkbp:bfsLayer 7
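
The contains, excludes, and format statements above can be illustrated with a minimal sketch that streams a MediaWiki-style XML export and keeps only article pages. This is an assumption-laden illustration, not how any particular corpus release is built: the SAMPLE document, the helper names local and iter_articles, and the choice to keep only namespace 0 (main articles) are all hypothetical. Real dumps declare an export schema version in the XML namespace that varies between releases, so the sketch strips the namespace from tag names rather than hard-coding it.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML export (assumption: real dumps are far
# larger and their schema version in the xmlns attribute may differ).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Word2vec</title><ns>0</ns>
    <revision><text>Word2vec is a technique for word embeddings.</text></revision></page>
  <page><title>Talk:Word2vec</title><ns>1</ns>
    <revision><text>Discussion of the article.</text></revision></page>
  <page><title>Template:Infobox</title><ns>10</ns>
    <revision><text>Template markup.</text></revision></page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(stream, keep_ns=frozenset({"0"})):
    """Yield title/text dicts for pages whose <ns> value is in keep_ns.

    Namespace 0 holds main articles; talk (1), user (2), file (6),
    MediaWiki (8), template (10), and module (828) pages are skipped
    by the default setting.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        fields = {local(child.tag): child for child in elem}
        if fields["ns"].text in keep_ns:
            text_elem = next(
                (e for e in fields["revision"] if local(e.tag) == "text"), None
            )
            yield {
                "title": fields["title"].text,
                "text": (text_elem.text or "") if text_elem is not None else "",
                # Redirect pages carry a <redirect/> child element.
                "redirect": "redirect" in fields,
            }
        elem.clear()  # release the finished page to keep memory use flat


articles = list(iter_articles(io.BytesIO(SAMPLE.encode("utf-8"))))
for article in articles:
    print(article["title"], len(article["text"]))
```

Streaming with iterparse and clearing each finished page keeps memory use roughly constant, which matters for a corpus described above as hundreds of gigabytes. Passing a different keep_ns set, for example one that adds "14" for category pages, would change which page types are kept.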