Statements (23)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:dataset |
gptkbp:createdBy | Aaron Gokaslan and Vanya Cohen |
gptkbp:excludes | gptkb:Wikipedia, Reddit comments |
gptkbp:format | plain text |
https://www.w3.org/2000/01/rdf-schema#label | OpenWebText |
gptkbp:inspiredBy | gptkb:OpenAI_WebText |
gptkbp:language | English |
gptkbp:license | gptkb:MIT_License |
gptkbp:notableCollection | crawling URLs from Reddit submissions with high karma |
gptkbp:notableFor | gptkb:GPT-J, gptkb:GPT-Neo, gptkb:GPT-NeoX |
gptkbp:numberOfArticles | 8 million |
gptkbp:relatedTo | gptkb:The_Pile |
gptkbp:releaseYear | 2019 |
gptkbp:size | 40 GB |
gptkbp:source | web pages |
gptkbp:url | https://skylion007.github.io/OpenWebTextCorpus/ |
gptkbp:usedFor | language model pretraining |
gptkbp:bfsParent | gptkb:DistilGPT2, gptkb:RoBERTa |
gptkbp:bfsLayer | 6 |