Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:dataset |
| gptkbp:createdBy | gptkb:EleutherAI |
| gptkbp:excludes | gptkb:Wikipedia, Reddit comments |
| gptkbp:format | plain text |
| gptkbp:inspiredBy | gptkb:OpenAI_WebText |
| gptkbp:language | English |
| gptkbp:license | gptkb:MIT_License |
| gptkbp:notableCollection | crawling URLs from Reddit submissions with high karma |
| gptkbp:notableFor | gptkb:GPT-J, gptkb:GPT-Neo, gptkb:GPT-NeoX |
| gptkbp:numberOfArticles | 8 million |
| gptkbp:relatedTo | gptkb:The_Pile |
| gptkbp:releaseYear | 2019 |
| gptkbp:size | 40GB |
| gptkbp:source | web pages |
| gptkbp:url | https://skylion007.github.io/OpenWebTextCorpus/ |
| gptkbp:usedFor | language model pretraining |
| gptkbp:bfsParent | gptkb:CTRL, gptkb:Text-To-Text_Transfer_Transformer, gptkb:RoBERTa |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | OpenWebText |