Statements (19)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:dataset |
| gptkbp:availableOn | https://huggingface.co/datasets/Skylion007/openwebtext2 |
| gptkbp:canBeFilteredBy | quality and content guidelines |
| gptkbp:contains | text documents |
| gptkbp:createdBy | gptkb:EleutherAI |
| gptkbp:format | gptkb:JSONL |
| gptkbp:inspiredBy | gptkb:OpenAI_WebText |
| gptkbp:language | English |
| gptkbp:license | gptkb:MIT_License |
| gptkbp:releaseYear | 2020 |
| gptkbp:size | over 10 billion tokens |
| gptkbp:source | web pages |
| gptkbp:usedFor | language model training |
| gptkbp:usedIn | gptkb:Pythia, gptkb:GPT-NeoX |
| gptkbp:bfsParent | gptkb:The_Pile, gptkb:The_Pile:_An_800GB_Dataset_of_Diverse_Text_for_Language_Modeling |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | OpenWebText2 |