Statements (19)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:dataset |
| gptkbp:access | gptkb:Hugging_Face_Datasets |
| gptkbp:basedOn | gptkb:Common_Crawl |
| gptkbp:contains | web-crawled text, cleaned text, deduplicated text |
| gptkbp:creator | gptkb:Allen_Institute_for_AI |
| gptkbp:language | multiple languages |
| gptkbp:license | varies (depends on Common Crawl) |
| gptkbp:relatedTo | gptkb:C4_dataset |
| gptkbp:releaseYear | 2020s |
| gptkbp:size | hundreds of gigabytes |
| gptkbp:usedFor | natural language processing, language model training |
| gptkbp:usedIn | T5 model, mT5 model |
| gptkbp:bfsParent | gptkb:mC4 |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | Multilingual C4 |