Statements (21)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:large_language_model |
| gptkbp:canBeFilteredBy | toxicity, duplicates, non-linguistic content |
| gptkbp:contains | web text, Common Crawl data |
| gptkbp:developedBy | gptkb:Google_Research |
| gptkbp:fullName | gptkb:Massive_C4 |
| gptkbp:language | 101, multilingual |
| gptkbp:license | gptkb:CC-BY_4.0 |
| gptkbp:releaseYear | 2020 |
| gptkbp:size | 750GB |
| gptkbp:usedFor | multilingual NLP research, pretraining large language models |
| gptkbp:usedIn | gptkb:T5, gptkb:UL2, gptkb:mT5 |
| gptkbp:bfsParent | gptkb:Jungle_Rules |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | MC4 |