Statements (20)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:dataset |
| gptkbp:availableOn | https://huggingface.co/datasets/mc4, https://www.tensorflow.org/datasets/community_catalog/huggingface/mc4 |
| gptkbp:basedOn | gptkb:Common_Crawl |
| gptkbp:contains | web text |
| gptkbp:createdBy | gptkb:Google_Research |
| gptkbp:fullName | gptkb:Multilingual_C4 |
| gptkbp:license | CC BY 4.0 |
| gptkbp:notableFor | large-scale multilingual data |
| gptkbp:releaseYear | 2020 |
| gptkbp:size | over 750GB |
| gptkbp:supportsLanguage | 101 languages |
| gptkbp:usedFor | natural language processing, pretraining large language models |
| gptkbp:usedIn | gptkb:T5, gptkb:UL2, gptkb:mT5 |
| gptkbp:bfsParent | gptkb:OSCAR |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | mC4 |