mC4

GPTKB entity

Statements (20)
Predicate Object
gptkbp:instanceOf gptkb:dataset
gptkbp:availableOn https://huggingface.co/datasets/mc4
https://www.tensorflow.org/datasets/community_catalog/huggingface/mc4
gptkbp:basedOn gptkb:Common_Crawl
gptkbp:contains web text
gptkbp:createdBy gptkb:Google_Research
gptkbp:fullName gptkb:Multilingual_C4
https://www.w3.org/2000/01/rdf-schema#label mC4
gptkbp:license ODC-BY
gptkbp:notableFor large-scale multilingual data
gptkbp:releaseYear 2020
gptkbp:size about 9.7TB (Hugging Face release)
gptkbp:supportsLanguage 101 languages
gptkbp:usedFor natural language processing
pretraining large language models
gptkbp:usedIn gptkb:mT5
gptkb:ByT5
gptkb:umT5
gptkbp:bfsParent gptkb:OSCAR
gptkbp:bfsLayer 7