CC-News

GPTKB entity

Statements (19)
Predicate Object
gptkbp:instanceOf gptkb:dataset
gptkbp:availableOn Amazon S3 (public dataset)
gptkbp:contains news articles
gptkbp:createdBy gptkb:Common_Crawl
gptkbp:firstReleased 2016
gptkbp:format gptkb:WARC
gptkbp:frequency daily
https://www.w3.org/2000/01/rdf-schema#label CC-News
gptkbp:language multilingual
gptkbp:license Common Crawl Terms of Use
gptkbp:size billions of words
gptkbp:source news websites
web crawls
gptkbp:url https://commoncrawl.org/2016/10/news-dataset-available/
gptkbp:usedFor gptkb:machine_learning
natural language processing
language model training
gptkbp:bfsParent gptkb:RoBERTa
gptkbp:bfsLayer 6