Statements (33)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:archives |
gptkbp:access | open data |
gptkbp:category | gptkb:dataset, web data, internet archive |
gptkbp:format | gptkb:WAT, gptkb:WARC, WET |
gptkbp:foundedBy | gptkb:Gil_Elbaz |
gptkbp:foundedYear | 2007 |
gptkbp:frequency | monthly |
gptkbp:headquartersLocation | gptkb:United_States |
http://www.w3.org/2000/01/rdf-schema#label | CommonCrawl |
gptkbp:language | multilingual |
gptkbp:license | gptkb:Creative_Commons_Attribution-ShareAlike |
gptkbp:nonProfitStatus | true |
gptkbp:organization | gptkb:nonprofit_organization |
gptkbp:size | petabytes |
gptkbp:twitter | @CommonCrawl |
gptkbp:type | metadata, text data, web crawl data, WARC files |
gptkbp:usedBy | gptkb:researchers, developers, companies |
gptkbp:usedFor | gptkb:machine_learning, natural language processing, search engine development, web analysis |
gptkbp:website | https://commoncrawl.org/ |
gptkbp:bfsParent | gptkb:XLM-R |
gptkbp:bfsLayer | 6 |
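The gptkbp:format row lists WARC, WAT, and WET. All three use the WARC container format: WARC files hold raw crawl captures, WAT files hold derived metadata, and WET files hold extracted plain text. The following is a minimal, standard-library-only sketch of reading such records. It assumes the WARC record framing (a version line, `Name: value` header lines, a blank line, exactly `Content-Length` bytes of block, then CRLF CRLF) and gzip compression applied per record. The helper names `iter_warc_records` and `make_record`, and the sample data, are hypothetical and made up for illustration. This is not Common Crawl's own tooling.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple

# Hypothetical helper names; a minimal sketch, not Common Crawl's own tooling.

def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[str, Dict[str, str], bytes]]:
    """Yield (version, headers, block) for each WARC record in a binary stream.

    Assumes the WARC framing: a version line, "Name: value" header lines,
    a blank line, exactly Content-Length bytes of block, then CRLF CRLF.
    Header line folding is not handled.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the blank lines that close the previous record
        version = line.decode("utf-8").strip()
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        yield version, headers, block


def make_record(headers: Dict[str, str], block: bytes) -> bytes:
    """Serialize one WARC/1.0 record (test fixture only)."""
    lines = ["WARC/1.0"] + [f"{k}: {v}" for k, v in headers.items()]
    lines.append(f"Content-Length: {len(block)}")
    return "\r\n".join(lines).encode("utf-8") + b"\r\n\r\n" + block + b"\r\n\r\n"


# Made-up sample data: a warcinfo record followed by one WET-style text record.
info = make_record({"WARC-Type": "warcinfo"}, b"software: demo\r\n")
page = make_record(
    {"WARC-Type": "conversion", "WARC-Target-URI": "https://example.com/",
     "Content-Type": "text/plain"},
    b"Hello, web.\n",
)

# Compress each record as its own gzip member; gzip.GzipFile reads
# concatenated members back as one continuous stream.
compressed = gzip.compress(info) + gzip.compress(page)
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as f:
    texts = {
        h.get("WARC-Target-URI"): body.decode("utf-8")
        for _, h, body in iter_warc_records(f)
        if h.get("WARC-Type") == "conversion"  # record type used for WET text
    }

print(texts)  # {'https://example.com/': 'Hello, web.\n'}
```

Per-record gzip members let a reader seek to one record's byte offset and decompress only that record. For real crawl files, a maintained WARC library would typically cover edge cases this sketch skips, such as header folding and malformed records.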