Statements (33)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:archives |
| gptkbp:access | open data |
| gptkbp:category | gptkb:dataset, web data, internet archive |
| gptkbp:format | gptkb:WAT, gptkb:WARC, WET |
| gptkbp:foundedBy | gptkb:Gil_Elbaz |
| gptkbp:foundedYear | 2007 |
| gptkbp:frequency | monthly |
| gptkbp:headquartersLocation | gptkb:United_States |
| gptkbp:language | multilingual |
| gptkbp:license | gptkb:Creative_Commons_Attribution-ShareAlike |
| gptkbp:nonProfitStatus | true |
| gptkbp:organization | gptkb:nonprofit_organization |
| gptkbp:size | petabytes |
| gptkbp:twitter | @CommonCrawl |
| gptkbp:type | metadata, text data, web crawl data, WARC files |
| gptkbp:usedBy | gptkb:researchers, developers, companies |
| gptkbp:usedFor | gptkb:machine_learning, natural language processing, search engine development, web analysis |
| gptkbp:website | https://commoncrawl.org/ |
| gptkbp:bfsParent | gptkb:XLM-R |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | CommonCrawl |