Statements (26)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:nonprofit_organization |
| gptkbp:dataSource | gptkb:Amazon_S3, public datasets |
| gptkbp:focus | open data, web crawling |
| gptkbp:format | gptkb:WAT, gptkb:WARC, WET |
| gptkbp:foundedIn | 2007 |
| gptkbp:founder | gptkb:Gil_Elbaz |
| gptkbp:frequency | monthly |
| gptkbp:license | gptkb:Creative_Commons_Attribution-ShareAlike_License |
| gptkbp:location | gptkb:United_States |
| gptkbp:mission | to democratize access to web information |
| gptkbp:notableUser | gptkb:researchers, data scientists, machine learning engineers, search engine developers |
| gptkbp:product | Common Crawl web archive |
| gptkbp:size | petabytes |
| gptkbp:taxStatus | 501(c)(3) |
| gptkbp:type | gptkb:web_corpus |
| gptkbp:website | https://commoncrawl.org/ |
| gptkbp:bfsParent | gptkb:Common_Crawl_Corpus |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Common Crawl Foundation |
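The gptkbp:format row lists WARC, WAT and WET. As a minimal sketch of how a WARC record is laid out, the hypothetical `iter_warc_records` function below reads an uncompressed WARC/1.0 byte stream: a version line, header lines up to a blank line, then exactly `Content-Length` bytes of content. It assumes well-formed input and treats header names case-sensitively for brevity. Common Crawl's `.warc.gz` files are gzip-compressed record by record, and Python's `gzip.open(path, "rb")` reads such multi-member files as one stream. Production code would more likely use a dedicated library such as warcio.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content_block) pairs from an uncompressed WARC stream.

    Hypothetical helper for illustration; assumes well-formed records.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # skip the CRLF pair that separates records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            header_line = stream.readline()
            if header_line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = header_line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The content block is exactly Content-Length bytes long.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


# Usage on a tiny hand-built record (illustrative data, not real crawl output).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://commoncrawl.org/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
records = list(iter_warc_records(io.BytesIO(sample)))
for headers, block in records:
    print(headers["WARC-Type"], headers["WARC-Target-URI"], block)
```

Reading the block by `Content-Length` rather than scanning for a delimiter matters because the content itself (for example an HTTP response body) may contain blank lines.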