Statements (26)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:nonprofit_organization |
gptkbp:dataSource | gptkb:Amazon_S3, public datasets |
gptkbp:focus | open data, web crawling |
gptkbp:format | gptkb:WAT, gptkb:WARC, WET |
gptkbp:foundedIn | 2007 |
gptkbp:founder | gptkb:Gil_Elbaz |
gptkbp:frequency | monthly |
https://www.w3.org/2000/01/rdf-schema#label | Common Crawl Foundation |
gptkbp:license | gptkb:Creative_Commons_Attribution-ShareAlike_License |
gptkbp:location | gptkb:United_States |
gptkbp:mission | to democratize access to web information |
gptkbp:notableUser | gptkb:researchers, data scientists, machine learning engineers, search engine developers |
gptkbp:product | Common Crawl web archive |
gptkbp:size | petabytes |
gptkbp:taxStatus | 501(c)(3) |
gptkbp:type | web corpus |
gptkbp:website | https://commoncrawl.org/ |
gptkbp:bfsParent | gptkb:Common_Crawl_Corpus |
gptkbp:bfsLayer | 7 |
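
Several predicates above carry more than one object (for example, `gptkbp:format` has three). As a minimal sketch, assuming the statements are held as plain (predicate, object) pairs, they could be grouped per predicate as shown below. The `STATEMENTS` list and the `group_by_predicate` helper are hypothetical names used only for illustration; they are not part of any GPTKB API, and only a handful of the 26 statements are copied in.

```python
from collections import defaultdict

# Hypothetical in-memory copy of a few statements from the table above,
# stored as (predicate, object) pairs. Not an official GPTKB data format.
STATEMENTS = [
    ("gptkbp:instanceOf", "gptkb:nonprofit_organization"),
    ("gptkbp:format", "gptkb:WAT"),
    ("gptkbp:format", "gptkb:WARC"),
    ("gptkbp:format", "WET"),
    ("gptkbp:foundedIn", "2007"),
    ("gptkbp:frequency", "monthly"),
]


def group_by_predicate(statements):
    """Collect all objects for each predicate, preserving input order."""
    grouped = defaultdict(list)
    for predicate, obj in statements:
        grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    grouped = group_by_predicate(STATEMENTS)
    for predicate, objects in grouped.items():
        # Render multi-valued predicates as a comma-separated cell.
        print(f"{predicate} | {', '.join(objects)} |")
```

Keeping one pair per statement means that counting the pairs gives the statement total, while the grouped view matches the one-row-per-predicate layout of the table.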