Statements (21)
Predicate | Object |
---|---|
gptkbp:instanceOf | web crawl dataset |
gptkbp:availableOn | lemurproject.org/clueweb12 |
gptkbp:contains | over 700 million web pages |
gptkbp:createdBy | gptkb:Carnegie_Mellon_University |
gptkbp:format | gptkb:WARC |
https://www.w3.org/2000/01/rdf-schema#label | ClueWeb12 dataset |
gptkbp:language | English |
gptkbp:license | research use only |
gptkbp:notableCollection | ClueWeb12-A, ClueWeb12-B13 |
gptkbp:releaseYear | 2012 |
gptkbp:size | about 27 TB (uncompressed), about 5.5 TB (compressed) |
gptkbp:predecessor | gptkb:ClueWeb09_dataset |
gptkbp:usedFor | web mining, natural language processing research, information retrieval research |
gptkbp:usedIn | gptkb:TREC_Web_Track, NTCIR Web Track |
gptkbp:bfsParent | gptkb:Lemur_Project |
gptkbp:bfsLayer | 7 |