ClueWeb12 dataset

GPTKB entity

Statements (20)
Predicate Object
gptkbp:instanceOf web crawl dataset
gptkbp:availableOn lemurproject.org/clueweb12
gptkbp:contains over 700 million web pages
gptkbp:createdBy gptkb:Carnegie_Mellon_University
gptkbp:format gptkb:WARC
https://www.w3.org/2000/01/rdf-schema#label ClueWeb12 dataset
gptkbp:language English
gptkbp:license research use only
gptkbp:notableCollection ClueWeb12-A
ClueWeb12-B13
gptkbp:releaseYear 2012
gptkbp:size about 27 TB (uncompressed); about 5.5 TB compressed
gptkbp:predecessor gptkb:ClueWeb09_dataset
gptkbp:usedFor web mining
natural language processing research
information retrieval research
gptkbp:usedIn gptkb:TREC_Web_Track
NTCIR Web Track
gptkbp:bfsParent gptkb:Lemur_Project
gptkbp:bfsLayer 7
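Because the dataset is distributed in the WARC format, a few lines of standard-library Python are enough to walk its records. The sketch below is a minimal, assumed reader, not the Lemur Project's own tooling: it assumes WARC/1.0-style records (a version line, "Name: value" headers, a blank line, then exactly Content-Length bytes of payload) stored in gzip-compressed files. The function names `iter_warc_records`, `parse_warc_bytes` and `read_warc_gz` are hypothetical, and header names are matched exactly as written even though the WARC specification treats them as case-insensitive.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

Record = Tuple[Dict[str, str], bytes]


def iter_warc_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Assumption: each record is a 'WARC/x.y' line, header lines, a blank
    line, then exactly Content-Length payload bytes, followed by CRLF CRLF.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            raw = raw.rstrip(b"\r\n")
            if not raw:
                break  # a blank line ends the header block
            # Split on the first colon only, so URI values keep their colons.
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Exact-case lookup; real files may need case-insensitive matching.
        length = int(headers.get("Content-Length", "0"))
        yield headers, stream.read(length)


def parse_warc_bytes(data: bytes) -> List[Record]:
    """Parse an in-memory, uncompressed WARC byte string into records."""
    return list(iter_warc_records(io.BytesIO(data)))


def read_warc_gz(path: str) -> Iterator[Record]:
    """Yield records from a gzip-compressed WARC file.

    gzip.open reads both a single gzip stream and concatenated
    per-record gzip members transparently.
    """
    with gzip.open(path, "rb") as fh:
        yield from iter_warc_records(fh)
```

A typical loop would be `for headers, payload in read_warc_gz("example.warc.gz"): print(headers.get("WARC-Type"))`, where the file name is hypothetical. ClueWeb12 records are commonly keyed by a `WARC-TREC-ID` header; treat that header name as an assumption to check against the actual files. For production use, a maintained WARC library is a safer choice than this sketch, which skips digest checks and malformed-record recovery.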