ClueWeb09 dataset

GPTKB entity

Statements (21)
Predicate Object
gptkbp:instanceOf web crawl dataset
gptkbp:access requires license
gptkbp:citation Callan, Jamie, et al. 'The ClueWeb09 Dataset.' (2009)
gptkbp:contains web pages
gptkbp:createdBy gptkb:Carnegie_Mellon_University
gptkbp:format gptkb:WARC
gptkbp:homeTo https://lemurproject.org/clueweb09.php
https://www.w3.org/2000/01/rdf-schema#label ClueWeb09 dataset
gptkbp:language gptkb:Chinese
gptkbp:language English
gptkbp:notableCollection ClueWeb09 Category A
gptkbp:notableCollection ClueWeb09 Category B
gptkbp:partOf gptkb:Lemur_Project
gptkbp:releaseYear 2009
gptkbp:size 1 billion web pages
gptkbp:successor gptkb:ClueWeb12_dataset
gptkbp:usedFor natural language processing
gptkbp:usedFor web mining
gptkbp:usedFor information retrieval research
gptkbp:bfsParent gptkb:Lemur_Project
gptkbp:bfsLayer 7