Statements (21)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:dataset |
gptkbp:contains | public domain books, English books |
gptkbp:domain | natural language processing |
gptkbp:format | plain text |
gptkbp:fullName | Project Gutenberg Dataset (PG-19) |
https://www.w3.org/2000/01/rdf-schema#label | Gutenberg (PG-19) |
gptkbp:language | English |
gptkbp:license | public domain |
gptkbp:notablePublication | Compressive Transformers for Long-Range Sequence Modelling (Rae et al., 2019) |
gptkbp:numberOfBooks | 28,595 |
gptkbp:period | books published before 1919 |
gptkbp:releaseYear | 2019 |
gptkbp:size | about 2 billion words |
gptkbp:source | gptkb:Project_Gutenberg |
gptkbp:url | https://github.com/deepmind/pg19 |
gptkbp:usedFor | gptkb:machine_learning, language modeling, text analysis |
gptkbp:bfsParent | gptkb:The_Pile |
gptkbp:bfsLayer | 7 |
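The format, number-of-books and size statements can be checked against a local copy with a short script. This is a minimal sketch under assumptions: it supposes a hypothetical local layout in which each split (for example `train/`, `validation/`, `test/`) is a directory of UTF-8 plain-text files, one file per book. The function name `corpus_stats` is illustrative, and whitespace splitting is only a rough proxy for however the upstream word counts were computed.

```python
from pathlib import Path


def corpus_stats(root: str) -> dict[str, tuple[int, int]]:
    """Count books and whitespace-delimited words per split directory.

    Assumes a hypothetical layout root/<split>/<book_id>.txt with one
    UTF-8 plain-text file per book; this is not a documented API.
    """
    stats: dict[str, tuple[int, int]] = {}
    # Each immediate subdirectory of root is treated as one split.
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        n_books = 0
        n_words = 0
        for book in sorted(split_dir.glob("*.txt")):
            n_books += 1
            with book.open(encoding="utf-8") as f:
                # Stream line by line so long books are never held in memory whole.
                for line in f:
                    n_words += len(line.split())
        stats[split_dir.name] = (n_books, n_words)
    return stats
```

For example, `corpus_stats("pg19")` (path hypothetical) returns a mapping from split name to `(books, words)`; summing the tuples across splits gives figures comparable to the numberOfBooks and size rows, allowing for differences in how words are tokenized.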