RedPajama-Data

GPTKB entity

Statements (21)
Predicate Object
gptkbp:instanceOf open dataset
gptkbp:contains gptkb:GitHub
C4
gptkb:CommonCrawl
gptkb:Wikipedia
gptkb:ArXiv
Books
StackExchange
gptkbp:createdBy gptkb:Together_AI
gptkbp:format gptkb:text
https://www.w3.org/2000/01/rdf-schema#label RedPajama-Data
gptkbp:language English
gptkbp:license CC BY 4.0
gptkbp:notableFor open reproduction of LLaMA training dataset
gptkbp:purpose training large language models
gptkbp:releaseYear 2023
gptkbp:size 1.2 trillion tokens
gptkbp:url https://github.com/togethercomputer/RedPajama-Data
gptkbp:usedFor RedPajama-INCITE models
gptkbp:bfsParent gptkb:RedPajama
gptkbp:bfsLayer 6