RedPajama dataset

GPTKB entity

Statements (25)
Predicate Object
gptkbp:instanceOf gptkb:dataset
gptkbp:availableOn https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T
gptkbp:contains gptkb:GitHub
C4
gptkb:CommonCrawl
gptkb:Wikipedia
gptkb:ArXiv
Books
StackExchange
gptkbp:createdBy gptkb:Together_AI
gptkbp:format gptkb:text
https://www.w3.org/2000/01/rdf-schema#label RedPajama dataset
gptkbp:inspiredBy LLaMA training dataset
gptkbp:language primarily English
gptkbp:license various open licenses
gptkbp:notableFor large-scale open dataset for LLMs
gptkbp:openSource true
gptkbp:purpose training large language models
gptkbp:releaseYear 2023
gptkbp:size 1.2 trillion tokens
gptkbp:usedFor RedPajama-INCITE models
gptkbp:bfsParent gptkb:RedPajama-INCITE
gptkb:Together_Computer
gptkb:The_Pile
gptkbp:bfsLayer 7