Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:dataset |
| gptkbp:availableOn | https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T |
| gptkbp:contains | gptkb:GitHub, gptkb:Reddit, gptkb:CommonCrawl, gptkb:Wikipedia, gptkb:ArXiv, Books, StackExchange |
| gptkbp:createdBy | gptkb:Together_AI |
| gptkbp:format | gptkb:text |
| gptkbp:inspiredBy | LLaMA dataset |
| gptkbp:language | English |
| gptkbp:license | various open licenses |
| gptkbp:notableFor | large-scale open dataset for LLMs |
| gptkbp:openSource | true |
| gptkbp:purpose | training large language models |
| gptkbp:releaseYear | 2023 |
| gptkbp:size | 1.2 trillion tokens |
| gptkbp:usedFor | RedPajama-INCITE models |
| gptkbp:bfsParent | gptkb:RedPajama-INCITE, gptkb:The_Pile |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | RedPajama dataset |