Statements (30)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:large_language_model |
| gptkbp:activatedBy | gptkb:SwiGLU |
| gptkbp:architecture | gptkb:Mixture_of_Experts |
| gptkbp:availableOn | gptkb:Hugging_Face |
| gptkbp:context | 32K tokens |
| gptkbp:developer | gptkb:DeepSeek |
| gptkbp:github | https://github.com/deepseek-ai/DeepSeek-MoE |
| gptkbp:hasModel | decoder-only |
| gptkbp:hasVariant | DeepSeek-MoE-16B<br>DeepSeek-MoE-236B |
| gptkbp:language | English |
| gptkbp:license | DeepSeek License |
| gptkbp:numberOfExperts | 16 |
| gptkbp:openSource | true |
| gptkbp:parameter | 236B |
| gptkbp:pdf | https://arxiv.org/abs/2405.13237 |
| gptkbp:releaseDate | 2024 |
| gptkbp:routerType | top-2 gating |
| gptkbp:supports | chat<br>code generation<br>question answering<br>summarization<br>text generation<br>reasoning tasks |
| gptkbp:tokenizer | gptkb:bridge |
| gptkbp:trainer | web data |
| gptkbp:type | multi-head attention |
| gptkbp:bfsParent | gptkb:DeepSeek |
| gptkbp:bfsLayer | 8 |
| https://www.w3.org/2000/01/rdf-schema#label | DeepSeek-MoE |
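
The `gptkbp:routerType` and `gptkbp:numberOfExperts` statements name a routing scheme without showing what it does. The following is a minimal sketch of generic top-2 gating over 16 experts, taking the table's values at face value. It is not DeepSeek's implementation: the function names `top2_gate` and `moe_forward`, the toy scalar experts, and the router logits are all hypothetical and chosen only for illustration.

```python
import math

NUM_EXPERTS = 16  # value taken from the gptkbp:numberOfExperts statement


def top2_gate(logits):
    """Return [(expert_index, weight), ...] for the two highest router logits.

    Hypothetical helper: the weights are a softmax restricted to the two
    selected experts, so they always sum to 1.
    """
    if len(logits) < 2:
        raise ValueError("top-2 gating needs at least two experts")
    # Indices of the two largest logits, highest first (ties keep index order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts):
    """Combine the outputs of the two selected experts, weighted by the gate."""
    return sum(weight * experts[i](x) for i, weight in top2_gate(router_logits))


# Toy experts (assumption): expert k simply multiplies its input by k + 1.
experts = [lambda x, k=k: (k + 1) * x for k in range(NUM_EXPERTS)]

# Toy router logits (assumption): experts 3 and 7 score highest and equally.
router_logits = [0.0] * NUM_EXPERTS
router_logits[3] = 2.0
router_logits[7] = 2.0

gates = top2_gate(router_logits)
output = moe_forward(1.0, router_logits, experts)
print(gates, output)
```

Only two of the 16 experts run for a given input, which is why a Mixture-of-Experts model activates far fewer parameters per token than its total parameter count suggests. In a real model the experts would be feed-forward blocks and the logits would come from a learned router layer rather than a hand-written list.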