Statements (30)
Predicate | Object |
---|---|
gptkbp:instanceOf | large language model |
gptkbp:activatedBy | gptkb:SwiGLU |
gptkbp:architecture | gptkb:Mixture_of_Experts |
gptkbp:availableOn | gptkb:Hugging_Face |
gptkbp:context | 32K tokens |
gptkbp:developer | gptkb:DeepSeek |
gptkbp:github | https://github.com/deepseek-ai/DeepSeek-MoE |
gptkbp:hasModel | decoder-only |
gptkbp:hasVariant | DeepSeek-MoE-16B, DeepSeek-MoE-236B |
https://www.w3.org/2000/01/rdf-schema#label | DeepSeek-MoE |
gptkbp:language | English |
gptkbp:license | DeepSeek License |
gptkbp:numberOfExperts | 16 |
gptkbp:openSource | true |
gptkbp:parameter | 236B |
gptkbp:pdf | https://arxiv.org/abs/2405.13237 |
gptkbp:releaseDate | 2024 |
gptkbp:routerType | top-2 gating |
gptkbp:supports | chat, code generation, question answering, summarization, text generation, reasoning tasks |
gptkbp:tokenizer | gptkb:bridge |
gptkbp:trainer | web data |
gptkbp:type | multi-head attention |
gptkbp:bfsParent | gptkb:DeepSeek |
gptkbp:bfsLayer | 6 |