DeepSeek-MoE

GPTKB entity

Statements (30)
Predicate Object
gptkbp:instanceOf large language model
gptkbp:activatedBy gptkb:SwiGLU
gptkbp:architecture gptkb:Mixture_of_Experts
gptkbp:availableOn gptkb:Hugging_Face
gptkbp:context 32K tokens
gptkbp:developer gptkb:DeepSeek
gptkbp:github https://github.com/deepseek-ai/DeepSeek-MoE
gptkbp:hasModel decoder-only
gptkbp:hasVariant DeepSeek-MoE-16B
gptkbp:hasVariant DeepSeek-MoE-236B
https://www.w3.org/2000/01/rdf-schema#label DeepSeek-MoE
gptkbp:language English
gptkbp:license DeepSeek License
gptkbp:numberOfExperts 16
gptkbp:openSource true
gptkbp:parameter 236B
gptkbp:pdf https://arxiv.org/abs/2405.13237
gptkbp:releaseDate 2024
gptkbp:routerType top-2 gating
gptkbp:supports chat
gptkbp:supports code generation
gptkbp:supports question answering
gptkbp:supports summarization
gptkbp:supports text generation
gptkbp:supports reasoning tasks
gptkbp:tokenizer gptkb:bridge
gptkbp:trainer web data
gptkbp:type multi-head attention
gptkbp:bfsParent gptkb:DeepSeek
gptkbp:bfsLayer 6
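The architecture, activatedBy and routerType statements in this record describe a Mixture-of-Experts feed-forward layer whose experts use SwiGLU and whose router uses top-2 gating. The following is a minimal pure-Python sketch of such a layer. It takes 16 experts and top-2 routing from the record, and it assumes toy dimensions, random Gaussian weights, and renormalization of the two selected router probabilities. The names Top2MoE, SwiGLUExpert, d_model and d_hidden are hypothetical. This is a generic illustration, not DeepSeek-MoE's released implementation, whose routing may differ in detail.

```python
import math
import random

# Hypothetical toy sketch of a top-2 gated Mixture-of-Experts layer with
# SwiGLU experts. Not DeepSeek-MoE's actual code; sizes and init are assumptions.


def silu(x: float) -> float:
    """Numerically stable SiLU (Swish): x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Small Gaussian init (assumed; real models use trained weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SwiGLUExpert:
    """One feed-forward expert: W_down(SiLU(W_gate x) * (W_up x))."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_gate = rand_matrix(d_hidden, d_model, rng)
        self.w_up = rand_matrix(d_hidden, d_model, rng)
        self.w_down = rand_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        gate = [silu(g) for g in matvec(self.w_gate, x)]
        up = matvec(self.w_up, x)
        hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
        return matvec(self.w_down, hidden)


class Top2MoE:
    """Routes each token to its k highest-scoring experts (k=2 by default)."""

    def __init__(self, d_model, d_hidden, num_experts=16, k=2, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.router = rand_matrix(num_experts, d_model, rng)  # one logit per expert
        self.experts = [SwiGLUExpert(d_model, d_hidden, rng) for _ in range(num_experts)]

    def route(self, x):
        """Return [(expert_index, weight), ...] for the top-k experts."""
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.k]
        norm = sum(probs[i] for i in top)  # renormalize over the selected experts
        return [(i, probs[i] / norm) for i in top]

    def __call__(self, x):
        """Weighted sum of the selected experts' outputs; other experts are skipped."""
        out = [0.0] * len(x)
        for i, weight in self.route(x):
            y = self.experts[i](x)
            out = [o + weight * v for o, v in zip(out, y)]
        return out


if __name__ == "__main__":
    layer = Top2MoE(d_model=8, d_hidden=16)
    token = [0.5] * 8
    print(layer.route(token))  # two (expert_index, weight) pairs; weights sum to 1
    print(len(layer(token)))   # 8, same width as the input
```

Only the two selected experts run for a given token. This is why the per-token compute of an MoE model is much smaller than its total parameter count would suggest.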