gptkbp:instanceOf
|
large language model
|
gptkbp:activatedBy
|
gptkb:SwiGLU
|
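The SwiGLU activation referenced above sits inside each expert's feed-forward block. A minimal PyTorch sketch, assuming the published Mixtral dimensions (model dim 4096, FFN hidden dim 14336) and the common three-matrix naming convention; this is an illustration, not the reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """SwiGLU feed-forward block: (silu(x @ W1) * (x @ W3)) @ W2."""
    def __init__(self, dim: int = 4096, hidden: int = 14336):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # up projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (swish) on the gate path, elementwise product with the up path.
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```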
gptkbp:activeParametersPerToken
|
12.9B
|
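The 12.9B active-parameter figure follows from top-2 routing over 8 experts: only 2 of the 8 expert FFNs run per token, while attention and embeddings are always active. A back-of-envelope check in Python, assuming the published Mixtral dimensions and ignoring small terms (norms, router gates):

```python
# Rough parameter count from Mixtral's published dimensions; norms and
# router gates (a few million parameters) are ignored.
dim, layers, ffn_hidden = 4096, 32, 14336
experts, top_k = 8, 2
heads, kv_heads, head_dim = 32, 8, 128
vocab = 32000

expert_ffn = 3 * dim * ffn_hidden         # w1, w2, w3 per expert
attn = (2 * dim * heads * head_dim        # Wq, Wo
        + 2 * dim * kv_heads * head_dim)  # Wk, Wv (grouped-query attention)
embed = 2 * vocab * dim                   # input + output embeddings

total  = layers * (experts * expert_ffn + attn) + embed
active = layers * (top_k   * expert_ffn + attn) + embed

print(f"total  ~ {total / 1e9:.1f}B")     # ~ 46.7B
print(f"active ~ {active / 1e9:.1f}B")    # ~ 12.9B
```

Both results line up with the totalParameters (46.7B) and activeParametersPerToken (12.9B) values in this entry.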
gptkbp:architecture
|
gptkb:Mixture_of_Experts
|
gptkbp:attentionHeads
|
32
|
gptkbp:availableOn
|
gptkb:Hugging_Face
|
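Because the open weights are hosted on Hugging Face, the model can be loaded with the `transformers` library. A minimal sketch, assuming the repo id mistralai/Mixtral-8x7B-v0.1 (the base checkpoint name at release; verify before relying on it) and enough memory for ~46.7B parameters; `device_map="auto"` additionally requires the `accelerate` package:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed repo id; check on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Mixtral is a sparse mixture-of-experts model that",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```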
gptkbp:context
|
32,768 tokens (32k)
|
gptkbp:contrastsWith
|
gptkb:Llama_2_70B
gptkb:GPT-3.5
|
gptkbp:developedBy
|
gptkb:Mistral_AI
|
gptkbp:fineTunedWith
|
true
|
gptkbp:format
|
gptkb:text
|
gptkbp:github
|
https://github.com/mistralai/mixtral-8x7b
|
gptkbp:hasModel
|
decoder-only transformer
|
gptkbp:hiddenSize
|
4096
|
https://www.w3.org/2000/01/rdf-schema#label
|
Mixtral 8x7B
|
gptkbp:intendedUse
|
research
commercial applications
|
gptkbp:language
|
gptkb:French
gptkb:German
gptkb:Italian
gptkb:Romanian
gptkb:Spanish
Czech
Dutch
English
Polish
Portuguese
Swedish
|
gptkbp:layer
|
32
|
gptkbp:license
|
Apache 2.0
|
gptkbp:notableFor
|
high efficiency
open weights
state-of-the-art performance
|
gptkbp:numberOfExperts
|
8
|
gptkbp:openSource
|
true
|
gptkbp:parametersPerExpert
|
7B
|
gptkbp:pretrained
|
true
|
gptkbp:releaseYear
|
2023
|
gptkbp:supports
|
translation
code generation
question answering
summarization
text generation
reasoning tasks
|
gptkbp:tokenizer
|
gptkb:SentencePiece
|
gptkbp:totalParameters
|
46.7B
|
gptkbp:trainer
|
gptkb:law
gptkb:Wikipedia
gptkb:Common_Crawl
books
web data
multilingual data
|
gptkbp:uses
|
Mixture of Experts routing
grouped-query attention
rotary positional embeddings
sparse routing
|
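A minimal sketch of the sparse top-2 Mixture-of-Experts routing listed above, assuming `experts` is a list of 8 feed-forward modules (e.g. the SwiGLU block sketched earlier); the dense Python loop is for clarity only, as real implementations batch tokens per expert:

```python
import torch
import torch.nn.functional as F

def top2_moe(x: torch.Tensor, gate_w: torch.Tensor, experts, top_k: int = 2):
    """Per-token sparse routing: pick the top-k experts per token and
    combine their outputs with softmax-renormalized gate weights."""
    # x: (tokens, dim); gate_w: (dim, num_experts)
    logits = x @ gate_w                       # router scores per token
    weights, idx = torch.topk(logits, top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)      # renormalize over the top-k
    out = torch.zeros_like(x)
    for k in range(top_k):
        for e in range(gate_w.shape[1]):
            mask = idx[:, k] == e             # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, k:k+1] * experts[e](x[mask])
    return out
```

Taking the softmax over only the top-2 gate logits (rather than all 8) matches the formulation described for Mixtral, and is what makes the computation sparse: the other 6 experts are never evaluated for that token.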
gptkbp:bfsParent
|
gptkb:Mistral
|
gptkbp:bfsLayer
|
5
|