Mixtral 8x7B

GPTKB entity

Statements (59)
Predicate Object
gptkbp:instanceOf large language model
gptkbp:activatedBy gptkb:SwiGLU
gptkbp:activeParametersPerToken 12.9B
gptkbp:architecture gptkb:Mixture_of_Experts
gptkbp:attentionHeads 32
gptkbp:availableOn gptkb:Hugging_Face
gptkbp:context 32,768 tokens (32k)
gptkbp:contrastsWith gptkb:Llama_2_70B
gptkb:GPT-3.5
gptkbp:developedBy gptkb:Mistral_AI
gptkbp:fineTunedWith true
gptkbp:format gptkb:text
gptkbp:github https://github.com/mistralai/mixtral-8x7b
gptkbp:hasModel decoder-only transformer
gptkbp:hiddenSize 4096
https://www.w3.org/2000/01/rdf-schema#label Mixtral 8x7B
gptkbp:intendedUse research
commercial applications
gptkbp:language gptkb:French
gptkb:German
gptkb:Italian
gptkb:Romanian
gptkb:Spanish
Czech
Dutch
English
Polish
Portuguese
Swedish
gptkbp:layer 32
gptkbp:license Apache 2.0
gptkbp:notableFor high efficiency
open weights
state-of-the-art performance
gptkbp:numberOfExperts 8
gptkbp:openSource true
gptkbp:parametersPerExpert 7B
gptkbp:pretrained true
gptkbp:releaseYear 2023
gptkbp:supports translation
code generation
question answering
summarization
text generation
reasoning tasks
gptkbp:tokenizer gptkb:SentencePiece
gptkbp:totalParameters 46.7B
gptkbp:trainer gptkb:law
gptkb:Wikipedia
gptkb:Common_Crawl
books
web data
multilingual data
gptkbp:uses Mixture of Experts routing
grouped-query attention
rotary positional embeddings
sparse routing
gptkbp:bfsParent gptkb:Mistral
gptkbp:bfsLayer 5
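
The parameter figures above follow from the sparse Mixture of Experts design: each transformer layer holds 8 expert feed-forward blocks, and per Mistral AI's release notes a router selects 2 of them for every token, which is why only about 12.9B of the 46.7B total parameters are active per token. The Python sketch below illustrates top-2 sparse routing over 8 experts; it is an illustration of the technique, not Mistral AI's implementation, and the class name, the SiLU stand-in for the SwiGLU expert block, and the feed-forward width are assumptions.

```python
# Illustrative top-2-of-8 sparse Mixture of Experts routing (not Mistral AI's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size, bias=False),
                nn.SiLU(),  # stand-in for the SwiGLU expert feed-forward block
                nn.Linear(ffn_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        logits = self.router(x)                        # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, -1)  # each token picks its top-2 experts
        weights = F.softmax(weights, dim=-1)           # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the two selected expert blocks run per token, so compute per token scales with the active 12.9B parameters rather than the full 46.7B.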
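Because the weights are published on Hugging Face under the Apache 2.0 license, the model can be loaded with the transformers library. The snippet below is a minimal sketch assuming the mistralai/Mixtral-8x7B-v0.1 repository id, a recent transformers release, and enough GPU memory (device_map="auto" also needs the accelerate package) for a 46.7B-parameter model.

```python
# Minimal sketch: loading the open Mixtral 8x7B weights from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # base model; an instruct-tuned variant is also published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~46.7B parameters: expect to shard across GPUs or offload
    device_map="auto",
)

prompt = "Mixtral 8x7B is a sparse mixture-of-experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```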