Mixtral 8x7B

GPTKB entity

Statements (59)
Predicate Object
gptkbp:instanceOf large language model
gptkbp:activatedBy gptkb:SwiGLU
gptkbp:activeParametersPerToken 12.9B
gptkbp:architecture gptkb:Mixture_of_Experts
gptkbp:attentionHeads 32
gptkbp:availableOn gptkb:Hugging_Face
gptkbp:context 32,768 tokens (32k)
gptkbp:contrastsWith gptkb:Llama_2_70B
gptkb:GPT-3.5
gptkbp:developedBy gptkb:Mistral_AI
gptkbp:fineTunedWith true
gptkbp:format gptkb:text
gptkbp:github https://github.com/mistralai/mixtral-8x7b
gptkbp:hasModel decoder-only transformer
gptkbp:hiddenSize 4096
https://www.w3.org/2000/01/rdf-schema#label Mixtral 8x7B
gptkbp:intendedUse research
commercial applications
gptkbp:language gptkb:French
gptkb:German
gptkb:Italian
gptkb:Romanian
gptkb:Spanish
Czech
Dutch
English
Polish
Portuguese
Swedish
gptkbp:layer 32
gptkbp:license Apache 2.0
gptkbp:notableFor high efficiency
open weights
state-of-the-art performance
gptkbp:numberOfExperts 8
gptkbp:openSource true
gptkbp:parametersPerExpert 7B
gptkbp:pretrained true
gptkbp:releaseYear 2023
gptkbp:supports translation
code generation
question answering
summarization
text generation
reasoning tasks
gptkbp:tokenizer gptkb:SentencePiece
gptkbp:totalParameters 46.7B
gptkbp:trainer gptkb:law
gptkb:Wikipedia
gptkb:Common_Crawl
books
web data
multilingual data
gptkbp:uses Mixture of Experts routing
grouped-query attention
rotary positional embeddings
sparse routing
gptkbp:bfsParent gptkb:Mistral
gptkbp:bfsLayer 5
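
The parameter figures above follow from the sparse Mixture of Experts design: each transformer layer holds 8 expert feed-forward blocks, and per Mistral AI's release notes a router selects 2 of them for every token, which is why only about 12.9B of the 46.7B total parameters are active per token. The Python sketch below illustrates top-2 sparse routing over 8 experts; it is an illustration of the technique, not Mistral AI's implementation, and the class name, the SiLU stand-in for the SwiGLU expert block, and the feed-forward width are assumptions.

```python
# Illustrative top-2-of-8 sparse Mixture of Experts routing (not Mistral AI's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_size=4096, ffn_size=14336, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size, bias=False),
                nn.SiLU(),  # stand-in for the SwiGLU expert feed-forward block
                nn.Linear(ffn_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        logits = self.router(x)                        # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, -1)  # each token picks its top-2 experts
        weights = F.softmax(weights, dim=-1)           # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the two selected expert blocks run per token, so compute per token scales with the active 12.9B parameters rather than the full 46.7B.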
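Because the weights are published on Hugging Face under the Apache 2.0 license, the model can be loaded with the transformers library. The snippet below is a minimal sketch assuming the mistralai/Mixtral-8x7B-v0.1 repository id, a recent transformers release, and enough GPU memory (device_map="auto" also needs the accelerate package) for a 46.7B-parameter model.

```python
# Minimal sketch: loading the open Mixtral 8x7B weights from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # base model; an instruct-tuned variant is also published
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~46.7B parameters: expect to shard across GPUs or offload
    device_map="auto",
)

prompt = "Mixtral 8x7B is a sparse mixture-of-experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```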