Mixture of Experts

GPTKB entity

Statements (42)
Predicate Object
gptkbp:instanceOf machine learning model architecture
gptkbp:advantage parameter efficiency
    improved scalability
    specialization of sub-models
gptkbp:application computer vision
    natural language processing
    speech recognition
gptkbp:challenge load balancing
    training instability
    expert collapse
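The load-balancing and expert-collapse challenges listed above are commonly addressed with an auxiliary loss that penalizes uneven routing, as in the Switch Transformers paper cited below. A minimal sketch, assuming toy shapes and a top-1 (argmax) routing convention; the function name and constants are illustrative, not from the source:

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of router scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def load_balance_loss(router_probs, num_experts):
    """Auxiliary loss encouraging uniform expert usage.

    router_probs: per-token softmax distributions over experts.
    f_i = fraction of tokens routed (argmax) to expert i
    P_i = mean router probability mass assigned to expert i
    loss = num_experts * sum_i f_i * P_i, minimized (value 1.0)
    when both f and P are uniform across experts.
    """
    n_tokens = len(router_probs)
    counts = [0] * num_experts
    prob_mass = [0.0] * num_experts
    for probs in router_probs:
        counts[probs.index(max(probs))] += 1  # top-1 routing decision
        for i, p in enumerate(probs):
            prob_mass[i] += p
    f = [c / n_tokens for c in counts]
    P = [m / n_tokens for m in prob_mass]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))

# Toy batch of 32 tokens routed among 4 experts with random scores.
random.seed(0)
probs = [softmax([random.gauss(0, 1) for _ in range(4)]) for _ in range(32)]
aux = load_balance_loss(probs, num_experts=4)
```

Adding `aux` (scaled by a small coefficient) to the task loss discourages the router from sending all tokens to one expert, which is the failure mode behind expert collapse.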
gptkbp:component experts
    router
    input
    output
    gating network
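The components listed above fit together in a simple forward pass: a gating network scores the input, and its softmax weights combine the expert outputs. A minimal dense sketch, assuming scalar inputs and toy one-parameter linear experts (all names and values here are illustrative):

```python
import math

def softmax(scores):
    # Numerically stable softmax: gate scores -> mixture weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, expert_weights, gate_weights):
    # Each expert is a toy linear model: expert_i(x) = w_i * x.
    expert_outputs = [w * x for w in expert_weights]
    # Gating network scores the input, softmax turns scores into weights.
    gate_probs = softmax([g * x for g in gate_weights])
    # Output = gate-weighted combination of all expert outputs (dense MoE).
    return sum(p * y for p, y in zip(gate_probs, expert_outputs))

y = moe_forward(2.0, expert_weights=[1.0, -1.0, 0.5], gate_weights=[0.3, -0.2, 0.1])
```

Sparse variants such as the Switch Transformer keep only the top-k gate weights per input, so most experts are skipped at inference time; that is what makes the architecture scale to very large parameter counts.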
gptkbp:field gptkb:artificial_intelligence
    gptkb:machine_learning
https://www.w3.org/2000/01/rdf-schema#label Mixture of Experts
gptkbp:influenced conditional computation research
    scalable deep learning
    sparse transformer architectures
gptkbp:introduced gptkb:Robert_A._Jacobs
gptkbp:introducedIn 1991
gptkbp:notablePublication gptkb:Hierarchical_Mixtures_of_Experts_(Jordan_&_Jacobs,_1994)
    gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity_(Fedus_et_al.,_2021)
    gptkb:GLaM:_Efficient_Scaling_of_Language_Models_with_Mixture-of-Experts_(Du_et_al.,_2022)
gptkbp:purpose divide and conquer learning
gptkbp:relatedTo ensemble learning
    transformer models
    conditional computation
    sparse neural networks
gptkbp:scalesTo large language models
gptkbp:trainer gptkb:reinforcement_learning
    backpropagation
    end-to-end learning
gptkbp:usedBy gptkb:GPT-4_(speculated)
    gptkb:Google_Switch_Transformer
    gptkb:GLaM
gptkbp:uses gating network
    multiple expert models
gptkbp:bfsParent gptkb:convolutional_neural_network
gptkbp:bfsLayer 5