gptkbp:instanceOf
|
machine learning model architecture
|
gptkbp:advantage
|
improves computational efficiency
enables model sparsity
enables specialization of sub-models
|
gptkbp:application
|
computer vision
natural language processing
speech recognition
|
gptkbp:category
|
neural network architecture
ensemble method
deep learning technique
|
gptkbp:challenge
|
load balancing
training instability
routing complexity
|
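A common mitigation for the load-balancing challenge listed above is an auxiliary loss that pushes the router toward uniform expert utilization, as in the Switch Transformers paper (Fedus et al., 2021). The sketch below is illustrative only: it assumes top-1 routing, and the function name and signature are hypothetical, not from any specific library.

```python
# Sketch of an auxiliary load-balancing loss in the style of
# Switch Transformers (Fedus et al., 2021); assumes top-1 routing.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor) -> torch.Tensor:
    """router_logits: (n_tokens, n_experts) raw gating scores."""
    n_experts = router_logits.shape[-1]
    probs = F.softmax(router_logits, dim=-1)        # router probabilities
    assignments = probs.argmax(dim=-1)              # top-1 expert per token
    # f[i]: fraction of tokens dispatched to expert i
    f = torch.bincount(assignments, minlength=n_experts).float()
    f = f / router_logits.shape[0]
    # p[i]: mean router probability assigned to expert i
    p = probs.mean(dim=0)
    # Minimized when both dispatch counts and probabilities are uniform.
    return n_experts * torch.sum(f * p)

# Typical usage (illustrative): total = task_loss + 0.01 * load_balancing_loss(logits)
```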
gptkbp:component
|
expert networks
gating network
|
gptkbp:expertNetworkFunction
|
processes the inputs routed to it by the gating network
|
gptkbp:field
|
gptkb:artificial_intelligence
gptkb:machine_learning
|
gptkbp:gatingNetworkFunction
|
selects which experts to activate for each input
|
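The two components above (expert networks plus a gating network) and the gating function described here can be shown in a minimal PyTorch sketch. This is an illustrative top-k routing layer under assumed names (MoELayer, d_model, n_experts, k), not the implementation of any specific system.

```python
# Minimal sketch of an MoE layer: a gating network scores experts per
# input and only the top-k experts are activated (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        # Expert networks: independent feed-forward sub-models.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        # Gating network: maps each input to one score per expert.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        scores = self.gate(x)                              # (batch, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # renormalize over top-k
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real systems dispatch sparsely.
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)                         # (batch, k) bool
            if mask.any():
                rows = mask.any(dim=-1)                    # inputs using expert e
                w = (weights * mask.float()).sum(dim=-1, keepdim=True)
                out[rows] += w[rows] * expert(x[rows])
        return out
```

Production systems such as GShard and Switch Transformer dispatch tokens sparsely across devices rather than looping over experts densely as above; the loop is kept here only to make the routing logic explicit.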
https://www.w3.org/2000/01/rdf-schema#label
|
Mixture of Experts (MoE)
|
gptkbp:introduced
|
gptkb:Robert_A._Jacobs
|
gptkbp:introducedIn
|
1991
|
gptkbp:notableFor
|
gptkb:Switch_Transformer
gptkb:GShard
Pathways
|
gptkbp:purpose
|
divide complex tasks among specialized models
|
gptkbp:relatedPaper
|
gptkb:GShard:_Scaling_Giant_Models_with_Conditional_Computation_and_Automatic_Sharding_(Lepikhin_et_al.,_2020)
gptkb:Hierarchical_Mixtures_of_Experts_(Jordan_&_Jacobs,_1994)
gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity_(Fedus_et_al.,_2021)
|
gptkbp:relatedTo
|
ensemble learning
conditional computation
sparse neural networks
|
gptkbp:size
|
enables very large total parameter counts while keeping computation per example roughly constant
|
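An illustrative calculation (hypothetical numbers, not from the source): a layer with 64 experts and top-2 routing stores 64 experts' worth of parameters but runs only 2 of them per token, so roughly 3% of the expert parameters are active per example; total capacity scales with the number of experts while per-example compute stays nearly constant.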
gptkbp:trainer
|
backpropagation
|
gptkbp:usedIn
|
gptkb:GPT-4
gptkb:DeepSpeed-MoE
gptkb:GLaM
|
gptkbp:bfsParent
|
gptkb:Noam_Shazeer
|
gptkbp:bfsLayer
|
6
|