Mixture of Experts (MoE)

GPTKB entity

Statements (40)
Predicate Object
gptkbp:instanceOf machine learning model architecture
gptkbp:advantage improves efficiency
enables model sparsity
specialization of sub-models
gptkbp:application computer vision
natural language processing
speech recognition
gptkbp:category gptkb:neural_network
ensemble method
deep learning technique
gptkbp:challenge load balancing
training instability
routing complexity
gptkbp:component expert networks
gating network
gptkbp:expertNetworkFunction processes input data
gptkbp:field gptkb:artificial_intelligence
gptkb:machine_learning
gptkbp:gatingNetworkFunction selects which experts to activate for each input
https://www.w3.org/2000/01/rdf-schema#label Mixture of Experts (MoE)
gptkbp:introduced gptkb:Robert_A._Jacobs
gptkbp:introducedIn 1991
gptkbp:notableFor gptkb:Switch_Transformer
gptkb:GShard
Pathways
gptkbp:purpose divide complex tasks among specialized models
gptkbp:relatedPaper gptkb:GShard:_Scaling_Giant_Models_with_Conditional_Computation_and_Automatic_Sharding_(Lepikhin_et_al.,_2020)
gptkb:Hierarchical_Mixtures_of_Experts_(Jordan_&_Jacobs,_1994)
gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity_(Fedus_et_al.,_2021)
gptkbp:relatedTo ensemble learning
conditional computation
sparse neural networks
gptkbp:size enables large-scale models with fewer computations per example
gptkbp:trainer backpropagation
gptkbp:usedIn gptkb:GPT-4
gptkb:DeepSpeed-MoE
gptkb:Google's_PaLM
gptkb:GLaM
gptkbp:bfsParent gptkb:Noam_Shazeer
gptkbp:bfsLayer 6
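
Taken together, the component and function statements above describe the core MoE mechanism: a gating network scores the expert networks for each input and activates only a few of them, so the layer output is a weighted combination y = Σ_i g_i(x) · E_i(x) over the selected experts. The sketch below is a minimal, illustrative top-k implementation of that idea, assuming PyTorch and token-level feed-forward experts in the style of the Switch Transformer / GShard line of work; the class and parameter names are hypothetical and do not come from any of the systems listed above.

```python
# Minimal sketch of a sparsely gated Mixture-of-Experts layer (illustrative only).
# Assumes top-k routing; names (MoELayer, d_model, num_experts, top_k) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: produces a score for every expert, per input token.
        self.gate = nn.Linear(d_model, num_experts)
        # Expert networks: independent feed-forward sub-models that each process input data.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). The gate selects which experts to activate for each
        # input (conditional computation), so only top_k experts run per token.
        scores = self.gate(x)                               # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # normalize over selected experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoELayer(d_model=16)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])
```

This sketch omits the auxiliary load-balancing loss and expert capacity limits that production systems use to address the load-balancing and routing-complexity challenges listed above.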