Mixture of Experts

GPTKB entity

Statements (42)
Predicate Object
gptkbp:instanceOf machine learning model architecture
gptkbp:advantage parameter efficiency
    improved scalability
    specialization of sub-models
gptkbp:application computer vision
    natural language processing
    speech recognition
gptkbp:challenge load balancing
    training instability
    expert collapse
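The load-balancing and expert-collapse challenges listed above are commonly addressed with an auxiliary loss that penalizes uneven routing, as in the Switch Transformers paper cited below. A minimal sketch, assuming toy shapes and a top-1 (argmax) routing convention; the function name and constants are illustrative, not from the source:

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of router scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def load_balance_loss(router_probs, num_experts):
    """Auxiliary loss encouraging uniform expert usage.

    router_probs: per-token softmax distributions over experts.
    f_i = fraction of tokens routed (argmax) to expert i
    P_i = mean router probability mass assigned to expert i
    loss = num_experts * sum_i f_i * P_i, minimized (value 1.0)
    when both f and P are uniform across experts.
    """
    n_tokens = len(router_probs)
    counts = [0] * num_experts
    prob_mass = [0.0] * num_experts
    for probs in router_probs:
        counts[probs.index(max(probs))] += 1  # top-1 routing decision
        for i, p in enumerate(probs):
            prob_mass[i] += p
    f = [c / n_tokens for c in counts]
    P = [m / n_tokens for m in prob_mass]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))

# Toy batch of 32 tokens routed among 4 experts with random scores.
random.seed(0)
probs = [softmax([random.gauss(0, 1) for _ in range(4)]) for _ in range(32)]
aux = load_balance_loss(probs, num_experts=4)
```

Adding `aux` (scaled by a small coefficient) to the task loss discourages the router from sending all tokens to one expert, which is the failure mode behind expert collapse.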
gptkbp:component experts
    router
    input
    output
    gating network
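The components listed above fit together in a simple forward pass: a gating network scores the input, and its softmax weights combine the expert outputs. A minimal dense sketch, assuming scalar inputs and toy one-parameter linear experts (all names and values here are illustrative):

```python
import math

def softmax(scores):
    # Numerically stable softmax: gate scores -> mixture weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, expert_weights, gate_weights):
    # Each expert is a toy linear model: expert_i(x) = w_i * x.
    expert_outputs = [w * x for w in expert_weights]
    # Gating network scores the input, softmax turns scores into weights.
    gate_probs = softmax([g * x for g in gate_weights])
    # Output = gate-weighted combination of all expert outputs (dense MoE).
    return sum(p * y for p, y in zip(gate_probs, expert_outputs))

y = moe_forward(2.0, expert_weights=[1.0, -1.0, 0.5], gate_weights=[0.3, -0.2, 0.1])
```

Sparse variants such as the Switch Transformer keep only the top-k gate weights per input, so most experts are skipped at inference time; that is what makes the architecture scale to very large parameter counts.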
gptkbp:field gptkb:artificial_intelligence
    gptkb:machine_learning
https://www.w3.org/2000/01/rdf-schema#label Mixture of Experts
gptkbp:influenced conditional computation research
    scalable deep learning
    sparse transformer architectures
gptkbp:introduced gptkb:Robert_A._Jacobs
gptkbp:introducedIn 1991
gptkbp:notablePublication gptkb:Hierarchical_Mixtures_of_Experts_(Jordan_&_Jacobs,_1994)
    gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity_(Fedus_et_al.,_2021)
    gptkb:GLaM:_Efficient_Scaling_of_Language_Models_with_Mixture-of-Experts_(Du_et_al.,_2022)
gptkbp:purpose divide and conquer learning
gptkbp:relatedTo ensemble learning
    transformer models
    conditional computation
    sparse neural networks
gptkbp:scalesTo large language models
gptkbp:trainer gptkb:reinforcement_learning
    backpropagation
    end-to-end learning
gptkbp:usedBy gptkb:GPT-4_(speculated)
    gptkb:Google_Switch_Transformer
    gptkb:GLaM
gptkbp:uses gating network
    multiple expert models
gptkbp:bfsParent gptkb:convolutional_neural_network
gptkbp:bfsLayer 5