gptkbp:instanceOf
|
machine learning model architecture
|
gptkbp:advantage
|
parameter efficiency
improved scalability
specialization of sub-models
|
gptkbp:application
|
computer vision
natural language processing
speech recognition
|
gptkbp:challenge
|
load balancing
training instability
expert collapse
|
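The load balancing and expert collapse challenges above are commonly mitigated with an auxiliary loss that pushes the router toward a uniform assignment of inputs to experts. Below is a minimal sketch following the formulation popularized by Switch Transformers (Fedus et al., 2021, cited under notablePublication); it is illustrative only, assumes PyTorch, and all function and variable names are hypothetical.

    # Sketch of an auxiliary load-balancing loss in the Switch Transformer
    # style: it penalizes routers that dispatch most tokens to a few experts,
    # which is how expert collapse manifests in practice.
    import torch
    import torch.nn.functional as F

    def load_balancing_loss(router_logits, num_experts):
        # router_logits: (tokens, num_experts) raw gating scores for a batch.
        probs = F.softmax(router_logits, dim=-1)      # soft assignment per token
        # f_i: fraction of tokens whose dispatched (argmax) expert is i.
        hard = F.one_hot(probs.argmax(dim=-1), num_experts).float()
        tokens_per_expert = hard.mean(dim=0)
        # P_i: mean router probability mass assigned to expert i.
        prob_per_expert = probs.mean(dim=0)
        # N * sum(f_i * P_i) is minimized when both distributions are uniform.
        return num_experts * torch.sum(tokens_per_expert * prob_per_expert)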
gptkbp:component
|
experts
router
input
output
gating network
|
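A minimal illustrative sketch (not part of the source record) of how the components above fit together: the router / gating network scores each input, and only the top-scoring expert sub-models run and contribute to the output. It assumes PyTorch, and all class, method, and parameter names are hypothetical.

    # Sparse Mixture-of-Experts layer: a gating network routes each input to
    # its top-k experts, whose outputs are combined with the gate weights.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixtureOfExperts(nn.Module):
        def __init__(self, dim, num_experts=4, top_k=2):
            super().__init__()
            self.top_k = top_k
            # Experts: independent sub-models that can specialize on
            # different regions of the input space.
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
                 for _ in range(num_experts)]
            )
            # Router / gating network: one score per expert per input.
            self.gate = nn.Linear(dim, num_experts)

        def forward(self, x):                        # x: (batch, dim)
            scores = self.gate(x)                    # (batch, num_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)     # normalize over selected experts
            out = torch.zeros_like(x)
            # Conditional computation: only the selected experts run per input.
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out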
gptkbp:field
|
gptkb:artificial_intelligence
gptkb:machine_learning
|
https://www.w3.org/2000/01/rdf-schema#label
|
Mixture of Experts
|
gptkbp:influenced
|
conditional computation research
scalable deep learning
sparse transformer architectures
|
gptkbp:introduced
|
gptkb:Robert_A._Jacobs
|
gptkbp:introducedIn
|
1991
|
gptkbp:notablePublication
|
gptkb:Adaptive_Mixtures_of_Local_Experts_(Jacobs_et_al.,_1991)
gptkb:Hierarchical_Mixtures_of_Experts_(Jordan_&_Jacobs,_1994)
gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity_(Fedus_et_al.,_2021)
gptkb:GLaM:_Efficient_Scaling_of_Language_Models_with_Mixture-of-Experts_(Du_et_al.,_2022)
|
gptkbp:purpose
|
divide-and-conquer learning
|
gptkbp:relatedTo
|
ensemble learning
transformer models
conditional computation
sparse neural networks
|
gptkbp:scalesTo
|
large language models
|
gptkbp:trainer
|
gptkb:reinforcement_learning
backpropagation
end-to-end learning
|
gptkbp:usedBy
|
gptkb:GPT-4_(speculated)
gptkb:Google_Switch_Transformer
gptkb:GLaM
|
gptkbp:uses
|
gating network
multiple expert models
|
gptkbp:bfsParent
|
gptkb:convolutional_neural_network
|
gptkbp:bfsLayer
|
5
|