|
gptkbp:instanceOf
|
gptkb:machine_learning_model_architecture
|
|
gptkbp:advantage
|
scalability
parameter efficiency
specialization
|
|
gptkbp:application
|
computer vision
natural language processing
speech recognition
recommendation systems
|
|
gptkbp:architecture
|
modular neural network
conditional computation model
|
|
gptkbp:challenge
|
load balancing
routing instability
training complexity
|
|
gptkbp:citation
|
gptkb:Jacobs,_R._A.,_Jordan,_M._I.,_Nowlan,_S._J.,_&_Hinton,_G._E._(1991)._Adaptive_mixtures_of_local_experts._Neural_Computation,_3(1),_79-87.
|
|
gptkbp:component
|
expert networks
gating network
|
|
gptkbp:field
|
gptkb:artificial_intelligence
gptkb:machine_learning
|
|
gptkbp:hasComponent
|
gating function
multiple expert models
|
|
gptkbp:introduced
|
gptkb:Michael_I._Jordan
gptkb:Robert_A._Jacobs
|
|
gptkbp:introducedIn
|
1991
|
|
gptkbp:notableFor
|
gptkb:GShard
gptkb:GPT-4_MoE_variant
gptkb:Google_Switch_Transformer
|
|
gptkbp:purpose
|
divide and conquer learning
enable specialization of sub-models
improve model scalability
|
|
gptkbp:relatedTo
|
ensemble methods
transformer models
conditional computation
sparse activation
|
|
gptkbp:trainer
|
backpropagation
expectation-maximization
|
|
gptkbp:usedIn
|
deep learning
ensemble learning
large language models
|
|
gptkbp:bfsParent
|
gptkb:Gaussian_mixture_models
|
|
gptkbp:bfsLayer
|
7
|
|
https://www.w3.org/2000/01/rdf-schema#label
|
Mixture of experts
|