Statements (41)
Predicate | Object |
---|---|
gptkbp:instanceOf | machine learning model architecture |
gptkbp:advantage | scalability, parameter efficiency, specialization |
gptkbp:application | computer vision, natural language processing, speech recognition, recommendation systems |
gptkbp:architecture | modular neural network, conditional computation model |
gptkbp:challenge | load balancing, routing instability, training complexity |
gptkbp:citation | gptkb:Jacobs,_R._A.,_Jordan,_M._I.,_Nowlan,_S._J.,_&_Hinton,_G._E._(1991)._Adaptive_mixtures_of_local_experts._Neural_Computation,_3(1),_79-87. |
gptkbp:component | expert networks, gating network |
gptkbp:field | gptkb:artificial_intelligence, gptkb:machine_learning |
gptkbp:hasComponent | gating function, multiple expert models |
https://www.w3.org/2000/01/rdf-schema#label | Mixture of experts |
gptkbp:introduced | gptkb:Michael_I._Jordan, gptkb:Ronald_A._Jacobs |
gptkbp:introducedIn | 1991 |
gptkbp:notableFor | gptkb:GShard, gptkb:GPT-4_MoE_variant, gptkb:Google_Switch_Transformer |
gptkbp:purpose | divide and conquer learning, enable specialization of sub-models, improve model scalability |
gptkbp:relatedTo | ensemble methods, transformer models, conditional computation, sparse activation |
gptkbp:trainer | backpropagation, expectation-maximization |
gptkbp:usedIn | deep learning, ensemble learning, large language models |
gptkbp:bfsParent | gptkb:Gaussian_mixture_models |
gptkbp:bfsLayer | 6 |
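The `component` and `hasComponent` statements above describe the basic structure: a gating network weights the outputs of multiple expert networks. A minimal dense sketch in NumPy, purely illustrative (the linear experts, gate weights, and dimensions are hypothetical toy values, not from the source; sparse/conditional variants such as Switch Transformer would evaluate only the top-scoring experts):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_experts = 4, 3, 2

# Expert networks: here, plain linear maps (toy parameters).
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]
# Gating network: linear scores followed by a softmax.
W_gate = rng.standard_normal((d_in, n_experts))

def moe_forward(x):
    scores = x @ W_gate
    gates = np.exp(scores - scores.max())
    gates /= gates.sum()                          # softmax gate weights, sum to 1
    outputs = np.stack([x @ W for W in experts])  # shape: (n_experts, d_out)
    return gates @ outputs                        # gate-weighted combination

y = moe_forward(rng.standard_normal(d_in))        # y has shape (d_out,)
```

In a dense mixture every expert is evaluated and the gate only reweights them; the scalability advantage listed in the table comes from the conditional-computation variants, where the gate routes each input to a small subset of experts so compute grows far slower than parameter count.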