gptkbp:instanceOf
|
gptkb:transformer
|
gptkbp:advantage
|
reduces computational cost per token relative to an equally large dense model
increases model capacity (parameter count) without a proportional increase in compute
|
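The advantage entries above (lower compute, higher capacity) follow from sparse activation: parameters grow with the number of experts, while per-token compute grows only with the number of experts a token is actually routed to. A minimal back-of-the-envelope sketch, using hypothetical layer sizes rather than values from this entry:

```python
# Illustrative arithmetic only: hypothetical sizes, not taken from this entry.
d_model = 1024          # hidden size
d_ff = 4096             # feed-forward inner size
num_experts = 64        # experts per MoE layer
top_k = 2               # experts activated per token

# Parameters of one dense FFN block (two weight matrices, biases ignored).
dense_ffn_params = 2 * d_model * d_ff

# An MoE layer stores num_experts copies of the FFN, so capacity grows ~64x ...
moe_layer_params = num_experts * dense_ffn_params

# ... but each token only runs through top_k experts, so per-token compute
# stays close to that of a k-times-wider dense FFN.
active_params_per_token = top_k * dense_ffn_params

print(f"total MoE layer params: {moe_layer_params:,}")
print(f"active params / token:  {active_params_per_token:,}")
```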
gptkbp:basedOn
|
gptkb:Transformer_architecture
|
gptkbp:challenge
|
load balancing
routing efficiency
training stability
|
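Of the challenges listed above, load balancing is usually addressed with an auxiliary loss that pushes the router toward a uniform distribution over experts. A minimal PyTorch sketch following the formulation of Fedus et al., 2021 (cited below); the aux_weight value is illustrative:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int,
                        aux_weight: float = 0.01) -> torch.Tensor:
    """Auxiliary load-balancing loss in the style of Fedus et al., 2021.

    router_logits: (num_tokens, num_experts) raw router scores.
    Encourages the fraction of tokens routed to each expert (f_i) and the
    mean router probability per expert (P_i) to both approach 1/num_experts.
    """
    probs = F.softmax(router_logits, dim=-1)               # (tokens, experts)
    expert_index = probs.argmax(dim=-1)                    # top-1 assignment
    # f_i: fraction of tokens whose top-1 choice is expert i.
    dispatch_fraction = F.one_hot(expert_index, num_experts).float().mean(dim=0)
    # P_i: mean routing probability assigned to expert i.
    mean_prob = probs.mean(dim=0)
    return aux_weight * num_experts * torch.sum(dispatch_fraction * mean_prob)
```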
gptkbp:citation
|
gptkb:Fedus_et_al.,_2021
gptkb:Lepikhin_et_al.,_2021
gptkb:Shazeer_et_al.,_2017
|
gptkbp:component
|
router
attention layers
expert networks
feedforward layers
gating network
|
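The components listed above combine as sparse gating: the router/gating network scores the experts for each token, only the top-k experts run, and their outputs are mixed by the gate weights. In the notation of Shazeer et al., 2017 (cited above):

```latex
y = \sum_{i=1}^{E} G(x)_i \, E_i(x),
\qquad
G(x) = \operatorname{Softmax}\!\big(\operatorname{KeepTopK}(x W_g,\, k)\big)
```

where the E_i are the expert networks, W_g is the router weight matrix, and KeepTopK sets all but the k largest logits to negative infinity, so only k experts are evaluated per token.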
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:enables
|
efficient computation
conditional computation
scaling to large model sizes
|
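Conditional computation, as listed above, means each token only executes the few experts its router selects. A minimal PyTorch sketch of an MoE feed-forward block with top-k routing; illustrative only, since production systems such as GShard and the Switch Transformer add capacity limits, expert parallelism, and the load-balancing loss sketched earlier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Minimal sketch of an MoE feed-forward block with a top-k router."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)        # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, d_model)
        logits = self.router(x)                               # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)    # keep top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Conditional computation: each expert only processes the tokens routed to it.
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```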
gptkbp:fullName
|
gptkb:Mixture_of_Experts_Transformer
|
https://www.w3.org/2000/01/rdf-schema#label
|
MoE Transformer
|
gptkbp:introducedIn
|
2021
|
gptkbp:notableFor
|
gptkb:Switch_Transformer
gptkb:GShard
gptkb:GLaM
|
gptkbp:openSource
|
gptkb:DeepSpeed
gptkb:Fairseq
gptkb:Hugging_Face_Transformers
|
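Of the open-source entries above, Hugging Face Transformers publishes MoE checkpoints that load through its standard auto classes. A hedged usage sketch; the checkpoint name google/switch-base-8 is an assumption about what is available on the Hub:

```python
# Assumes the "google/switch-base-8" Switch Transformer (T5-style seq2seq)
# checkpoint is published on the Hugging Face Hub.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = AutoModelForSeq2SeqLM.from_pretrained("google/switch-base-8")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```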
gptkbp:relatedTo
|
gptkb:Dense_Transformer
gptkb:Sparse_Transformer
|
gptkbp:usedIn
|
gptkb:machine_learning
natural language processing
large language models
|
gptkbp:uses
|
gptkb:Mixture_of_Experts
|
gptkbp:bfsParent
|
gptkb:transformation
|
gptkbp:bfsLayer
|
5
|