gptkbp:instanceOf
|
gptkb:large_language_model
|
gptkbp:affiliation
|
gptkb:Google_Brain
|
gptkbp:application
|
machine translation
natural language processing
language modeling
|
gptkbp:architecture
|
sparse mixture-of-experts
|
gptkbp:author
|
gptkb:Noam_Shazeer
gptkb:Barret_Zoph
gptkb:William_Fedus
|
gptkbp:basedOn
|
gptkb:Transformer
|
gptkbp:citation
|
over 1000 (as of 2024)
|
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:feature
|
reduces computational cost compared to dense Transformers
uses switch layers to route each token to a single expert (see the routing sketch after this block)
enables efficient scaling to very large models
|
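The switch-layer feature above refers to top-1 expert routing: a learned router picks exactly one expert feed-forward network per token and scales its output by the router probability. The following is a minimal, illustrative JAX sketch of that idea, not the released Switch Transformer code; the parameter layout (a "router" projection plus a list of per-expert weight matrices) and the dense "compute every expert, then select" form are simplifying assumptions made here for brevity.

import jax
import jax.numpy as jnp

def switch_layer(params, tokens):
    # Router: linear projection producing one logit per expert for each token.
    logits = tokens @ params["router"]                  # [num_tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    expert_index = jnp.argmax(probs, axis=-1)           # chosen expert per token
    gate = jnp.max(probs, axis=-1, keepdims=True)       # router prob of chosen expert

    # For brevity, apply every expert to every token and then pick the routed
    # output; real implementations dispatch tokens to experts to stay sparse.
    expert_outputs = jnp.stack(
        [jnp.tanh(tokens @ w) for w in params["experts"]], axis=1
    )                                                   # [num_tokens, num_experts, d_model]
    routed = jnp.take_along_axis(
        expert_outputs, expert_index[:, None, None], axis=1
    ).squeeze(1)                                        # [num_tokens, d_model]
    # Scaling by the gate keeps the router differentiable through the output.
    return gate * routed

# Toy usage (random weights, shapes only).
key = jax.random.PRNGKey(0)
d_model, num_experts, num_tokens = 8, 4, 16
params = {
    "router": jax.random.normal(key, (d_model, num_experts)),
    "experts": [jax.random.normal(jax.random.fold_in(key, e), (d_model, d_model))
                for e in range(num_experts)],
}
tokens = jax.random.normal(jax.random.fold_in(key, 99), (num_tokens, d_model))
out = switch_layer(params, tokens)                      # [16, 8]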
https://www.w3.org/2000/01/rdf-schema#label
|
Switch Transformer
|
gptkbp:influenced
|
gptkb:GLaM
Pathways
|
gptkbp:introducedIn
|
2021
|
gptkbp:notableFor
|
improving training efficiency (see the load-balancing loss sketch after this block)
introducing routing with a single expert per token
scaling language models efficiently
|
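The training-efficiency and single-expert-routing notes above rely on the auxiliary load-balancing loss from the Switch Transformer paper (Fedus et al., 2021): alpha * N * sum_i f_i * P_i, where f_i is the fraction of tokens dispatched to expert i and P_i is the mean router probability assigned to expert i. Below is a hedged JAX sketch of that loss; the function and argument names are illustrative, not from the official implementation.

import jax
import jax.numpy as jnp

def load_balancing_loss(router_probs, expert_index, num_experts, alpha=0.01):
    # f_i: fraction of tokens dispatched to each expert under top-1 routing.
    dispatch = jax.nn.one_hot(expert_index, num_experts)   # [num_tokens, num_experts]
    tokens_per_expert = jnp.mean(dispatch, axis=0)         # f_i
    # P_i: mean router probability mass placed on each expert.
    mean_router_prob = jnp.mean(router_probs, axis=0)      # P_i
    # The loss is minimized when both distributions are uniform, i.e. routing is balanced.
    return alpha * num_experts * jnp.sum(tokens_per_expert * mean_router_prob)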
gptkbp:notablePublication
|
gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity
|
gptkbp:openSource
|
gptkb:TensorFlow
gptkb:JAX
|
gptkbp:parameter
|
up to 1.6 trillion
|
gptkbp:relatedTo
|
gptkb:T5
gptkb:GPT-3
gptkb:Mixture_of_Experts
|
gptkbp:url
|
https://arxiv.org/abs/2101.03961
|
gptkbp:usedIn
|
large-scale language models
|
gptkbp:bfsParent
|
gptkb:transformation
|
gptkbp:bfsLayer
|
5
|