Switch Transformer

GPTKB entity

Statements (33)
Predicate Object
gptkbp:instanceOf gptkb:neural_network_architecture
gptkbp:affiliation gptkb:Google_Brain
gptkbp:application machine translation
natural language processing
language modeling
gptkbp:architecture sparse mixture-of-experts
gptkbp:author gptkb:Noam_Shazeer
gptkb:Barret_Zoph
gptkb:William_Fedus
gptkbp:basedOn gptkb:Transformer
gptkbp:citation over 1000 (as of 2024)
gptkbp:developedBy gptkb:Google_Research
gptkbp:feature reduces computational cost compared to dense Transformers
uses switch layers to route each token to a single expert (see the routing sketch below)
enables efficient scaling to very large models
https://www.w3.org/2000/01/rdf-schema#label Switch Transformer
gptkbp:influenced gptkb:GLaM
Pathways
gptkbp:introducedIn 2021
gptkbp:notableFor improving training efficiency
introducing routing with a single expert per token
scaling language models efficiently
gptkbp:notablePublication gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity
gptkbp:openSource gptkb:TensorFlow
gptkb:JAX
gptkbp:parameter up to 1.6 trillion
gptkbp:relatedTo gptkb:T5
gptkb:GPT-3
gptkb:Mixture_of_Experts
gptkbp:url https://arxiv.org/abs/2101.03961
gptkbp:usedIn large-scale language models
gptkbp:bfsParent gptkb:transformation
gptkbp:bfsLayer 5
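
Note on the "switch layer" feature listed above: it refers to top-1 mixture-of-experts routing, in which a learned router scores every expert for each token and the token is processed by only the single highest-scoring expert. Parameter count therefore grows with the number of experts while per-token compute stays roughly constant. The JAX sketch below illustrates this routing idea only; all names, shapes, and the dense dispatch are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of top-1 ("switch") routing, assuming hypothetical shapes and
# names; not the released Switch Transformer code.
import jax
import jax.numpy as jnp


def switch_route(tokens, router_weights, expert_weights):
    """Route each token to a single expert (top-1 routing).

    tokens:         [num_tokens, d_model]        token representations
    router_weights: [d_model, num_experts]       learned router projection
    expert_weights: [num_experts, d_model, d_ff] one feed-forward weight per expert
    """
    # Router logits and probabilities: one score per expert for each token.
    logits = tokens @ router_weights                      # [num_tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)

    # Top-1 routing: each token is sent to exactly one expert.
    expert_index = jnp.argmax(probs, axis=-1)             # [num_tokens]
    gate = jnp.max(probs, axis=-1, keepdims=True)         # [num_tokens, 1]

    # Dense emulation of dispatch: every expert computes on every token and a
    # one-hot mask keeps only the chosen expert's output. A real implementation
    # dispatches only the routed tokens to each expert, which is what keeps
    # per-token compute roughly constant as the number of experts grows.
    expert_out = jnp.einsum('td,edf->tef', tokens, expert_weights)   # [tokens, experts, d_ff]
    mask = jax.nn.one_hot(expert_index, expert_weights.shape[0])     # [tokens, experts]
    routed = jnp.einsum('tef,te->tf', expert_out, mask)              # [tokens, d_ff]

    # Scale by the router probability so gradients flow to the router.
    return gate * routed


# Tiny usage example with made-up sizes.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
tokens = jax.random.normal(k1, (8, 16))        # 8 tokens, d_model = 16
router = jax.random.normal(k2, (16, 4))        # 4 experts
experts = jax.random.normal(k3, (4, 16, 32))   # d_ff = 32
print(switch_route(tokens, router, experts).shape)  # (8, 32)

In the actual Switch Transformer, an auxiliary load-balancing loss encourages tokens to be spread evenly across experts; that detail is omitted from this sketch.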