gptkbp:instanceOf
|
gptkb:large_language_model
|
gptkbp:affiliation
|
gptkb:Google_Brain
|
gptkbp:application
|
machine translation
natural language processing
language modeling
|
gptkbp:architecture
|
sparse mixture-of-experts
|
gptkbp:author
|
gptkb:Noam_Shazeer
gptkb:Barret_Zoph
gptkb:William_Fedus
|
gptkbp:basedOn
|
gptkb:Transformer
|
gptkbp:citation
|
over 1000 (as of 2024)
|
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:feature
|
reduces computational cost compared to dense Transformers
uses switch layers to route each token to a single expert (see the routing sketch after this block)
enables efficient scaling to very large models
|
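The switch-layer feature above refers to top-1 expert routing: a learned router picks exactly one expert feed-forward network per token and scales its output by the router probability. The following is a minimal, illustrative JAX sketch of that idea, not the released Switch Transformer code; the parameter layout (a "router" projection plus a list of per-expert weight matrices) and the dense "compute every expert, then select" form are simplifying assumptions made here for brevity.

import jax
import jax.numpy as jnp

def switch_layer(params, tokens):
    # Router: linear projection producing one logit per expert for each token.
    logits = tokens @ params["router"]                  # [num_tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    expert_index = jnp.argmax(probs, axis=-1)           # chosen expert per token
    gate = jnp.max(probs, axis=-1, keepdims=True)       # router prob of chosen expert

    # For brevity, apply every expert to every token and then pick the routed
    # output; real implementations dispatch tokens to experts to stay sparse.
    expert_outputs = jnp.stack(
        [jnp.tanh(tokens @ w) for w in params["experts"]], axis=1
    )                                                   # [num_tokens, num_experts, d_model]
    routed = jnp.take_along_axis(
        expert_outputs, expert_index[:, None, None], axis=1
    ).squeeze(1)                                        # [num_tokens, d_model]
    # Scaling by the gate keeps the router differentiable through the output.
    return gate * routed

# Toy usage (random weights, shapes only).
key = jax.random.PRNGKey(0)
d_model, num_experts, num_tokens = 8, 4, 16
params = {
    "router": jax.random.normal(key, (d_model, num_experts)),
    "experts": [jax.random.normal(jax.random.fold_in(key, e), (d_model, d_model))
                for e in range(num_experts)],
}
tokens = jax.random.normal(jax.random.fold_in(key, 99), (num_tokens, d_model))
out = switch_layer(params, tokens)                      # [16, 8]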
https://www.w3.org/2000/01/rdf-schema#label
|
Switch Transformer
|
gptkbp:influenced
|
gptkb:GLaM
Pathways
|
gptkbp:introducedIn
|
2021
|
gptkbp:notableFor
|
improving training efficiency (see the load-balancing loss sketch after this block)
introducing routing with a single expert per token
scaling language models efficiently
|
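The training-efficiency and single-expert-routing notes above rely on the auxiliary load-balancing loss from the Switch Transformer paper (Fedus et al., 2021): alpha * N * sum_i f_i * P_i, where f_i is the fraction of tokens dispatched to expert i and P_i is the mean router probability assigned to expert i. Below is a hedged JAX sketch of that loss; the function and argument names are illustrative, not from the official implementation.

import jax
import jax.numpy as jnp

def load_balancing_loss(router_probs, expert_index, num_experts, alpha=0.01):
    # f_i: fraction of tokens dispatched to each expert under top-1 routing.
    dispatch = jax.nn.one_hot(expert_index, num_experts)   # [num_tokens, num_experts]
    tokens_per_expert = jnp.mean(dispatch, axis=0)         # f_i
    # P_i: mean router probability mass placed on each expert.
    mean_router_prob = jnp.mean(router_probs, axis=0)      # P_i
    # The loss is minimized when both distributions are uniform, i.e. routing is balanced.
    return alpha * num_experts * jnp.sum(tokens_per_expert * mean_router_prob)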
gptkbp:notablePublication
|
gptkb:Switch_Transformers:_Scaling_to_Trillion_Parameter_Models_with_Simple_and_Efficient_Sparsity
|
gptkbp:openSource
|
gptkb:TensorFlow
gptkb:JAX
|
gptkbp:parameter
|
up to 1.6 trillion
|
gptkbp:relatedTo
|
gptkb:T5
gptkb:GPT-3
gptkb:Mixture_of_Experts
|
gptkbp:url
|
https://arxiv.org/abs/2101.03961
|
gptkbp:usedIn
|
large-scale language models
|
gptkbp:bfsParent
|
gptkb:transformation
|
gptkbp:bfsLayer
|
5
|