Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
GPTKB entity
Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:academic_journal |
| gptkbp:affiliation | gptkb:Google_Research |
| gptkbp:arXivID | 2101.03961 |
| gptkbp:author | gptkb:William_Fedus, gptkb:Barret_Zoph, gptkb:Noam_Shazeer |
| gptkbp:citation | 1000+ |
| gptkbp:contribution | Introduces Switch Transformer, a sparse model architecture; uses mixture-of-experts routing with a single active expert per token (see the routing sketch below the table); demonstrates scaling to trillion-parameter models; achieves efficiency and simplicity in large-scale models |
| gptkbp:field | gptkb:machine_learning, natural language processing |
| gptkbp:hasMethod | gptkb:Switch_Transformer |
| gptkbp:impact | Enabled training of trillion-parameter language models |
| gptkbp:openAccess | true |
| gptkbp:publicationYear | 2021 |
| gptkbp:publishedIn | gptkb:arXiv |
| gptkbp:relatedTo | gptkb:Mixture_of_Experts, Transformer architecture |
| gptkbp:url | https://arxiv.org/abs/2101.03961 |
| gptkbp:bfsParent | gptkb:Switch_Transformer |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
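
The contribution row above refers to the paper's top-1 ("switch") routing rule: a learned router sends each token to exactly one expert feed-forward network. The following is a minimal sketch of that idea in JAX, not the paper's reference implementation; the names (`switch_layer`, `router_weights`, `expert_params`) are illustrative, expert dispatch is emulated densely for clarity, and capacity factors and other production details are omitted. The auxiliary loss follows the form given in the paper, alpha * N * sum_i f_i * P_i.

```python
import jax
import jax.numpy as jnp

def switch_layer(tokens, router_weights, expert_params):
    """Top-1 ("switch") routing: each token is processed by exactly one expert.

    tokens:         [num_tokens, d_model] activations entering the layer.
    router_weights: [d_model, num_experts] router projection.
    expert_params:  list of (w_in, w_out) pairs, one two-layer FFN per expert.
    """
    # Router: a probability distribution over experts for every token.
    logits = tokens @ router_weights                      # [tokens, experts]
    probs = jax.nn.softmax(logits, axis=-1)

    # Top-1 selection: the single highest-probability expert per token.
    expert_index = jnp.argmax(probs, axis=-1)             # [tokens]
    gate = jnp.max(probs, axis=-1, keepdims=True)         # [tokens, 1]

    # Dispatch (dense emulation: compute every expert on every token, then
    # mask; a real implementation dispatches each token sparsely).
    outputs = jnp.zeros_like(tokens)
    for e, (w_in, w_out) in enumerate(expert_params):
        expert_out = jax.nn.relu(tokens @ w_in) @ w_out   # expert FFN
        mask = (expert_index == e)[:, None]               # tokens routed to e
        outputs = outputs + jnp.where(mask, expert_out, 0.0)

    # Scale the output by the router probability of the chosen expert.
    return gate * outputs, probs, expert_index


def load_balancing_loss(probs, expert_index, num_experts, alpha=0.01):
    """Auxiliary loss alpha * N * sum_i f_i * P_i, where f_i is the fraction
    of tokens routed to expert i and P_i is the mean router probability
    assigned to expert i."""
    f = jnp.mean(jax.nn.one_hot(expert_index, num_experts), axis=0)
    p = jnp.mean(probs, axis=0)
    return alpha * num_experts * jnp.sum(f * p)
```

The design choice captured here is the key simplification relative to earlier mixture-of-experts layers, which typically route each token to the top-2 experts: keeping a single active expert per token holds per-token compute roughly constant while total parameter count grows with the number of experts, which is what allows the architecture to scale to trillion-parameter models.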