Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

GPTKB entity

Statements (24)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:affiliation gptkb:Google_Research
gptkbp:arXivID 2101.03961
gptkbp:author gptkb:William_Fedus
gptkbp:author gptkb:Barret_Zoph
gptkbp:author gptkb:Noam_Shazeer
gptkbp:citation 1000+
gptkbp:contribution Introduces the Switch Transformer, a sparse mixture-of-experts model architecture
gptkbp:contribution Routes each token to a single active expert (top-1 "switch" routing; see the sketch after the statements)
gptkbp:contribution Demonstrates scaling to trillion-parameter language models
gptkbp:contribution Simplifies mixture-of-experts routing and reduces communication and computational costs
gptkbp:field gptkb:machine_learning
gptkbp:field natural language processing
gptkbp:hasMethod gptkb:Switch_Transformer
https://www.w3.org/2000/01/rdf-schema#label Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
gptkbp:impact Enabled training of trillion-parameter language models
gptkbp:openAccess true
gptkbp:publicationYear 2021
gptkbp:publishedIn gptkb:arXiv
gptkbp:relatedTo gptkb:Mixture_of_Experts
gptkbp:relatedTo Transformer architecture
gptkbp:url https://arxiv.org/abs/2101.03961
gptkbp:bfsParent gptkb:Switch_Transformer
gptkbp:bfsLayer 6
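
The routing sketch referenced in the contribution statements: a minimal JAX illustration of top-1 ("switch") routing, in which each token is dispatched to the single expert whose router probability is highest and the expert output is scaled by that probability. The parameter names, shapes, and single-projection expert FFN are simplifying assumptions for illustration only, not the authors' implementation; the paper's expert capacity factor and load-balancing auxiliary loss are omitted.

```python
# Minimal sketch of top-1 ("switch") expert routing, assuming hypothetical
# names and shapes; not the Switch Transformer reference implementation.
import jax
import jax.numpy as jnp

def switch_route(tokens, router_weights, expert_weights):
    """Route each token to exactly one expert (top-1 routing).

    tokens:         [num_tokens, d_model]
    router_weights: [d_model, num_experts]       (hypothetical router projection)
    expert_weights: [num_experts, d_model, d_ff] (one FFN projection per expert;
                    a real expert FFN has two projections, omitted for brevity)
    """
    # Router logits and probabilities for each token over the experts.
    logits = tokens @ router_weights               # [num_tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)

    # Top-1: each token is sent only to its highest-probability expert.
    expert_index = jnp.argmax(probs, axis=-1)      # [num_tokens]
    expert_gate = jnp.max(probs, axis=-1)          # [num_tokens]

    # Gather the chosen expert's weights per token and apply its projection.
    chosen = expert_weights[expert_index]          # [num_tokens, d_model, d_ff]
    expert_out = jax.nn.relu(jnp.einsum("td,tdf->tf", tokens, chosen))

    # Scale by the router probability so the router receives gradient signal.
    return expert_out * expert_gate[:, None]

# Tiny usage example with random parameters.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
num_tokens, d_model, d_ff, num_experts = 8, 16, 32, 4
tokens = jax.random.normal(k1, (num_tokens, d_model))
router_w = jax.random.normal(k2, (d_model, num_experts))
expert_w = jax.random.normal(k3, (num_experts, d_model, d_ff))
print(switch_route(tokens, router_w, expert_w).shape)  # (8, 32)
```

Because only one expert is active per token, compute per token stays roughly constant as the number of experts (and thus total parameter count) grows, which is the property the paper exploits to reach trillion-parameter scale.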