Switch Transformers: Scaling to Trillion Parameter Models

GPTKB entity

Statements (22)
Predicate Object
gptkbp:instanceOf gptkb:research_paper
gptkbp:arXivID 2101.03961
gptkbp:author gptkb:Noam_Shazeer
gptkb:Barret_Zoph
gptkb:William_Fedus
gptkbp:citation 1000+
gptkbp:demonstrates scaling language models to over a trillion parameters
gptkbp:evaluatesOn language modeling tasks
gptkbp:focusesOn Switch Transformer architecture
gptkbp:finding Switch Transformer outperforms dense models at similar computational cost
rdfs:label Switch Transformers: Scaling to Trillion Parameter Models
gptkbp:improves computational efficiency
training speed
model quality
gptkbp:basedOn Mixture-of-Experts (MoE) model
gptkbp:publicationYear 2021
gptkbp:publishedBy gptkb:Google_Research
gptkbp:url https://arxiv.org/abs/2101.03961
gptkbp:uses sparse activation
routing network (see the sketch following the statements)
gptkbp:bfsParent gptkb:Google_Brain_(former)
gptkbp:bfsLayer 7
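
The "uses: sparse activation, routing network" statements describe the core mechanism of the Switch Transformer: a learned router sends each token to exactly one expert feed-forward network, so only a small fraction of the model's parameters is active per token. The following is a minimal illustrative sketch of such top-1 routing, not the paper's implementation; the dimensions, the weight names (router_w, expert_w_in, expert_w_out), and the plain-numpy setting are assumptions made for demonstration.

import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_experts, d_ff = 8, 16, 4, 32

# Hypothetical parameters: one router matrix plus per-expert FFN weights.
router_w = rng.normal(size=(d_model, num_experts)) * 0.02
expert_w_in = rng.normal(size=(num_experts, d_model, d_ff)) * 0.02
expert_w_out = rng.normal(size=(num_experts, d_ff, d_model)) * 0.02

def switch_layer(x):
    # Router: softmax over experts, then keep only the top-1 expert per token.
    logits = x @ router_w                              # [tokens, experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    expert_idx = probs.argmax(-1)                      # chosen expert per token
    gate = probs[np.arange(len(x)), expert_idx]        # gate value of that expert
    out = np.zeros_like(x)
    for e in range(num_experts):                       # sparse activation: each token
        mask = expert_idx == e                         # passes through one expert only
        if mask.any():
            h = np.maximum(x[mask] @ expert_w_in[e], 0.0)        # ReLU FFN
            out[mask] = (h @ expert_w_out[e]) * gate[mask, None]
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(switch_layer(tokens).shape)                      # -> (8, 16)

Top-1 routing is what distinguishes the Switch layer from earlier Mixture-of-Experts layers, which route each token to two or more experts; this simplification is the source of the computational-efficiency and training-speed improvements listed above.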