Switch Transformers: Scaling to Trillion Parameter Models
GPTKB entity
Statements (22)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:academic_journal |
gptkbp:arXivID | 2101.03961 |
gptkbp:author | gptkb:Noam_Shazeer, gptkb:Barret_Zoph, gptkb:William_Fedus |
gptkbp:citation | 1000+ |
gptkbp:demonstrates | scaling language models to over a trillion parameters |
gptkbp:evaluatesOn | language modeling tasks |
gptkbp:focusesOn | Switch Transformer architecture |
gptkbp:foundIn | Switch Transformer outperforms dense models at similar computational cost |
https://www.w3.org/2000/01/rdf-schema#label | Switch Transformers: Scaling to Trillion Parameter Models |
gptkbp:improves | computational efficiency, training speed, model quality |
gptkbp:proposedBy | Mixture-of-Experts (MoE) model |
gptkbp:publicationYear | 2021 |
gptkbp:publishedBy | gptkb:Google_Research |
gptkbp:url | https://arxiv.org/abs/2101.03961 |
gptkbp:uses | sparse activation, routing network (sketched below) |
gptkbp:bfsParent | gptkb:Google_Brain_(former) |
gptkbp:bfsLayer | 7 |