Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
GPTKB entity
Statements (24)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:academic_journal |
| gptkbp:affiliation | gptkb:Google_Research |
| gptkbp:arXivID | 2101.03961 |
| gptkbp:author | gptkb:William_Fedus, gptkb:Barret_Zoph, gptkb:Noam_Shazeer |
| gptkbp:citation | 1000+ |
| gptkbp:contribution | Introduces Switch Transformer, a sparse model architecture; uses mixture-of-experts routing with a single active expert per token (see the routing sketch below the table); demonstrates scaling to trillion-parameter models; achieves efficiency and simplicity in large-scale models |
| gptkbp:field | gptkb:machine_learning, natural language processing |
| gptkbp:hasMethod | gptkb:Switch_Transformer |
| gptkbp:impact | Enabled training of trillion-parameter language models |
| gptkbp:openAccess | true |
| gptkbp:publicationYear | 2021 |
| gptkbp:publishedIn | gptkb:arXiv |
| gptkbp:relatedTo | gptkb:Mixture_of_Experts, Transformer architecture |
| gptkbp:url | https://arxiv.org/abs/2101.03961 |
| gptkbp:bfsParent | gptkb:Switch_Transformer |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
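
The contribution row above refers to the paper's top-1 ("switch") routing rule: a learned router sends each token to exactly one expert feed-forward network. The following is a minimal sketch of that idea in JAX, not the paper's reference implementation; the names (`switch_layer`, `router_weights`, `expert_params`) are illustrative, expert dispatch is emulated densely for clarity, and capacity factors and other production details are omitted. The auxiliary loss follows the form given in the paper, alpha * N * sum_i f_i * P_i.

```python
import jax
import jax.numpy as jnp

def switch_layer(tokens, router_weights, expert_params):
    """Top-1 ("switch") routing: each token is processed by exactly one expert.

    tokens:         [num_tokens, d_model] activations entering the layer.
    router_weights: [d_model, num_experts] router projection.
    expert_params:  list of (w_in, w_out) pairs, one two-layer FFN per expert.
    """
    # Router: a probability distribution over experts for every token.
    logits = tokens @ router_weights                      # [tokens, experts]
    probs = jax.nn.softmax(logits, axis=-1)

    # Top-1 selection: the single highest-probability expert per token.
    expert_index = jnp.argmax(probs, axis=-1)             # [tokens]
    gate = jnp.max(probs, axis=-1, keepdims=True)         # [tokens, 1]

    # Dispatch (dense emulation: compute every expert on every token, then
    # mask; a real implementation dispatches each token sparsely).
    outputs = jnp.zeros_like(tokens)
    for e, (w_in, w_out) in enumerate(expert_params):
        expert_out = jax.nn.relu(tokens @ w_in) @ w_out   # expert FFN
        mask = (expert_index == e)[:, None]               # tokens routed to e
        outputs = outputs + jnp.where(mask, expert_out, 0.0)

    # Scale the output by the router probability of the chosen expert.
    return gate * outputs, probs, expert_index


def load_balancing_loss(probs, expert_index, num_experts, alpha=0.01):
    """Auxiliary loss alpha * N * sum_i f_i * P_i, where f_i is the fraction
    of tokens routed to expert i and P_i is the mean router probability
    assigned to expert i."""
    f = jnp.mean(jax.nn.one_hot(expert_index, num_experts), axis=0)
    p = jnp.mean(probs, axis=0)
    return alpha * num_experts * jnp.sum(f * p)
```

The design choice captured here is the key simplification relative to earlier mixture-of-experts layers, which typically route each token to the top-2 experts: keeping a single active expert per token holds per-token compute roughly constant while total parameter count grows with the number of experts, which is what allows the architecture to scale to trillion-parameter models.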