Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
GPTKB entity
Statements (24)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:academic_journal |
gptkbp:affiliation | gptkb:Google_Research |
gptkbp:arXivID | 2101.03961 |
gptkbp:author | gptkb:Noam_Shazeer, gptkb:Barret_Zoph, gptkb:William_Fedus |
gptkbp:citation | 1000+ |
gptkbp:contribution | Uses mixture-of-experts with a single active expert per token; Demonstrates scaling to trillion-parameter models; Introduces Switch Transformer, a sparse model architecture; Achieves efficiency and simplicity in large-scale models |
gptkbp:field | gptkb:machine_learning, natural language processing |
gptkbp:hasMethod | gptkb:Switch_Transformer |
https://www.w3.org/2000/01/rdf-schema#label | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
gptkbp:impact | Enabled training of trillion-parameter language models |
gptkbp:openAccess | true |
gptkbp:publicationYear | 2021 |
gptkbp:publishedIn | gptkb:arXiv |
gptkbp:relatedTo | gptkb:Mixture_of_Experts, Transformer architecture |
gptkbp:url | https://arxiv.org/abs/2101.03961 |
gptkbp:bfsParent | gptkb:Switch_Transformer |
gptkbp:bfsLayer | 6 |