Switch Transformers: Scaling to Trillion Parameter Models
GPTKB entity
Statements (22)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:academic_journal |
gptkbp:arXivID | 2101.03961 |
gptkbp:author | gptkb:Noam_Shazeer, gptkb:Barret_Zoph, gptkb:William_Fedus |
gptkbp:citation | 1000+ |
gptkbp:demonstrates | scaling language models to over a trillion parameters |
gptkbp:evaluatesOn | language modeling tasks |
gptkbp:focusesOn | Switch Transformer architecture |
gptkbp:foundIn | Switch Transformer outperforms dense models at similar computational cost |
https://www.w3.org/2000/01/rdf-schema#label | Switch Transformers: Scaling to Trillion Parameter Models |
gptkbp:improves | computational efficiency, training speed, model quality |
gptkbp:proposedBy | Mixture-of-Experts (MoE) model |
gptkbp:publicationYear | 2021 |
gptkbp:publishedBy | gptkb:Google_Research |
gptkbp:url | https://arxiv.org/abs/2101.03961 |
gptkbp:uses | sparse activation, routing network (sketched below) |
gptkbp:bfsParent | gptkb:Google_Brain_(former) |
gptkbp:bfsLayer | 7 |