Mixture of Experts Transformer
GPTKB entity
Statements (23)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:model |
gptkbp:instanceOf | gptkb:convolutional_neural_network |
gptkbp:enables | parameter efficiency |
gptkbp:enables | efficient training of large models |
gptkbp:enables | scaling to billions of parameters |
gptkbp:feature | scalability |
gptkbp:feature | conditional computation |
gptkbp:feature | sparse activation |
gptkbp:feature | multiple expert networks |
gptkbp:feature | routing mechanism |
gptkbp:field | gptkb:artificial_intelligence |
gptkbp:field | gptkb:machine_learning |
gptkbp:field | deep learning |
gptkbp:firstPublished | 2021 |
https://www.w3.org/2000/01/rdf-schema#label | Mixture of Experts Transformer |
gptkbp:notablePublication | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021) |
gptkbp:relatedTo | gptkb:Google |
gptkbp:relatedTo | gptkb:Switch_Transformer |
gptkbp:relatedTo | gptkb:GShard |
gptkbp:usedIn | natural language processing |
gptkbp:usedIn | large language models |
gptkbp:bfsParent | gptkb:MoE_Transformer |
gptkbp:bfsLayer | 6 |
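The features listed above (conditional computation, sparse activation, routing mechanism) describe how an MoE layer replaces a dense feed-forward block: a learned router sends each token to a small subset of expert networks, so only a fraction of the model's parameters are active per token. Below is a minimal NumPy sketch of Switch-style top-1 routing; all names (`moe_layer`, `gate_w`, `experts`) are illustrative and not part of the GPTKB schema.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, k=1):
    """Route each token to its top-k experts and mix their outputs."""
    probs = softmax(x @ gate_w)                # (tokens, n_experts) router scores
    top = np.argsort(-probs, axis=-1)[:, :k]   # chosen expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            # only the selected experts run; output is scaled by the gate prob
            out[t] += probs[t, e] * experts[e](x[t])
    return out

# toy setup: each "expert" is a tiny nonlinear map standing in for an FFN
d, n_experts, tokens = 8, 4, 5
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, w=w: np.tanh(v @ w) for w in expert_ws]

x = rng.normal(size=(tokens, d))
y = moe_layer(x, gate_w, experts, k=1)  # Switch-style top-1 routing
print(y.shape)  # (5, 8)
```

With `k=1` this follows the recipe of the Switch Transformers paper cited under gptkbp:notablePublication: exactly one expert runs per token, so per-token compute stays roughly constant while the total parameter count grows with the number of experts.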