Mixture of Experts Transformer
GPTKB entity
Statements (23)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:model |
gptkbp:instanceOf | gptkb:convolutional_neural_network |
gptkbp:enables | parameter efficiency |
gptkbp:enables | efficient training of large models |
gptkbp:enables | scaling to billions of parameters |
gptkbp:feature | scalability |
gptkbp:feature | conditional computation |
gptkbp:feature | sparse activation |
gptkbp:feature | multiple expert networks |
gptkbp:feature | routing mechanism |
gptkbp:field | gptkb:artificial_intelligence |
gptkbp:field | gptkb:machine_learning |
gptkbp:field | deep learning |
gptkbp:firstPublished | 2021 |
https://www.w3.org/2000/01/rdf-schema#label | Mixture of Experts Transformer |
gptkbp:notablePublication | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021) |
gptkbp:relatedTo | gptkb:Google |
gptkbp:relatedTo | gptkb:Switch_Transformer |
gptkbp:relatedTo | gptkb:GShard |
gptkbp:usedIn | natural language processing |
gptkbp:usedIn | large language models |
gptkbp:bfsParent | gptkb:MoE_Transformer |
gptkbp:bfsLayer | 6 |
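The features listed above (conditional computation, sparse activation, routing mechanism) describe how an MoE layer replaces a dense feed-forward block: a learned router sends each token to a small subset of expert networks, so only a fraction of the model's parameters are active per token. Below is a minimal NumPy sketch of Switch-style top-1 routing; all names (`moe_layer`, `gate_w`, `experts`) are illustrative and not part of the GPTKB schema.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, k=1):
    """Route each token to its top-k experts and mix their outputs."""
    probs = softmax(x @ gate_w)                # (tokens, n_experts) router scores
    top = np.argsort(-probs, axis=-1)[:, :k]   # chosen expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            # only the selected experts run; output is scaled by the gate prob
            out[t] += probs[t, e] * experts[e](x[t])
    return out

# toy setup: each "expert" is a tiny nonlinear map standing in for an FFN
d, n_experts, tokens = 8, 4, 5
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, w=w: np.tanh(v @ w) for w in expert_ws]

x = rng.normal(size=(tokens, d))
y = moe_layer(x, gate_w, experts, k=1)  # Switch-style top-1 routing
print(y.shape)  # (5, 8)
```

With `k=1` this follows the recipe of the Switch Transformers paper cited under gptkbp:notablePublication: exactly one expert runs per token, so per-token compute stays roughly constant while the total parameter count grows with the number of experts.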