gptkbp:instanceOf
|
gptkb:transformer
|
gptkbp:advantage
|
reduces computational cost per token relative to an equally large dense model
increases model capacity (parameter count) without a proportional increase in compute
|
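The advantage entries above (lower compute, higher capacity) follow from sparse activation: parameters grow with the number of experts, while per-token compute grows only with the number of experts a token is actually routed to. A minimal back-of-the-envelope sketch, using hypothetical layer sizes rather than values from this entry:

```python
# Illustrative arithmetic only: hypothetical sizes, not taken from this entry.
d_model = 1024          # hidden size
d_ff = 4096             # feed-forward inner size
num_experts = 64        # experts per MoE layer
top_k = 2               # experts activated per token

# Parameters of one dense FFN block (two weight matrices, biases ignored).
dense_ffn_params = 2 * d_model * d_ff

# An MoE layer stores num_experts copies of the FFN, so capacity grows ~64x ...
moe_layer_params = num_experts * dense_ffn_params

# ... but each token only runs through top_k experts, so per-token compute
# stays close to that of a k-times-wider dense FFN.
active_params_per_token = top_k * dense_ffn_params

print(f"total MoE layer params: {moe_layer_params:,}")
print(f"active params / token:  {active_params_per_token:,}")
```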
gptkbp:basedOn
|
gptkb:Transformer_architecture
|
gptkbp:challenge
|
load balancing
routing efficiency
training stability
|
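Of the challenges listed above, load balancing is usually addressed with an auxiliary loss that pushes the router toward a uniform distribution over experts. A minimal PyTorch sketch following the formulation of Fedus et al., 2021 (cited below); the aux_weight value is illustrative:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int,
                        aux_weight: float = 0.01) -> torch.Tensor:
    """Auxiliary load-balancing loss in the style of Fedus et al., 2021.

    router_logits: (num_tokens, num_experts) raw router scores.
    Encourages the fraction of tokens routed to each expert (f_i) and the
    mean router probability per expert (P_i) to both approach 1/num_experts.
    """
    probs = F.softmax(router_logits, dim=-1)               # (tokens, experts)
    expert_index = probs.argmax(dim=-1)                    # top-1 assignment
    # f_i: fraction of tokens whose top-1 choice is expert i.
    dispatch_fraction = F.one_hot(expert_index, num_experts).float().mean(dim=0)
    # P_i: mean routing probability assigned to expert i.
    mean_prob = probs.mean(dim=0)
    return aux_weight * num_experts * torch.sum(dispatch_fraction * mean_prob)
```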
gptkbp:citation
|
gptkb:Fedus_et_al.,_2021
gptkb:Lepikhin_et_al.,_2021
gptkb:Shazeer_et_al.,_2017
|
gptkbp:component
|
router
attention layers
expert networks
feedforward layers
gating network
|
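The components listed above combine as sparse gating: the router/gating network scores the experts for each token, only the top-k experts run, and their outputs are mixed by the gate weights. In the notation of Shazeer et al., 2017 (cited above):

```latex
y = \sum_{i=1}^{E} G(x)_i \, E_i(x),
\qquad
G(x) = \operatorname{Softmax}\!\big(\operatorname{KeepTopK}(x W_g,\, k)\big)
```

where the E_i are the expert networks, W_g is the router weight matrix, and KeepTopK sets all but the k largest logits to negative infinity, so only k experts are evaluated per token.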
gptkbp:developedBy
|
gptkb:Google_Research
|
gptkbp:enables
|
efficient computation
conditional computation
scaling to large model sizes
|
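Conditional computation, as listed above, means each token only executes the few experts its router selects. A minimal PyTorch sketch of an MoE feed-forward block with top-k routing; illustrative only, since production systems such as GShard and the Switch Transformer add capacity limits, expert parallelism, and the load-balancing loss sketched earlier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Minimal sketch of an MoE feed-forward block with a top-k router."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)        # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, d_model)
        logits = self.router(x)                               # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)    # keep top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Conditional computation: each expert only processes the tokens routed to it.
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```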
gptkbp:fullName
|
gptkb:Mixture_of_Experts_Transformer
|
https://www.w3.org/2000/01/rdf-schema#label
|
MoE Transformer
|
gptkbp:introducedIn
|
2021
|
gptkbp:notableFor
|
gptkb:Switch_Transformer
gptkb:GShard
gptkb:GLaM
|
gptkbp:openSource
|
gptkb:DeepSpeed
gptkb:Fairseq
gptkb:Hugging_Face_Transformers
|
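Of the open-source entries above, Hugging Face Transformers publishes MoE checkpoints that load through its standard auto classes. A hedged usage sketch; the checkpoint name google/switch-base-8 is an assumption about what is available on the Hub:

```python
# Assumes the "google/switch-base-8" Switch Transformer (T5-style seq2seq)
# checkpoint is published on the Hugging Face Hub.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = AutoModelForSeq2SeqLM.from_pretrained("google/switch-base-8")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```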
gptkbp:relatedTo
|
gptkb:Dense_Transformer
gptkb:Sparse_Transformer
|
gptkbp:usedIn
|
gptkb:machine_learning
natural language processing
large language models
|
gptkbp:uses
|
gptkb:Mixture_of_Experts
|
gptkbp:bfsParent
|
gptkb:transformation
|
gptkbp:bfsLayer
|
5
|