GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (Du et al., 2022)