Statements (52)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | large language model |
| gptkbp:citation | https://arxiv.org/abs/1909.08053 |
| gptkbp:developedBy | gptkb:NVIDIA |
| gptkbp:firstReleased | 2019 |
| gptkbp:hasFeature | fast training, mixed precision support, checkpointing, efficient memory usage, gradient accumulation, custom datasets, activation checkpointing, custom tokenization, distributed optimizer, flexible model configuration |
| https://www.w3.org/2000/01/rdf-schema#label | Megatron-LM |
| gptkbp:license | gptkb:Apache_License_2.0 |
| gptkbp:openSource | true |
| gptkbp:optimizedFor | gptkb:NVIDIA_GPUs, distributed training |
| gptkbp:programmingLanguage | gptkb:Python |
| gptkbp:purpose | training large transformer models |
| gptkbp:relatedTo | gptkb:PyTorch, deep learning, transformer architecture |
| gptkbp:repository | https://github.com/NVIDIA/Megatron-LM |
| gptkbp:scalableTo | trillions of parameters |
| gptkbp:supports | gptkb:T5, gptkb:GPT-2, gptkb:GPT-3, gptkb:BERT, FP16, data parallelism, mixed precision training, multi-GPU training, pipeline parallelism, tensor parallelism, bfloat16, multi-node training |
| gptkbp:usedBy | gptkb:industry, gptkb:researchers |
| gptkbp:usedFor | natural language processing, text generation, language modeling, fine-tuning, pretraining |
| gptkbp:uses | data parallelism, model parallelism, pipeline parallelism |
| gptkbp:bfsParent | gptkb:NVIDIA_AI_Research, gptkb:Megatron-Turing_NLG, gptkb:GPT-NeoX |
| gptkbp:bfsLayer | 6 |