Statements (52)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:large_language_model |
| gptkbp:citation | https://arxiv.org/abs/1909.08053 |
| gptkbp:developedBy | gptkb:NVIDIA |
| gptkbp:firstReleased | 2019 |
| gptkbp:hasFeature | fast training, mixed precision support, checkpointing, efficient memory usage, gradient accumulation, custom datasets, activation checkpointing, custom tokenization, distributed optimizer, flexible model configuration |
| gptkbp:license | gptkb:Apache_License_2.0 |
| gptkbp:openSource | true |
| gptkbp:optimizedFor | gptkb:NVIDIA_GPUs, distributed training |
| gptkbp:programmingLanguage | gptkb:Python |
| gptkbp:purpose | training large transformer models |
| gptkbp:relatedTo | gptkb:PyTorch, deep learning, transformer architecture |
| gptkbp:repository | https://github.com/NVIDIA/Megatron-LM |
| gptkbp:scalableTo | trillions of parameters |
| gptkbp:supports | gptkb:T5, gptkb:GPT-2, gptkb:GPT-3, gptkb:BERT, FP16, data parallelism, mixed precision training, multi-GPU training, pipeline parallelism, tensor parallelism, bfloat16, multi-node training |
| gptkbp:usedBy | gptkb:industry, gptkb:researchers |
| gptkbp:usedFor | natural language processing, text generation, language modeling, fine-tuning, pretraining |
| gptkbp:uses | data parallelism, model parallelism, pipeline parallelism |
| gptkbp:bfsParent | gptkb:Transformer_models, gptkb:Language_modeling, gptkb:Megatron-Turing_NLG |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | Megatron-LM |
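The `gptkbp:uses` statements name tensor (model) parallelism, the core technique of the cited paper (https://arxiv.org/abs/1909.08053): a linear layer's weight matrix is split column-wise across GPUs so each rank computes a slice of the output. Below is a minimal single-process sketch of that idea; the names and the single-process simulation are illustrative assumptions, not Megatron-LM's actual API, which shards across real GPUs via torch.distributed.

```python
# Illustrative sketch of Megatron-style column-parallel linear layers
# (arXiv:1909.08053). Two "GPUs" are simulated in one process; a real
# deployment would hold one shard per rank and all-gather the outputs.
import torch

torch.manual_seed(0)
batch, d_in, d_out, world_size = 4, 8, 16, 2  # hypothetical sizes

x = torch.randn(batch, d_in)   # activations, replicated on every rank
w = torch.randn(d_in, d_out)   # full weight of the linear layer y = x @ w

# Tensor (column) parallelism: each rank owns a column slice of the weight
# and computes the matching slice of the output independently.
shards = torch.chunk(w, world_size, dim=1)   # per-rank weight shards
partial = [x @ shard for shard in shards]    # per-rank partial outputs
y_parallel = torch.cat(partial, dim=1)       # stand-in for an all-gather

# The sharded computation reproduces the unsharded result exactly.
assert torch.allclose(x @ w, y_parallel)
```

Column-parallel layers pair with row-parallel ones in the paper so that a full transformer MLP block needs only one all-reduce per forward pass; the sketch above shows just the column half of that pattern.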