Statements (57)
| Predicate | Object |
|---|---|
| gptkbp:instanceOf | gptkb:deep_learning_library |
| gptkbp:designedFor | large-scale distributed training |
| gptkbp:developedBy | gptkb:Microsoft |
| gptkbp:enables | data parallelism; model parallelism; pipeline parallelism; zero redundancy optimizer |
| gptkbp:feature | gptkb:Mixture_of_Experts_(MoE); gptkb:DeepSpeed-MoE; compression; scalability; mixed precision training; multi-GPU training; checkpointing; offloading; gradient accumulation; FP16 support; multi-node training; BERT support; GPT support; memory optimization; CPU offload; DeepSpeed-Inference; Transformer support; Turing-NLG support; ZeRO-Infinity; ZeRO-Offload; activation partitioning; automatic partitioning; custom optimizer support; dynamic loss scaling; flexible API; inference optimization; low communication overhead; model parallelism API; optimizer state partitioning; pipeline parallelism API; scaling to thousands of GPUs; sparse attention; training efficiency |
| gptkbp:firstReleased | 2020 |
| gptkbp:license | gptkb:MIT_License |
| gptkbp:openSource | true |
| gptkbp:programmingLanguage | gptkb:Python |
| gptkbp:repository | https://github.com/microsoft/DeepSpeed |
| gptkbp:supports | gptkb:PyTorch |
| gptkbp:usedBy | gptkb:OpenAI; gptkb:EleutherAI |
| gptkbp:usedFor | training large language models |
| gptkbp:bfsParent | gptkb:Transformer_models; gptkb:Transformers_library; gptkb:Language_modeling; gptkb:Megatron-Turing_NLG; gptkb:NCCL; gptkb:Jonathan_Meisner |
| gptkbp:bfsLayer | 7 |
| https://www.w3.org/2000/01/rdf-schema#label | DeepSpeed |
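Several of the features listed above (ZeRO, FP16 mixed precision, CPU offload) are enabled through DeepSpeed's JSON-style configuration dict rather than code changes. A minimal sketch follows; the specific values (batch size, ZeRO stage) are illustrative assumptions, not a recommended setup:

```python
# Illustrative DeepSpeed config sketch: ZeRO stage 2 with optimizer-state
# CPU offload and FP16 mixed precision. Values here are example choices.
ds_config = {
    "train_batch_size": 32,
    "fp16": {
        "enabled": True,          # mixed precision training
    },
    "zero_optimization": {
        "stage": 2,               # partition optimizer states + gradients
        "offload_optimizer": {
            "device": "cpu",      # ZeRO-Offload: optimizer states to CPU
        },
    },
}
```

In practice this dict (or an equivalent JSON file) is passed to `deepspeed.initialize` alongside the PyTorch model to obtain a wrapped engine that handles partitioning and offloading.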