Statements (56)

Predicate | Object
---|---
gptkbp:instanceOf | deep learning library
gptkbp:designedFor | large-scale distributed training
gptkbp:developedBy | gptkb:Microsoft
gptkbp:enables | data parallelism, model parallelism, pipeline parallelism, zero redundancy optimizer
gptkbp:feature | gptkb:Mixture_of_Experts_(MoE), gptkb:DeepSpeed-MoE, compression, scalability, mixed precision training, multi-GPU training, checkpointing, offloading, gradient accumulation, FP16 support, multi-node training, BERT support, GPT support, memory optimization, CPU offload, DeepSpeed-Inference, Transformer support, Turing-NLG support, ZeRO-Infinity, ZeRO-Offload, activation partitioning, automatic partitioning, custom optimizer support, dynamic loss scaling, flexible API, inference optimization, low communication overhead, model parallelism API, optimizer state partitioning, pipeline parallelism API, scaling to thousands of GPUs, sparse attention, training efficiency
gptkbp:firstReleased | 2020
https://www.w3.org/2000/01/rdf-schema#label | DeepSpeed
gptkbp:license | gptkb:MIT_License
gptkbp:openSource | true
gptkbp:programmingLanguage | gptkb:Python
gptkbp:repository | https://github.com/microsoft/DeepSpeed
gptkbp:supports | gptkb:PyTorch
gptkbp:usedBy | gptkb:OpenAI, gptkb:EleutherAI
gptkbp:usedFor | training large language models
gptkbp:bfsParent | gptkb:Transformers_library, gptkb:Megatron-Turing_NLG, gptkb:GPT-NeoX, gptkb:MoE_Transformer, gptkb:Jonathan_Meisner
gptkbp:bfsLayer | 6
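Several of the features listed above (FP16 mixed precision, gradient accumulation, optimizer state partitioning via ZeRO, CPU offload) are enabled through DeepSpeed's JSON configuration file. As an illustrative sketch only, with hypothetical batch-size values not taken from the statements above, such a config might look like:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

Here `"stage": 2` requests ZeRO's optimizer state and gradient partitioning, and `"offload_optimizer"` corresponds to the ZeRO-Offload feature in the table; a file like this is typically passed to the DeepSpeed launcher alongside a PyTorch training script.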