Statements (58)
Predicate | Object
---|---
gptkbp:instance_of | gptkb:Library
gptkbp:developed_by | gptkb:NVIDIA
gptkbp:enables | data parallelism
gptkbp:features | gptkb:broadcasting, asynchronous operations, error handling, load balancing, performance monitoring, asynchronous communication, peer-to-peer communication, synchronous communication, multi-threading support, reduce, dynamic topology, all-gather, collective communication algorithms, ring all-reduce, tree all-reduce
https://www.w3.org/2000/01/rdf-schema#label | NCCL
gptkbp:integrates_with | gptkb:CUDA
gptkbp:is_compatible_with | gptkb:Tensor_Flow, gptkb:MXNet, gptkb:Py_Torch
gptkbp:is_optimized_for | gptkb:NVIDIA_GPUs
gptkbp:provides | high bandwidth, interoperability, low latency, performance optimization, resource management, scalability, high throughput, performance tuning tools, user-friendly API, collective operations
gptkbp:released_in | gptkb:2016
gptkbp:supports | gptkb:NVIDIA_RTX, gptkb:NVIDIA_DGX_systems, gptkb:NVIDIA_NVLink, gptkb:NVIDIA_A100, gptkb:NVIDIA_T4, gptkb:NVIDIA_V100, gptkb:PCIe, multi-node communication, multi-GPU communication, CUDA-aware MPI, NVIDIA GPUs in data centers
gptkbp:used_for | collective communication
gptkbp:used_in | gptkb:cloud_computing, gptkb:machine_learning, data analytics, high-performance computing, scientific computing, deep learning frameworks, distributed training, AI training
gptkbp:written_in | gptkb:C++
gptkbp:bfsParent | gptkb:Torch_Distributed
gptkbp:bfsLayer | 6