gptkbp:instance_of
|
gptkb:Library
|
gptkbp:available_at
|
https://developer.nvidia.com/nccl
|
gptkbp:available_on
|
gptkb:Linux
gptkb:Windows
|
gptkbp:designed_for
|
high-performance computing
|
gptkbp:developed_by
|
gptkb:NVIDIA
|
gptkbp:has
|
tutorials
community support
user guides
performance benchmarks
|
https://www.w3.org/2000/01/rdf-schema#label
|
NVIDIA NCCL
|
gptkbp:integrates_with
|
gptkb:TensorFlow
gptkb:Caffe
gptkb:MXNet
gptkb:PyTorch
|
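These frameworks use NCCL as their GPU communication backend rather than calling it directly in user code. A minimal sketch of that integration through PyTorch's torch.distributed, assuming a multi-GPU host and a launcher such as torchrun that sets the usual RANK/WORLD_SIZE/LOCAL_RANK environment variables:

```python
# Sketch: PyTorch delegates its GPU collectives to NCCL when the
# process group is initialized with the "nccl" backend.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")   # NCCL carries the GPU traffic
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank contributes rank+1; NCCL all-reduce sums in place, so every
# rank ends up holding 1 + 2 + ... + world_size in each element.
x = torch.full((4,), float(dist.get_rank() + 1), device="cuda")
dist.all_reduce(x, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}: {x.tolist()}")

dist.destroy_process_group()
```

Launched with, for example, `torchrun --nproc_per_node=2 this_script.py`.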
gptkbp:is_compatible_with
|
gptkb:CUDA
|
gptkbp:is_optimized_for
|
gptkb:NVIDIA_GPUs
|
gptkbp:provides
|
gptkb:Documentation
API for developers
error handling
performance monitoring
scalability
asynchronous communication
point-to-point communication
reduce operations
synchronous communication
high bandwidth communication
broadcast operations
sample code
gather operations
all-reduce operations
scatter operations
|
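The operations listed above correspond to NCCL's collective and point-to-point primitives (ncclBroadcast, ncclAllReduce, ncclReduce, ncclAllGather, ncclReduceScatter, ncclSend/ncclRecv in the C API). A hedged sketch of that catalog, driven here through torch.distributed's NCCL backend rather than the raw C API, and assuming the process group and device are already set up as in the previous sketch:

```python
# Sketch: the NCCL primitives behind the operations listed above,
# exercised through torch.distributed (backend="nccl").
import torch
import torch.distributed as dist

rank, world = dist.get_rank(), dist.get_world_size()
t = torch.full((2,), float(rank), device="cuda")

dist.broadcast(t, src=0)                      # ncclBroadcast: rank 0's data to all
dist.all_reduce(t, op=dist.ReduceOp.SUM)      # ncclAllReduce: reduce, then distribute
dist.reduce(t, dst=0, op=dist.ReduceOp.MAX)   # ncclReduce: result lands on rank 0 only

gathered = [torch.empty_like(t) for _ in range(world)]
dist.all_gather(gathered, t)                  # ncclAllGather: every rank gets every chunk

out = torch.empty_like(t)
parts = [torch.ones_like(t) for _ in range(world)]
dist.reduce_scatter(out, parts, op=dist.ReduceOp.SUM)  # ncclReduceScatter

# Point-to-point (NCCL >= 2.7): pair each even rank with the next odd rank.
if rank % 2 == 0 and rank + 1 < world:
    dist.send(t, dst=rank + 1)                # ncclSend
elif rank % 2 == 1:
    dist.recv(t, src=rank - 1)                # ncclRecv
```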
gptkbp:purpose
|
collective communication
|
gptkbp:released_in
|
gptkb:2016
|
gptkbp:supports
|
gptkb:Ethernet
gptkb:NVIDIA_RTX
gptkb:NVIDIA_DGX_systems
gptkb:InfiniBand
gptkb:NVIDIA_NVLink
gptkb:NVIDIA_HPC_SDK
gptkb:NVIDIA_A100
gptkb:NVIDIA_T4
gptkb:NVIDIA_V100
data parallelism
model parallelism
mixed precision training
multi-node communication
multi-GPU communication
single-node communication
|
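Data parallelism over these interconnects is the canonical pattern: each GPU holds a model replica, and NCCL all-reduces gradients across replicas during the backward pass, over NVLink within a node and InfiniBand or Ethernet across nodes. A minimal sketch using torch.nn.parallel.DistributedDataParallel, which rides the same NCCL process group; the model, data, and optimizer here are placeholders:

```python
# Sketch: data parallelism over NCCL. DistributedDataParallel buckets
# gradients and all-reduces them across ranks inside backward().
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(32, 1).cuda()          # placeholder model
ddp = DDP(model, device_ids=[local_rank])
opt = torch.optim.SGD(ddp.parameters(), lr=0.1)

x = torch.randn(8, 32, device="cuda")          # placeholder batch
loss = ddp(x).square().mean()
loss.backward()                                # gradients all-reduced via NCCL here
opt.step()
```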
gptkbp:used_in
|
deep learning frameworks
|
gptkbp:uses
|
direct communication
ring algorithm
tree algorithm
|
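The ring algorithm splits each rank's buffer into one chunk per rank and pipelines a reduce-scatter phase followed by an all-gather phase around the ring, which is bandwidth-optimal; the tree algorithm trades some of that bandwidth for lower latency at large scale. An illustrative host-side simulation of the ring pattern, mimicking only the data movement, not NCCL's actual implementation:

```python
# Illustrative simulation of ring all-reduce: n-1 reduce-scatter steps,
# then n-1 all-gather steps. Each rank sends one chunk per step, so the
# per-rank traffic is 2*(n-1)/n of the buffer, roughly independent of n.
def ring_all_reduce(buffers):
    n = len(buffers)                    # number of ranks in the ring
    size = len(buffers[0])
    assert size % n == 0, "buffer must split into one chunk per rank"
    c = size // n
    sl = lambda k: slice((k % n) * c, (k % n) * c + c)

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the fully
    # reduced chunk (r+1) % n.
    for step in range(n - 1):
        sent = [buffers[r][sl(r - step)] for r in range(n)]   # simultaneous sends
        for r in range(n):
            k = sl(r - 1 - step)
            buffers[r][k] = [a + b for a, b in zip(buffers[r][k], sent[(r - 1) % n])]

    # Phase 2: all-gather. The reduced chunks circulate until every rank
    # has the complete result.
    for step in range(n - 1):
        sent = [buffers[r][sl(r + 1 - step)] for r in range(n)]
        for r in range(n):
            buffers[r][sl(r - step)] = sent[(r - 1) % n]
    return buffers

print(ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
# every rank ends with [111, 222, 333]
```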
gptkbp:bfsParent
|
gptkb:NVIDIA_V100_Tensor_Core_GPU
|
gptkbp:bfsLayer
|
5
|