Statements (98)
Predicate | Object |
---|---|
gptkbp:instanceOf | gptkb:software |
gptkbp:developedBy | gptkb:NVIDIA |
https://www.w3.org/2000/01/rdf-schema#label | CUDA profiler |
gptkbp:measures | latency, occupancy, memory usage, register spilling, atomic operations, memory allocation, memory bandwidth, throughput, API calls, memory leaks, register usage, thread divergence, memory deallocation, memory access patterns, block size, concurrent kernel execution, instruction throughput, context switches, DRAM utilization, L1 cache hit rate, L2 cache hit rate, PCIe throughput, SM efficiency, bank conflicts, branch efficiency, cache utilization, clock frequency, coalesced memory accesses, compute bottlenecks, compute pressure, compute utilization, device utilization, event timing, instruction bottlenecks, instruction issue rate, instruction mix, instruction replay, instruction serialization, kernel compute usage, kernel dependencies, kernel efficiency, kernel execution order, kernel execution time, kernel grid size, kernel latency, kernel launch overhead, kernel launch time, kernel memory usage, kernel occupancy, kernel resource usage, kernel serialization, kernel synchronization, kernel throughput, latency bottlenecks, memory alignment, memory bottlenecks, memory copy efficiency, memory fragmentation, memory padding, memory pressure, memory serialization, memory transactions, occupancy bottlenecks, occupancy limiters, page faults, power usage, register pressure, serialization events, shared memory bank conflicts, shared memory usage, stall reasons, stream synchronization, stream utilization, surface cache usage, texture cache usage, throughput bottlenecks, uncoalesced memory accesses, unified memory usage, warp execution efficiency, warp serialization, warp size |
gptkbp:platform | gptkb:Windows, gptkb:Linux |
gptkbp:relatedTo | gptkb:NVIDIA_Nsight, gptkb:NVIDIA_Visual_Profiler, gptkb:Nsight_Compute, gptkb:Nsight_Systems, nvprof |
gptkbp:supports | CUDA applications |
gptkbp:usedFor | performance analysis, GPU profiling, CUDA application optimization |
gptkbp:bfsParent | gptkb:CUDA_Toolkit, gptkb:NVIDIA_CUDA_Toolkit |
gptkbp:bfsLayer | 7 |