-
I want to profile DeePMD-kit to understand the complete execution flow and to see which CUDA kernels take the most (and the least) time.
Another issue I am encountering: @denghuilu, please suggest where I am going wrong. I am using the nsys CLI.
-
I think the profiling run took too long. You can limit training to about 10 steps by changing the step count in the training input script. Every DP training step runs through essentially the same network, so a short profiling run is still representative.
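A minimal sketch of the short-run idea, assuming a DeePMD-kit v2-style input.json where the step count lives in training.numb_steps (v1 inputs used training.stop_batch instead). The output file name short_input.json and the report name deepmd_short are hypothetical. It writes a trimmed copy of the input and assembles an nsys command that traces CUDA and NVTX activity around `dp train`.

```python
import copy
import json
from pathlib import Path


def make_short_config(config: dict, steps: int = 10) -> dict:
    """Return a copy of a DeePMD-kit input dict limited to `steps` training steps.

    Assumption: v2 inputs use training.numb_steps, v1 inputs use training.stop_batch.
    The original dict is left unchanged.
    """
    short = copy.deepcopy(config)
    training = short.setdefault("training", {})
    key = "stop_batch" if "stop_batch" in training else "numb_steps"
    training[key] = steps
    return short


def build_nsys_command(input_file: str, report_name: str = "deepmd_short") -> list:
    """Assemble an `nsys profile` invocation around `dp train` (report name is hypothetical)."""
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx",  # record CUDA API calls/kernels and NVTX ranges
        "--stats=true",       # print summary tables (including per-kernel times) at the end
        "-o", report_name,    # report file: <report_name>.nsys-rep (.qdrep in older nsys)
        "dp", "train", input_file,
    ]


def write_short_input(src: str, dst: str, steps: int = 10) -> None:
    """Read the JSON input at `src`, cap the step count, and write it to `dst`."""
    config = json.loads(Path(src).read_text())
    Path(dst).write_text(json.dumps(make_short_config(config, steps), indent=4))
```

Usage: call write_short_input("input.json", "short_input.json"), then pass build_nsys_command("short_input.json") to subprocess.run(..., check=True). The per-kernel summary printed by --stats answers the "which kernel takes the most time" question without opening the GUI.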
-
Solution--
Issue -- ERR_NVGPUCTRPERM: Permission issue with Performance Counters
Steps to solve (ask the system administrator to follow them, since they need root access)--
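1. Write the line
   options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
   into /etc/modprobe.d/nvidia-prof.conf
2. Reboot, so that the NVIDIA kernel module is reloaded with this option. (If the module is loaded from the initramfs, regenerate the initramfs before rebooting, e.g. with update-initramfs -u on Ubuntu.)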
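The administrator steps can be scripted roughly as follows. This is a sketch, not an NVIDIA tool: the real write needs root, and the check assumes the driver reports the current setting as a line like `RmProfilingAdminOnly: 1` in /proc/driver/nvidia/params (verify this on your driver version). The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

CONF_PATH = Path("/etc/modprobe.d/nvidia-prof.conf")
CONF_LINE = 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"\n'


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Parse driver params text; True/False for the setting, None if the key is absent.

    Assumption: the driver exposes the setting as 'RmProfilingAdminOnly: <0|1>'.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def write_modprobe_conf(path: Path = CONF_PATH, dry_run: bool = True) -> str:
    """Write the modprobe option to `path` (needs root); with dry_run=True only return it."""
    if not dry_run:
        path.write_text(CONF_LINE)
    return CONF_LINE


def current_setting(params_file: Path = Path("/proc/driver/nvidia/params")) -> Optional[bool]:
    """Read the live driver setting, or None if the file is missing (e.g. no NVIDIA driver)."""
    try:
        return profiling_admin_only(params_file.read_text())
    except FileNotFoundError:
        return None
```

After the reboot, current_setting() returning False would indicate that non-admin users may read the performance counters.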
Kernel-level profiling and FLOP counting can now be done with Nsight Compute (ncu) rather than nvprof, which does not support GPUs of compute capability 8.0 and newer.
For more advanced profiling options - https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html
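For the FLOP count, a sketch along these lines builds an ncu command and combines the resulting counters. The metric names are the Nsight Compute SASS instruction counters that replace nvprof's flop_count_dp / flop_count_sp; use the double-precision set if the model trains in float64, the single-precision set otherwise. The launch count of 20, the report name, and the function names are hypothetical choices; limiting launches keeps ncu's slow per-kernel replay short, in the same spirit as the short training run.

```python
DP_METRICS = [
    "sm__sass_thread_inst_executed_op_dadd_pred_on.sum",
    "sm__sass_thread_inst_executed_op_dmul_pred_on.sum",
    "sm__sass_thread_inst_executed_op_dfma_pred_on.sum",
]
SP_METRICS = [
    "sm__sass_thread_inst_executed_op_fadd_pred_on.sum",
    "sm__sass_thread_inst_executed_op_fmul_pred_on.sum",
    "sm__sass_thread_inst_executed_op_ffma_pred_on.sum",
]


def build_ncu_command(input_file: str, report_name: str = "deepmd_kernels",
                      launch_count: int = 20, double: bool = True) -> list:
    """Assemble an ncu invocation that collects add/mul/fma counts for `dp train`."""
    metrics = DP_METRICS if double else SP_METRICS
    return [
        "ncu",
        "--target-processes", "all",            # also profile child processes
        "--launch-count", str(launch_count),    # stop after this many kernel launches
        "--metrics", ",".join(metrics),         # only the counters needed for FLOPs
        "-o", report_name,                      # writes <report_name>.ncu-rep
        "dp", "train", input_file,
    ]


def flop_count(add: float, mul: float, fma: float) -> float:
    """Combine per-kernel counters into FLOPs; one FMA counts as two operations."""
    return add + mul + 2 * fma
```

The counter values for each kernel can then be read from the report (for example in the Nsight Compute GUI or with ncu --import) and passed to flop_count.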