MPI annotation option does not output any MPI information
Dear Nvprof developers:
I want to use nvprof to profile my CUDA+MPI application, but a small test shows that the option --annotate-mpi openmpi does not produce any of the MPI information described in the nvprof documentation. The following is the information for the test example:
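For context, the test program is essentially the tutorial's mpi_hello_gpu.cu combined with the vecAdd kernel from vecadd.cu. The sketch below is my own reconstruction of what it does (names, sizes, and the launch count are approximate, not the exact tutorial source): each rank selects one GPU, receives an MPI_Bcast, launches vecAdd repeatedly, and computes a rank sum with MPI_Reduce. The MPI_Bcast and MPI_Reduce calls are what I expected --annotate-mpi to report.

// Approximate reconstruction of the test (not the exact tutorial files).
#include <mpi.h>
#include <cstdio>
#include <cuda_runtime.h>

#define N 1024

__global__ void vecAdd(float *a, float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) c[i] = a[i] + b[i];
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[MPI_MAX_PROCESSOR_NAME]; int hlen;
    MPI_Get_processor_name(host, &hlen);

    cudaSetDevice(rank);                 // one GPU per rank
    int dev; cudaGetDevice(&dev);
    printf("rank %d: cudaGetDevice()=%d\n", rank, dev);

    int bcastme[4] = {0, 1, 2, 3};
    MPI_Bcast(bcastme, 4, MPI_INT, 0, MPI_COMM_WORLD);   // MPI call that should be annotated
    printf("rank %d of %d on %s received bcastme[3]=%d [gpu %d]\n",
           rank, size, host, bcastme[3], dev);

    float ha[N], hb[N], hc[N];
    for (int i = 0; i < N; ++i) { ha[i] = (float)i; hb[i] = -1.0f * i; }

    float *da, *db, *dc;
    cudaMalloc((void**)&da, N * sizeof(float));
    cudaMalloc((void**)&db, N * sizeof(float));
    cudaMalloc((void**)&dc, N * sizeof(float));
    cudaMemcpy(da, ha, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, N * sizeof(float), cudaMemcpyHostToDevice);

    for (int iter = 0; iter < 50000; ++iter)              // many launches, as in the profile below
        vecAdd<<<(N + 255) / 256, 256>>>(da, db, dc);
    cudaMemcpy(hc, dc, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("rank %d: C[0]=%f\n", rank, hc[0]);

    int ranksum = 0;
    MPI_Reduce(&rank, &ranksum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  // second MPI call
    if (rank == 0) printf("ranksum= %d\n", ranksum);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    MPI_Finalize();
    return 0;
}

I build it with nvcc using the MPI wrapper as the host compiler (e.g. nvcc -ccbin mpicxx mpi_cuda.cu -o mpi_cuda); the tutorial's build steps may differ.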
Sample Test:
From Link: http://geco.mines.edu/tesla/cuda_tutorial_mio/
Source Files: mpi_hello_gpu.cu, vecadd.cu
OpenMPI Version: 4.0.2
CUDA Version: 10.1
Command: $ mpirun -np 2 nvprof --annotate-mpi openmpi ./mpi_cuda
Output (using 2 MPI processes):
rank 0 of 2 on p3dev02 received bcastme[3]=3 [gpu 0]
rank 1 of 2 on p3dev02 received bcastme[3]=3 [gpu 1]
==70253== NVPROF is profiling process 70253, command: ./mpi_cuda
==70254== NVPROF is profiling process 70254, command: ./mpi_cuda
rank 0: cudaGetDevice()=0
rank 1: cudaGetDevice()=1
rank 1: C[0]=0.000000
ranksum= 1
==70253== Profiling application: ./mpi_cuda
==70253== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 62.58% 3.1040us 2 1.5520us 1.3440us 1.7600us [CUDA memcpy HtoD]
37.42% 1.8560us 1 1.8560us 1.8560us 1.8560us [CUDA memcpy DtoH]
API calls: 86.74% 352.44ms 3 117.48ms 10.267us 352.42ms cudaMalloc
5.39% 21.910ms 582 37.645us 258ns 2.0794ms cuDeviceGetAttribute
4.75% 19.303ms 50000 386ns 303ns 102.73us cudaLaunchKernel
2.07% 8.3917ms 6 1.3986ms 1.1406ms 1.4661ms cuDeviceTotalMem
0.68% 2.7607ms 1 2.7607ms 2.7607ms 2.7607ms cudaGetDeviceProperties
0.34% 1.3713ms 6 228.55us 215.41us 247.59us cuDeviceGetName
0.02% 66.319us 3 22.106us 14.092us 30.931us cudaMemcpy
0.01% 20.708us 3 6.9020us 1.8690us 16.755us cudaFree
0.00% 12.278us 6 2.0460us 1.3700us 4.3850us cuDeviceGetPCIBusId
0.00% 7.5770us 12 631ns 375ns 973ns cuDeviceGet
0.00% 6.6190us 1 6.6190us 6.6190us 6.6190us cudaSetDevice
0.00% 6.2070us 4 1.5510us 867ns 2.3670us cuPointerGetAttributes
0.00% 2.3390us 6 389ns 354ns 461ns cuDeviceGetUuid
0.00% 1.8280us 3 609ns 437ns 780ns cuDeviceGetCount
0.00% 1.5210us 1 1.5210us 1.5210us 1.5210us cudaGetDevice
0.00% 1.2300us 1 1.2300us 1.2300us 1.2300us cudaGetDeviceCount
==70254== Profiling application: ./mpi_cuda
==70254== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 179.83ms 50000 3.5960us 3.5510us 4.0640us vecAdd(float*, float*, float*)
0.00% 3.0400us 2 1.5200us 1.3440us 1.6960us [CUDA memcpy HtoD]
0.00% 2.0480us 1 2.0480us 2.0480us 2.0480us [CUDA memcpy DtoH]
API calls: 68.49% 884.64ms 50000 17.692us 16.647us 1.4335ms cudaLaunchKernel
28.85% 372.61ms 3 124.20ms 15.212us 372.57ms cudaMalloc
1.55% 20.003ms 582 34.368us 453ns 1.2518ms cuDeviceGetAttribute
0.76% 9.7675ms 6 1.6279ms 1.6077ms 1.6602ms cuDeviceTotalMem
0.25% 3.2029ms 1 3.2029ms 3.2029ms 3.2029ms cudaGetDeviceProperties
0.10% 1.2356ms 6 205.93us 135.78us 224.53us cuDeviceGetName
0.01% 103.42us 3 34.473us 19.464us 60.273us cudaMemcpy
0.00% 60.895us 3 20.298us 4.2420us 51.665us cudaFree
0.00% 16.364us 4 4.0910us 2.0370us 9.1220us cuPointerGetAttributes
0.00% 14.154us 6 2.3590us 1.9510us 3.1620us cuDeviceGetPCIBusId
0.00% 11.338us 12 944ns 580ns 1.5080us cuDeviceGet
0.00% 7.3840us 1 7.3840us 7.3840us 7.3840us cudaSetDevice
0.00% 3.8410us 6 640ns 592ns 673ns cuDeviceGetUuid
0.00% 2.7020us 3 900ns 699ns 1.0970us cuDeviceGetCount
0.00% 1.9360us 1 1.9360us 1.9360us 1.9360us cudaGetDevice
0.00% 1.2750us 1 1.2750us 1.2750us 1.2750us cudaGetDeviceCount
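As the two profiles show, neither rank's output contains any MPI-related information, which is what I expected --annotate-mpi openmpi to add. If it helps narrow this down, the next thing I would try, based on the documented %q{OMPI_COMM_WORLD_RANK} substitution for output file names (the exact file name here is just my choice), is writing one profile per rank and checking each timeline in the Visual Profiler for NVTX ranges around the MPI calls:
Command: $ mpirun -np 2 nvprof --annotate-mpi openmpi --output-profile mpi_cuda.%q{OMPI_COMM_WORLD_RANK}.nvprof ./mpi_cuda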
Hope you can reproduce the issue.
Best,
Shelton