Describe the bug
When memory allocated by cudaMalloc3D is transferred via MPI, a segfault can occur. UCX probably fails to identify the address as device memory and tries to access it as host memory, which causes the segfault.
Steps to Reproduce
Compile and run the program test.txt (rename test.txt to test.cu)
$ module purge
$ module load nvhpc
$ mpic++ test.cu -o test
$ mpirun -n 2 ./test
The program segfaults with a message:
CUDA Runtime correctly identifies 0x7fdbdf6a3800 as device memory (type = 2), but uct_am_short_fill_data in backtrace indicates that host memory transfer was attempted.
The problem does not occur if any of the following modifications is made:
- the program is run as UCX_MEMTYPE_CACHE=n mpirun -n 2 ./test
- the code uses cudaMalloc with manual padding instead of cudaMalloc3D
- Nx is divisible by 128 or close to it (resulting in pitch < xsize / 0.95)
It seems that the device memory region allocated by cudaMalloc3D is incorrectly marked as having size xsize * ysize * depth instead of pitch * ysize * depth.
UCX version used + UCX configure flags: bundled with NVHPC 21.9
Setup and versions
`lsmod|grep nv_peer_mem` and/or gdrcopy: `lsmod|grep gdrdrv`: No, empty output
Additional information (depending on the issue)
`ucx_info -d` to show transports and devices recognized by UCX: Output of `ucx_info -d`