Can't get correct result when use cub in CUDA12.0 #719
Comments
Hello @YuanRisheng! I've tried CUDA 12.0 and 12.1 and the result seems fine:
-- The CUDA compiler identification is NVIDIA 12.0.140
-- The CXX compiler identification is GNU 11.3.0
CUB version : 200001
tmp_bytes:15615
102400
Your CMakeLists seems incomplete, so I've added a few lines:
project(test LANGUAGES CUDA CXX) # <-- 1
set(CMAKE_CUDA_ARCHITECTURES "89") # <-- 2
add_library(test_moduleA SHARED test_moduleA.cu)
add_library(test_moduleB SHARED test_moduleB.cu)
target_link_libraries(test_moduleB test_moduleA)
add_executable(test_main test_main.cc)
target_link_libraries(test_main test_moduleB)
Could you validate the change? Also, which host compiler are you using and what's the exact version of nvcc and GPU architecture? |
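A note on the "CUB version : 200001" line in the output above: CUB's version header documents the CUB_VERSION macro as encoded in the form MMMmmmpp, so 200001 reads as 2.0.1. The sketch below shows that decoding. The names CubVersion, decode_cub_version, and format_cub_version are hypothetical helpers introduced here for illustration; they are not part of CUB.

```cpp
#include <string>

// Hypothetical helper type (not a CUB API): the three components of an
// MMMmmmpp-encoded version number.
struct CubVersion {
  int major_version;
  int minor_version;
  int subminor_version;
};

// Split the encoded value following the documented layout:
//   encoded / 100000     -> major
//   encoded / 100 % 1000 -> minor
//   encoded % 100        -> sub-minor
inline CubVersion decode_cub_version(int encoded) {
  return CubVersion{encoded / 100000, encoded / 100 % 1000, encoded % 100};
}

// Render the decoded version as "major.minor.subminor".
inline std::string format_cub_version(const CubVersion& v) {
  return std::to_string(v.major_version) + "." +
         std::to_string(v.minor_version) + "." +
         std::to_string(v.subminor_version);
}
```

Calling format_cub_version(decode_cub_version(200001)) yields the string 2.0.1, which is a convenient form to quote when reporting which CUB release a toolkit ships.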
Thanks for your reply! @senior-zero I changed my CMakeLists as below and it still doesn't work:
I use
|
There is also an interesting phenomenon. When I change the size of |
@YuanRisheng I'm still unable to reproduce the issue. Could you please:
:cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.0/bin/nvcc ..
-- The CUDA compiler identification is NVIDIA 12.0.140
-- The CXX compiler identification is GNU 11.3.0 # <--- this line
In case I'm unable to reproduce the issue, we might need a docker image with a reproducer. |
@YuanRisheng @elstehle here are the commands with docker:
docker run --gpus all -it nvidia/cuda:12.0.0-devel-ubuntu22.04
cd
apt update && apt install cmake g++ vim git
git clone https://github.com/senior-zero/reduce_repro.git
cd reduce_repro/
./run.sh |
@senior-zero I used this docker image on my machine:
Maybe you could reproduce this issue using this image. Thank you for your help again! |
Thank you for sharing your docker image, @YuanRisheng. Unfortunately, I am still not able to reproduce the issue using the given image. Are you able to reproduce the issue when using your image along with @senior-zero's reproducer?
|
@elstehle Wow! I found the difference. Somehow, I had enabled O3 optimization before. Please use |
Thank you, @YuanRisheng! I am now able to reproduce the issue with the container image you had shared with us. So far I wasn't able to pin down the issue. I'll continue the investigation and will keep you posted. |
Thank you for bringing this issue to our attention, @YuanRisheng. We are exploring options to fix the issue. We will update this issue once we have a concrete solution. |
Thanks for your attention! This issue is blocking my work in some situations, and I will continue to follow it! |
A reminder: please don't forget this issue. Thank you! |
Hi @YuanRisheng, we have been quite busy recently consolidating our libraries into a single unified monorepository. That, and the inherently high risk of breaking working code when changing the linkage of kernels, has pushed this item past the deadline for the upcoming release. Consequently, we won't have time to work on this issue until the release work is done. That said, once a fix has been implemented, you will be able to pull it directly from https://github.com/NVIDIA/cccl, which is our new unified monorepository. |
I got it and thanks! |
Hello @YuanRisheng! While we are figuring out how to address the issue, you could work around it by wrapping CUB/Thrust namespaces in each library:
test_moduleA.cu
// goes before all the headers or in target_compile_definitions
#define CUB_WRAPPED_NAMESPACE A
#define THRUST_WRAPPED_NAMESPACE A
...
CUDA_CHECK(A::cub::DeviceReduce::Reduce(nullptr, tmp_bytes, trans_x, gpu_ret, n, addf, 0.0f));
test_moduleB.cu
// goes before all the headers or in target_compile_definitions
#define CUB_WRAPPED_NAMESPACE B
#define THRUST_WRAPPED_NAMESPACE B
...
CUDA_CHECK(B::cub::DeviceReduce::Reduce(nullptr, tmp_bytes, trans_x, gpu_ret, n, addf, 0.0f)); |
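To illustrate why the wrapped namespaces above can help, here is a host-only C++ sketch of the general idea: the same header-only code placed under two different outer namespaces produces two unrelated sets of entities, so two libraries no longer share symbol names. This is not how CUB implements its wrapping; DEMO_DEFINE_LIBRARY, the demo namespace, ReduceKernelTag, sum_via_a, and sum_via_b are all hypothetical names made up for this example.

```cpp
#include <type_traits>

// Hypothetical stand-in for a header-only library whose contents are placed
// inside a caller-chosen outer namespace. Illustrative only; not a CUB or
// Thrust macro.
#define DEMO_DEFINE_LIBRARY(WRAPPER)                \
  namespace WRAPPER {                               \
  namespace demo {                                  \
  struct ReduceKernelTag {};                        \
  inline int sum(const int* data, int n) {          \
    int total = 0;                                  \
    for (int i = 0; i < n; ++i) total += data[i];   \
    return total;                                   \
  }                                                 \
  }                                                 \
  }

DEMO_DEFINE_LIBRARY(A)  // the view "library A" would compile against
DEMO_DEFINE_LIBRARY(B)  // the view "library B" would compile against

// The two instantiations are distinct types with distinct names, even though
// their source text is identical.
static_assert(!std::is_same<A::demo::ReduceKernelTag,
                            B::demo::ReduceKernelTag>::value,
              "wrapped copies must be unrelated entities");

// Each "library" calls only its own wrapped copy.
inline int sum_via_a(const int* data, int n) { return A::demo::sum(data, n); }
inline int sum_via_b(const int* data, int n) { return B::demo::sum(data, n); }
```

Both sum_via_a and sum_via_b compute the same value, but they refer to differently named functions, which is the property the workaround relies on when each shared library defines its own wrapper name.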
@senior-zero I will try it! Thanks for your solution! |
Hello @senior-zero, may I ask whether this problem has been resolved? I tried the above method, but it failed. I think this is a very serious bug. Can you help me take a look? |
@tianshuo78520a we haven't addressed the issue on our side yet. You can track the status here: NVIDIA/cccl#166. The above method should work. Note that, for now, you have to wrap the cub/thrust namespaces in every shared library that uses cub/thrust. If wrapping namespaces doesn't work for you, it might be a different problem. In that case, please provide a reproducer. |
Thank you for your reply. We will follow this issue and try using the above method again. |
@YuanRisheng, @tianshuo78520a we've just merged a fix that should help address the issue when wrapped namespace is not specified. Could you please verify if it works for you? |
@senior-zero Thank you for your fix. But I don't know how to use the "main" branch of cccl. The cub I have been using is in the CUDA install directory. Do I need to update CUDA, or pull cccl to override cub in my environment? |
@YuanRisheng this page might help you use the latest CCCL version: https://github.com/NVIDIA/cccl/tree/main/examples/example_project |
@senior-zero This fix could solve my problem. Thanks for what your team has done again! |
@YuanRisheng thank you for reporting the issue! Since the fix works for you, I'm closing it. |
I get an incorrect result when I use cub::DeviceReduce::Reduce in CUDA 12.0. The error occurs only when building shared targets. This is my code:
test_main.cc
test_functor.h
test_moduleA.cu
test_moduleB.cu
CMakeLists code:
I get a zero result when I run test_main with CUDA 12:
![image](https://private-user-images.githubusercontent.com/29249150/246840412-1d44350b-150c-4c8a-a576-5ed243b72356.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NDU0MzMsIm5iZiI6MTczODk0NTEzMywicGF0aCI6Ii8yOTI0OTE1MC8yNDY4NDA0MTItMWQ0NDM1MGItMTUwYy00YzhhLWE1NzYtNWVkMjQzYjcyMzU2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDE2MTg1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAzMzdkOTA1ODgzM2M4YjlmYjNiYjgwNzdkOGM3MmFmMDlmNTdiMzFkMzM0NDgyNjM0M2Y5OGU1YTBjZTQyM2UmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.SveNi8NXNiEi_uHY6PZ5xoCRsLH8bgXpd2dadouX8ig)
But I get the correct result with CUDA 11.2:
![image](https://private-user-images.githubusercontent.com/29249150/246840659-4a042df1-b814-45c3-a9d3-069a8b3a7658.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NDU0MzMsIm5iZiI6MTczODk0NTEzMywicGF0aCI6Ii8yOTI0OTE1MC8yNDY4NDA2NTktNGEwNDJkZjEtYjgxNC00NWMzLWE5ZDMtMDY5YThiM2E3NjU4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDE2MTg1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc5ZTg0MjFhZjY4MWNmYTRlMmM2MDUzMDQ2MDk2ZTBmYjQ1YWY0OTQzNjNjZjE1YTU1ODY0ZTkwMzg5MGVjZTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.RBLQd2BA2mvilOZYmdqpXd85Uq5JgUXKW7hS26MkQqw)
Please help me to deal with this issue. Thank you!
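For readers unfamiliar with the calling convention used with cub::DeviceReduce::Reduce in this thread (the snippets pass nullptr as the first argument): the first call only reports how much temporary storage is needed, and a second call with a real buffer performs the reduction. The host-only sketch below mimics that two-phase pattern. demo_reduce and demo_sum are hypothetical stand-ins written for this example, they run on the CPU, and they are not CUB's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in that mimics the two-phase convention:
//  - temp_storage == nullptr: write the required size to temp_bytes, do no work
//  - otherwise: use the caller-provided scratch buffer and reduce
// Returns true on success, false if the scratch buffer is too small.
inline bool demo_reduce(void* temp_storage, std::size_t& temp_bytes,
                        const float* in, float* out, int n, float init) {
  const std::size_t required = sizeof(float);  // one accumulator slot
  if (temp_storage == nullptr) {
    temp_bytes = required;  // size query only
    return true;
  }
  if (temp_bytes < required) {
    return false;
  }
  float* accumulator = static_cast<float*>(temp_storage);
  *accumulator = init;
  for (int i = 0; i < n; ++i) {
    *accumulator += in[i];
  }
  *out = *accumulator;
  return true;
}

// Typical caller: query the size, allocate scratch, then run the reduction.
inline float demo_sum(const std::vector<float>& values) {
  std::size_t temp_bytes = 0;
  float result = 0.0f;
  const int n = static_cast<int>(values.size());

  // Phase 1: ask how much scratch space is needed.
  demo_reduce(nullptr, temp_bytes, values.data(), &result, n, 0.0f);

  // Allocate at least temp_bytes, using float elements so the buffer is
  // correctly typed and aligned for the accumulator.
  std::vector<float> scratch((temp_bytes + sizeof(float) - 1) / sizeof(float));

  // Phase 2: perform the reduction with the scratch buffer.
  demo_reduce(scratch.data(), temp_bytes, values.data(), &result, n, 0.0f);
  return result;
}
```

With the real API the buffers live in device memory and every call returns a cudaError_t that should be checked, as the CUDA_CHECK wrapper in the thread does.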