Forward Compatibility CUDA bug with CMake 3.17+ #110
Do you not see this issue with RAJA or Umpire otherwise? That dependency comes through BLT, so I'm trying to track down how best to solve this. Also, where in that documentation does it mention forward compatibility or this problem? Mislink? |
https://cmake.org/cmake/help/latest/module/FindCUDA.html - this is the link to the old way of finding CUDA that is deprecated in favor of the link that I sent through. The old targets here are called ...

I didn't see this issue with RAJA 0.14.0 and Umpire 6.0.0, and I suspect that it is related to some of the C++14 changes that were made. That is largely a hunch, though. Changing back to those versions with C++11 lets everything build just fine.

In terms of where this dependency comes from, I narrowed it down to this section of the camp CMake, https://github.com/LLNL/camp/blob/main/CMakeLists.txt#L69:

set (camp_runtime_backends cuda hip)
foreach (backend ${camp_backends})
string(TOUPPER "${backend}" suffix)
if ("${ENABLE_${suffix}}")
set ("CAMP_ENABLE_${suffix}" On)
endif()
if (${CAMP_ENABLE_${suffix}})
if (backend IN_LIST camp_runtime_backends)
set (backend ${backend}_runtime)
endif()
if (TARGET blt::${backend})
set (backend blt::${backend})
endif()
list (APPEND camp_depends ${backend})
endif()
endforeach()
# ...
blt_add_library (
NAME camp
HEADERS ${camp_headers}
SOURCES ./src/errors.cpp
DEPENDS_ON ${camp_depends}
)

Parsing this manually, it seems like the string literal `cuda_runtime` is what ends up in camp_depends, and therefore in the exported interface. I was tempted to make this sort of change myself, but I am not as familiar with camp as I would like. I would imagine something like the following would achieve the same thing? It is a bit difficult (as we have seen in HiOp) to pick the correct static/shared library target to use:

set (camp_runtime_backends cuda hip)
foreach (backend ${camp_backends})
string(TOUPPER "${backend}" suffix)
if ("${ENABLE_${suffix}}")
set ("CAMP_ENABLE_${suffix}" On)
endif()
if (${CAMP_ENABLE_${suffix}})
if (backend IN_LIST camp_runtime_backends)
if(backend STREQUAL cuda)
find_package(CUDAToolkit)
# Linking only static could cause issues?
set(backend CUDA::cudart_static)
else()
set (backend ${backend}_runtime)
endif()
endif()
if (TARGET blt::${backend})
set (backend blt::${backend})
endif()
list (APPEND camp_depends ${backend})
endif()
endforeach()
# ...
blt_add_library (
NAME camp
HEADERS ${camp_headers}
SOURCES ./src/errors.cpp
DEPENDS_ON ${camp_depends}
)

Hopefully that clarifies. |
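As an aside on the static/shared question raised above: a generic way to make that choice explicit could look like the sketch below. This is illustrative only; CAMP_CUDA_STATIC_RUNTIME is a hypothetical option name, not something camp currently provides.

find_package(CUDAToolkit REQUIRED)  # provides CUDA::cudart and CUDA::cudart_static (CMake >= 3.17)
option(CAMP_CUDA_STATIC_RUNTIME "Link the static CUDA runtime" Off)
if (CAMP_CUDA_STATIC_RUNTIME)
  # The static runtime avoids a runtime libcudart.so dependency, but can cause
  # duplicate-runtime issues when mixed with shared CUDA libraries.
  set (camp_cuda_runtime_target CUDA::cudart_static)
else()
  set (camp_cuda_runtime_target CUDA::cudart)
endif()
list (APPEND camp_depends ${camp_cuda_runtime_target})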
It’s because BLT is porting from `{backend}_runtime` to `{backend}::runtime`, starting with hip. The cuda one isn’t done, and at present `cuda_runtime` is the correct dependency to request the cuda runtime from BLT. It actually has nothing to do with the underlying CUDA-finding implementation, other than that BLT currently still supports versions of CMake from before the new one existed, so we’re a bit stuck with it. Docs for BLT here:
https://llnl-blt.readthedocs.io/en/develop/tutorial/common_hpc_dependencies.html#cuda
Ideally we should update the handling in BLT to make this work better,
but we may need a stopgap to get you working again. Would you be
alright with something like “CAMP_NO_INTERFACE_RUNTIMES” or similar
to remove that type of dependency from the interface? That’s the
fastest thing I can think of to work around this problem generically.
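To make that stopgap concrete, a guard along these lines could be dropped into the loop quoted earlier. The option name comes from this comment, but the placement and wording are an assumption, not the actual camp change:

option(CAMP_NO_INTERFACE_RUNTIMES
       "Do not add backend runtime targets (e.g. cuda_runtime) to camp's interface" Off)
foreach (backend ${camp_backends})
  string(TOUPPER "${backend}" suffix)
  if ("${ENABLE_${suffix}}")
    set ("CAMP_ENABLE_${suffix}" On)
  endif()
  if (${CAMP_ENABLE_${suffix}})
    if (backend IN_LIST camp_runtime_backends)
      if (CAMP_NO_INTERFACE_RUNTIMES)
        continue()  # leave the runtime dependency out of camp_depends entirely
      endif()
      set (backend ${backend}_runtime)
    endif()
    if (TARGET blt::${backend})
      set (backend blt::${backend})
    endif()
    list (APPEND camp_depends ${backend})
  endif()
endforeach()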
Sadly the change you suggested would break our BLT integration. That’s actually almost exactly how camp used to work, until we became BLT-integrated to fix some integration issues with RAJA and Umpire in the last release.
|
@trws FWIW, Chris is working on the BLT updates for CUDA support to align with what is done for HIP. He's run into some problems ripping out the object library stuff, which has issues with CUDA, so it's taking longer than we expected. |
Thanks @rhornung67, I'm glad to hear that's going forward, but thankfully I don't think we're necessarily dependent on that rework to fix this. I can put in the stopgap in camp for now at least, and we may be able to do something in BLT to either select the static runtime or allow the runtime interface deps to not propagate. |
I'm satisfied with whatever is the most practical solution. I think a CMake option such as the suggested `CAMP_NO_INTERFACE_RUNTIMES` would be fine. If there is a fix coming through BLT that we are just waiting on, then I think we can wait for that. We are fine using the older RAJA and Umpire versions for the moment, but a temporary stopgap would also be fine. I think this would have to be added to the camp spack package in the end, or I can use a specific camp branch for temporary development. Thanks for the quick updates! |
@cameronrutherford we'll keep you posted when the BLT updates and related changes in RAJA, Umpire, Camp, etc. are available. |
@cameronrutherford, in working on some updates to the cmake here and in RAJA (partly spurred by this but also to fix other issues) I think we may already support this by use of BLT_EXPORT_THIRDPARTY=Off in the cmake configure. I'm not 100% sure that will remove the library dependency, but it will remove the target from the export. If that doesn't do it, we'll have to add something to completely prevent camp from linking to the library at all, which is a bit more complicated. |
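For reference, assuming BLT_EXPORT_THIRDPARTY is read as an ordinary cache variable, it can be set either on the configure command line or in a host-config/cache file, for example:

# In a CMake cache / host-config file (passed with `cmake -C host-config.cmake`):
set(BLT_EXPORT_THIRDPARTY Off CACHE BOOL "Do not export BLT third-party interface targets")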
I've been working with @cameronrutherford on testing out this suggested fix and am still running into some issues. These are the changes I made to the RAJA spack package to add a toggle for BLT_EXPORT_THIRDPARTY:
With the suggested BLT_EXPORT_THIRDPARTY=Off, we run into an OpenMP compiler compatibility error in RAJA. We ran into this error with gcc8.5.0 and gcc10.2.0. Error snippet:
(full output below the cut)
|
Thanks for debugging this @jaelynlitz. We can probably disable OpenMP to avoid this for the moment, but I think it is worth tracking down. We tried to check the OpenMP compatibility of our compiler by looking at the docs here, but struggled to interpret which version we should use. It is also unclear, through spack and this build configuration, what version of OpenMP we are pulling in in the first place. |
When CMake is invoked, is ENABLE_OPENMP (no RAJA_ prefix) being set to on? |
@rhornung67 How do I check for that? I don't see it at hiop's cmake level |
This may be a red herring, but I thought it might spark a thought. It looks like ENABLE_OPENMP must be on for RAJA_ENABLE_OPENMP to be on, since RAJA_ENABLE_OPENMP is a CMake dependent option in RAJA. It doesn't make sense to me that gcc would not support OpenMP, or at least would not define the standard _OPENMP macro when OpenMP is enabled. The logic here indicates that _OPENMP is not defined: https://github.com/LLNL/RAJA/blob/develop/include/RAJA/config.hpp.in#L244 |
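For context, a CMake dependent option behaves roughly like this generic sketch (not RAJA's literal code):

include(CMakeDependentOption)
# RAJA_ENABLE_OPENMP can only be On while ENABLE_OPENMP is On;
# otherwise it is forced to Off.
cmake_dependent_option(RAJA_ENABLE_OPENMP "Enable OpenMP support in RAJA" On
                       "ENABLE_OPENMP" Off)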
I'm not sure whether it will be helpful or harmful for this issue, but the target export setup, and various issues around it, have been substantially reworked in the most recent releases of camp and Umpire, and on develop in RAJA. It may be worth trying with those and, if you are using spack, making sure that you use the packages for all three from the most recent develop version of spack, to see whether the rework helped. For the OpenMP issue, what compiler and version is this with? It's extremely strange that the build gets as far as that error, since CMake should have prevented it before that point, unless it's happening in a project including RAJA that is somehow building without the OpenMP flags. |
I updated spack to grab raja@develop, [email protected], and [email protected] and am still running into the issue. I've tested with compilers gcc8.5.0 and gcc10.2.0 with no change - but I cannot find what version of OpenMP is being used. Error output below the cut for reference:
|
I'm still noticing ... I realize that the original issue has been resolved, but I think continuing the discussion is reasonable. |
This error should only come up when compiling an application without OpenMP against a RAJA that was built with OpenMP. I also notice the camp in that spec does not list a +openmp variant, which tells me you aren't using the version of the camp package from spack's develop branch, so camp isn't getting the correct flags to add the OpenMP target correctly to the output. |
Sorry, it took me a while to test this. I recursively cloned the RAJA repo and checked out the latest RAJA release. These are the versions I got:
I also used ...
I built RAJA as a shared library with the CUDA backend enabled and with ...

Turning off BLT exporting of third-party libraries helped me pass the CMake configuration for HiOp, but the "empty" flag ... and the link step still fails:

[ 37%] Built target hiopLinAlg
[ 38%] Linking CUDA device code CMakeFiles/hiop_shared.dir/cmake_device_link.o
[ 38%] Linking CXX shared library libhiop.so
/usr/bin/ld: cannot find -lcuda_runtime
collect2: error: ld returned 1 exit status

@cameronrutherford's brute force workaround to comment out any mentioning of `cuda_runtime` ... The target ... |
The OpenMP issue @jaelynlitz found is unrelated to the forward compatibility with CMake 3.17, so I suggest we discuss it elsewhere. |
@pelesh We do not use BLT in SUNDIALS, so we run into this issue of the `cuda_runtime` target as well. |
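One possible (untested) shim for a non-BLT consumer is to define the missing target name yourself before importing camp, forwarding it to the FindCUDAToolkit runtime target. This is a sketch, not an official recommendation:

# Satisfy camp's exported "cuda_runtime" link item without BLT (CMake >= 3.17).
if (NOT TARGET cuda_runtime)
  find_package(CUDAToolkit REQUIRED)
  add_library(cuda_runtime INTERFACE)
  target_link_libraries(cuda_runtime INTERFACE CUDA::cudart)
endif()
find_package(camp REQUIRED)  # camp's interface now resolves "cuda_runtime" to the target above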
In HiOp, we link against CUDA in the following fashion:
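(The snippet referenced here did not survive in this copy. As an illustration only, linking through the FindCUDAToolkit imported targets typically looks something like the following; the HiOp target name and the list of CUDA components are assumptions.)

find_package(CUDAToolkit REQUIRED)
# Hypothetical target name; HiOp's real targets and component list may differ.
target_link_libraries(hiop PUBLIC CUDA::cudart CUDA::cublas CUDA::cusparse)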
However, when building [email protected] with [email protected] and [email protected], the camp CMake exported configuration ends up creating the camp target as follows (in campTargets.cmake):
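(The quoted fragment was also lost here; based on the property cited at the end of this report, the generated campTargets.cmake contains an entry along these lines, where only the INTERFACE_LINK_LIBRARIES value is quoted verbatim and the rest is a reconstruction.)

# Reconstruction sketch; target name/namespace and other properties are assumed.
set_target_properties(camp PROPERTIES
  INTERFACE_LINK_LIBRARIES "cuda_runtime"
)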
Per CMake documentation https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#module:FindCUDAToolkit, this breaks forward compatibility for including and linking CUDA libraries.
When building and running with this configuration of camp, RAJA, and Umpire, I end up getting the following added to my link line:

-lcuda_runtime

which generates an ld error:

/usr/bin/ld: cannot find -lcuda_runtime
Simply commenting out the line
INTERFACE_LINK_LIBRARIES "cuda_runtime"
in campTargets.cmake temporarily resolves the issue, but this should probably be updated for forward compatibility. @davidbeckingsale @pelesh
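(For comparison, a forward-compatible interface would reference an imported CUDAToolkit target rather than a bare library name; this is a sketch of the intended end state, not camp's actual fix.)

find_package(CUDAToolkit REQUIRED)  # CMake >= 3.17
# The exported INTERFACE_LINK_LIBRARIES would then name CUDA::cudart instead of
# the bare "cuda_runtime" string; the package config file would also need a
# find_dependency(CUDAToolkit) call so consumers can resolve it.
target_link_libraries(camp INTERFACE CUDA::cudart)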