Defer cuFile feature checks until finding kvikio package#342
rapids-bot[bot] merged 5 commits into rapidsai:branch-24.04
rapids_export(
  BUILD kvikio
Do we need any changes in the rapids_export(INSTALL kvikio ...) command above? It doesn't seem that we do, from my local testing. I just want to be sure I'm not missing something.
Yes, both the BUILD and INSTALL export commands should have the same final code block.
Thanks Bradley! 🙏 How would this affect the Python packages built on CUDA 12.2 (especially when they are used on CUDA 12.0)?
endif()
# Enable supported cuFile features in KvikIO
if(cuFile_FOUND)
You are going to keep adding these defines, includes, and link libraries each time find_package is called. We want to do this as rarely as possible.
One option is to wrap the whole check in something along the lines of:
get_property(already_set_kvikio DIRECTORY PROPERTY kvikio_already_set_defines SET)
if(NOT already_set_kvikio)
  set_property(DIRECTORY PROPERTY kvikio_already_set_defines "ON")
  find_package(cuFile)
  ...
endif()
…d at DIRECTORY scope.
To clarify, what happens if cuDF Python is built against KvikIO on CUDA 12.2 and then runs on CUDA 12.0? Will we start making use of new symbols from cuFile on CUDA 12.2 that are not available on CUDA 12.0? How do we handle this?
The actual calls to cuFile are guarded by dlopen and conditional checks for the relevant symbols. Building any library that depends on libkvikio with 12.2 and running on 12.0 will still be safe. This PR just makes it possible to build a library depending on libkvikio, such as libcudf, with different CUDA 12.x versions (12.0 or newer), rather than only the CUDA version that was used to build the libkvikio package (which is now 12.2).
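For readers unfamiliar with the pattern, here is a minimal sketch of a dlopen-guarded symbol check. It is not KvikIO's actual shim code: `runtime_has_symbol` and `stream_api_available_at_runtime` are hypothetical helpers, and the library name `libcufile.so.0` and symbol name `cuFileReadAsync` are assumptions chosen for illustration.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Hypothetical helper (not part of KvikIO): returns true only if `library`
// can be loaded at runtime and exports `symbol`.
bool runtime_has_symbol(const std::string& library, const std::string& symbol)
{
  assert(!library.empty() && !symbol.empty());

  // dlopen returns nullptr when the shared library is missing, so nothing
  // here requires the library to be present at link time.
  void* handle = dlopen(library.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) { return false; }

  // dlsym returns nullptr when this version of the library lacks the symbol.
  const bool found = dlsym(handle, symbol.c_str()) != nullptr;
  dlclose(handle);
  return found;
}

// Assumed names for illustration: "libcufile.so.0" as the cuFile shared
// library and "cuFileReadAsync" as one of the stream API entry points.
bool stream_api_available_at_runtime()
{
  return runtime_has_symbol("libcufile.so.0", "cuFileReadAsync");
}
```

A caller would take the stream code path only when `stream_api_available_at_runtime()` returns true and fall back to another path otherwise, which is why an older cuFile (or none at all) at runtime does not cause unresolved-symbol failures. On older glibc versions this may need to be linked with `-ldl`.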
Are we sure? It looks like it is defined at compile time: kvikio/cpp/include/kvikio/shim/cufile.hpp, lines 93 to 99 in e5bc184.
I should add that I am OK with going ahead with the build improvement. I am more saying that there are other issues we might encounter next, and maybe we should get out ahead of them.
Correct. This PR defers the definition of those macros to the libcudf compilation rather than the libkvikio "compilation" (which is just using CMake to package libkvikio for conda, since libkvikio is header-only). The problem this solves is that libkvikio conda packages built with CUDA 12.2 assumed the batch/stream features were always available, and defined these macros even when building libcudf with CUDA 12.0 (cuFile from CUDA 12.0 doesn't have batch/stream features). That caused compilation errors. By deferring this check until the call to find_package, the macros reflect the cuFile that is actually present when libcudf is built. If you build libcudf on CUDA 12.0, the batch/stream features aren't built into the library at all (no macros).
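As a rough illustration of what "no macros" means for a consumer of a header-only library, the sketch below gates features on preprocessor definitions supplied by the consumer's own build. The macro names `SKETCH_CUFILE_BATCH_API_FOUND` and `SKETCH_CUFILE_STREAM_API_FOUND` are hypothetical stand-ins, not necessarily the names KvikIO uses.

```cpp
// Hypothetical macro names for illustration only. In this sketch, the
// consumer's build (e.g. libcudf's) passes -DSKETCH_CUFILE_STREAM_API_FOUND
// only if the cuFile headers it compiles against provide the stream API.

// True only when the consumer's build defined the stream-API macro.
constexpr bool stream_api_compiled_in()
{
#ifdef SKETCH_CUFILE_STREAM_API_FOUND
  return true;
#else
  return false;
#endif
}

// True only when the consumer's build defined the batch-API macro.
constexpr bool batch_api_compiled_in()
{
#ifdef SKETCH_CUFILE_BATCH_API_FOUND
  return true;
#else
  return false;
#endif
}
```

Because the header is compiled as part of the consumer, the same installed header yields different answers depending on the flags of each consuming build. Compiled without either definition, both functions return false and any code guarded by the same macros is simply absent from the resulting library.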
vyasr left a comment
A couple of small suggestions, but nothing critical.
Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
Yep this is fine
This is what I was worried about. Though it sounds like it shouldn't be an issue.
/merge
…idsai#342)" This reverts commit c434cef.
This PR closes #341.
The kvikio INTERFACE_COMPILE_DEFINITIONS were being set based on the packages available during the libkvikio conda build (e.g. CUDA 12.2 since #328), which might not be the same packages/versions as when libkvikio is actually being used (e.g. to build libcudf with CUDA 12.0). Because we built libkvikio with CUDA 12.2 and then tried to use it with CUDA 12.0 devcontainers, the build failed to find the cuFile Stream APIs that were introduced in CUDA 12.2.
This PR defers these definitions until the call to find_package, which will then use the exact cuFile features present (if cuFile is available at all) when building a package like cudf that depends on kvikio. The libkvikio example/test binary is built with the cuFile features available at build time, for use in the libkvikio-tests conda package. However, this test binary will still be compatible with a runtime where cuFile is unavailable or is version 12.0, as it is dlopen-ing the library and has runtime checks for the batch/stream features it tries to use.
I did local testing of this PR with cudf devcontainers. I tested both 12.0 and 12.2 to reproduce (and fix) the failure, and also tested clean builds of libcudf after removing libcufile (to test when cuFile is not found). All seems to work as intended.