Skip to content

Defer cuFile feature checks until finding kvikio package#342

Merged
rapids-bot[bot] merged 5 commits intorapidsai:branch-24.04from
bdice:cufile-deferred-check
Feb 14, 2024
Merged

Defer cuFile feature checks until finding kvikio package#342
rapids-bot[bot] merged 5 commits intorapidsai:branch-24.04from
bdice:cufile-deferred-check

Conversation

@bdice
Copy link
Copy Markdown
Contributor

@bdice bdice commented Feb 13, 2024

This PR closes #341.

The kvikio INTERFACE_COMPILE_DEFINITIONS were being set based on the packages available during the libkvikio conda build (e.g. CUDA 12.2 since #328), which might not be the same packages/versions as when libkvikio is actually being used (e.g. to build libcudf with CUDA 12.0). Because we built libkvikio with CUDA 12.2 and then tried to use it with CUDA 12.0 devcontainers, the build failed to find the cuFile Stream APIs that were introduced in CUDA 12.2.

This PR defers these definitions until the call to find_package, which will then use the exact cuFile features present (if cuFile is available at all) when building a package like cudf that depends on kvikio. The libkvikio example/test binary is built with the cuFile features available at build time, for use in the libkvikio-tests conda package. However, this test binary will still be compatible with a runtime where cuFile is unavailable or is version 12.0, as it is dlopen-ing the library and has runtime checks for the batch/stream features it tries to use.

I did local testing of this PR with cudf devcontainers. I tested both 12.0 and 12.2 to reproduce (and fix) the failure, and also tested clean builds of libcudf after removing libcufile (to test when cuFile is not found). All seems to work as intended.

@bdice bdice added bug Something isn't working non-breaking Introduces a non-breaking change labels Feb 13, 2024
@bdice bdice self-assigned this Feb 13, 2024
@bdice bdice requested a review from a team as a code owner February 13, 2024 05:12
Comment thread cpp/CMakeLists.txt
Comment on lines 179 to 180
rapids_export(
BUILD kvikio
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need any changes in the rapids_export(INSTALL kvikio ...) command above? It doesn't seem that we do, from my local testing. I just want to be sure I'm not missing something.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both the BUILD and INSTALL export commands should have the same final code block.

@bdice bdice requested review from robertmaynard and vyasr February 13, 2024 05:19
@jakirkham
Copy link
Copy Markdown
Member

Thanks Bradley! 🙏

How would this affect the Python packages built on CUDA 12.2 (especially when they are used on CUDA 12.0)?

Comment thread cpp/CMakeLists.txt
Comment on lines 179 to 180
rapids_export(
BUILD kvikio
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes both the BUILD and INSTALL export commands should have the same final code block.

Comment thread cpp/CMakeLists.txt Outdated
endif()

# Enable supported cuFile features in KvikIO
if(cuFile_FOUND)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are going to keep adding these defines, includes, link libraries each time find_package is called. We want to do this as rarely as possible,

One option is to wrap the whole check in something along the lines of:

get_property(already_set_kvikio DIRECTORY PROPERTY kvikio_already_set_defines SET)
if(NOT already_set_kvikio)
  set_property(DIRECTORY PROPERTY kvikio_already_set_defines "ON")
   find_package(cuFile)
   ...
endif()

@jakirkham
Copy link
Copy Markdown
Member

How would this affect the Python packages built on CUDA 12.2 (especially when they are used on CUDA 12.0)?

To clarify, what happens if cuDF Python is build against KvikIO on CUDA 12.2 and then runs on CUDA 12.0?

Will we start making use of new symbols from cuFile on CUDA 12.2 that are not available on CUDA 12.0? How do we handle this?

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Feb 13, 2024

To clarify, what happens if cuDF Python is build against KvikIO on CUDA 12.2 and then runs on CUDA 12.0?

Will we start making use of new symbols from cuFile on CUDA 12.2 that are not available on CUDA 12.0? How do we handle this?

The actual calls to cuFile are guarded by dlopen and conditional checks for the relevant symbols. Building any library that depends on libkvikio with 12.2 and running on 12.0 will still be safe. This PR just makes it possible to build a library depending on libkvikio, such as libcudf, with different CUDA 12.x versions (12.0 or newer), rather than only the CUDA version that was used to build the libkvikio package (which is now 12.2).

@jakirkham
Copy link
Copy Markdown
Member

Are we sure? It looks compile time defined

#ifdef KVIKIO_CUFILE_BATCH_API_FOUND
get_symbol(BatchIOSetUp, lib, KVIKIO_STRINGIFY(cuFileBatchIOSetUp));
get_symbol(BatchIOSubmit, lib, KVIKIO_STRINGIFY(cuFileBatchIOSubmit));
get_symbol(BatchIOGetStatus, lib, KVIKIO_STRINGIFY(cuFileBatchIOGetStatus));
get_symbol(BatchIOCancel, lib, KVIKIO_STRINGIFY(cuFileBatchIOCancel));
get_symbol(BatchIODestroy, lib, KVIKIO_STRINGIFY(cuFileBatchIODestroy));
#endif

@jakirkham
Copy link
Copy Markdown
Member

Should add am ok going ahead with the build improvement

Am more saying there are other issues we might encounter next and maybe we should get out ahead of them

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Feb 13, 2024

Are we sure? It looks compile time defined

Correct. This PR defers the definition of those macros to the libcudf compilation rather than the libkvikio "compilation" (which is just using CMake to package libkvikio for conda, since libkvikio is header-only).

The problem this solves is that libkvikio conda packages built with CUDA 12.2 assumed the batch/stream features were always available, and defined these macros even when building libcudf with CUDA 12.0 (cuFile from CUDA 12.0 doesn't have batch/stream features). That caused compilation errors. By deferring this check until find_package(KvikIO) is called, we ensure that the library that depends on libkvikio has build-time compatibility rules that are similar to CEC runtime compatibility rules.

If you build libcudf on CUDA 12.0, the batch/stream features aren't built into the library at all (no macros).
If you build libcudf on CUDA 12.2, the batch/stream features will be enabled by the macros and use runtime compatibility checks to determine whether to call those features (12.0 no, 12.2+ yes).

Copy link
Copy Markdown
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small suggestions, but nothing critical.

Comment thread cpp/CMakeLists.txt Outdated
Comment thread cpp/CMakeLists.txt Outdated
Comment thread cpp/examples/CMakeLists.txt
Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com>
@jakirkham
Copy link
Copy Markdown
Member

The problem this solves is that libkvikio conda packages built with CUDA 12.2 assumed the batch/stream features were always available, and defined these macros even when building libcudf with CUDA 12.0 (cuFile from CUDA 12.0 doesn't have batch/stream features). That caused compilation errors. By deferring this check until find_package(KvikIO) is called, we ensure that the library that depends on libkvikio has build-time compatibility rules that are similar to CEC runtime compatibility rules.

Yep this is fine

If you build libcudf on CUDA 12.0, the batch/stream features aren't built into the library at all (no macros).
If you build libcudf on CUDA 12.2, the batch/stream features will be enabled by the macros and use runtime compatibility checks to determine whether to call those features (12.0 no, 12.2+ yes).

This is what I was worried about. Though it sounds like it shouldn't be an issue

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented Feb 13, 2024

/merge

@rapids-bot rapids-bot Bot merged commit c434cef into rapidsai:branch-24.04 Feb 14, 2024
madsbk added a commit to madsbk/kvikio that referenced this pull request May 15, 2024
madsbk added a commit to madsbk/kvikio that referenced this pull request May 15, 2024
@madsbk madsbk mentioned this pull request May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libkvikio conda packages are built with INTERFACE_COMPILE_DEFINITIONS that require CUDA 12.2 for cuFile Stream APIs

4 participants