Skip to content

Upgrade to nvCOMP 5.0.0.6#19636

Merged
rapids-bot[bot] merged 22 commits intorapidsai:branch-25.10from
vuule:test-nvcomp-5
Aug 20, 2025
Merged

Upgrade to nvCOMP 5.0.0.6#19636
rapids-bot[bot] merged 22 commits intorapidsai:branch-25.10from
vuule:test-nvcomp-5

Conversation

@vuule
Copy link
Contributor

@vuule vuule commented Aug 8, 2025

Description

Issue #19686

Update the nvCOMP version and the JNI code that directly uses nvCOMP. The libcudf code is already compatible with this version.

Depends on rapidsai/rapids-cmake#896 and rapidsai/kvikio#800

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

firestarman and others added 5 commits July 8, 2025 21:33
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@vyasr
Copy link
Contributor

vyasr commented Aug 13, 2025

To get the wheel builds passing you need to change all lines in build_wheel*.sh files that do an auditwheel exclude of libnvcomp.so.4 to libnvcomp.so.5. Here is one example.

@bdice
Copy link
Contributor

bdice commented Aug 13, 2025

change all lines in build_wheel*.sh files that do an auditwheel exclude of libnvcomp.so.4 to libnvcomp.so.5

We could use libnvcomp.so.* instead. That’d be cleanest imo.

@vuule
Copy link
Contributor Author

vuule commented Aug 13, 2025

I changed the only place where libnvcomp.so.4 is used to libnvcomp.so.* and now the wheel builds pass. However, the wheel tests fail.
@vyasr @bdice any ideas what else is missing here?

# repair wheels and write to the location that artifact-uploading code expects to find them
python -m auditwheel repair \
--exclude libnvcomp.so.4 \
--exclude libnvcomp.so.* \
Copy link
Contributor

@bdice bdice Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vuule Aha. This excludes nvcomp 5 from the cudf wheel. Previously we were safe excluding nvcomp 4 because the kvikio wheel was shipping the nvcomp shared library for us. We need to update kvikio to nvcomp 5 first, or just delete this line for testing to bundle nvcomp 5 into the cudf wheel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I've removed the --exclude lines, the wheel build is back to failing.
Any help would be appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's land rapidsai/kvikio#798 first and then come back to this.

@vuule vuule added feature request New feature or request non-breaking Non-breaking change labels Aug 18, 2025
@vuule
Copy link
Contributor Author

vuule commented Aug 18, 2025

/ok to test

@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@vuule
Copy link
Contributor Author

vuule commented Aug 18, 2025

/ok to test 50a2820

@vuule vuule added DO NOT MERGE Hold off on merging; see PR for details and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Aug 19, 2025
@rapidsai rapidsai deleted a comment from copy-pr-bot bot Aug 19, 2025
@vuule vuule closed this Aug 19, 2025
@vuule vuule reopened this Aug 19, 2025
@vuule vuule marked this pull request as ready for review August 19, 2025 22:30
@vuule vuule requested review from a team as code owners August 19, 2025 22:30
@vuule vuule requested a review from jameslamb August 19, 2025 22:30
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with one question about the FIXMEs.

Comment on lines +123 to +124
// FIXME how to use these statuses ? They are not used either in the corresponding
// decompressor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we track this? Follow-up work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged this as-is from #19309
@firestarman?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the new API write into this comp_statuses array? Compression status for each chunk?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just read the doc:

device_statuses – [out] Array with size num_chunks of statuses in device-accessible memory. This argument needs to be preallocated. For each chunk, if the compression is successful, the status will be set to nvcompSuccess, and an error code otherwise.

So should we launch another kernel thrust::count_if(any error)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't want to sync stream here, we probably need to return the entire status array for checking later on. Otherwise, we would not be able to check for compression status of each data chunk. The status returned from this async API is only checking for launching the kernel, not checking the compression results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is pretty much what we do in libcudf
since this is unrelated to nvcomp 5, I think this could be a follow-up. Also, not sure if you want to use the adapter here long-term.

Copy link
Member

@mhaseeb123 mhaseeb123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving CMake changes

Comment on lines +285 to +287
// FIXME how to use these statuses ? They are not used either in the corresponding
// decompressor.
auto comp_statuses = rmm::device_uvector<nvcompStatus_t>(batch_size, stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, we have the decompression status written into this array but here we don't check it to see if there is any failure.

Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java approval, pending the follow up work for checking batch comp/decomp status.

@bdice
Copy link
Contributor

bdice commented Aug 20, 2025

/ok to test

@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 20, 2025

/ok to test

@bdice, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@bdice
Copy link
Contributor

bdice commented Aug 20, 2025

/ok to test b45faae

@vuule vuule removed the DO NOT MERGE Hold off on merging; see PR for details label Aug 20, 2025
@bdice
Copy link
Contributor

bdice commented Aug 20, 2025

CI failed, probably because 9f0d3f0 was needed for a fix. I merged the upstream and will rerun CI.

@bdice
Copy link
Contributor

bdice commented Aug 20, 2025

/ok to test 780610a

@bdice
Copy link
Contributor

bdice commented Aug 20, 2025

/merge

@rapids-bot rapids-bot bot merged commit aea777d into rapidsai:branch-25.10 Aug 20, 2025
92 checks passed
@vuule vuule deleted the test-nvcomp-5 branch August 20, 2025 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

Comments