[DO NOT MERGE] Test CCCL version bump #2358

Closed
wants to merge 3 commits into from


sleeepyjack

This PR tests an upcoming CCCL version bump (rapidsai/rapids-cmake#631) and should not be merged.

@sleeepyjack
Author

sleeepyjack commented Jun 12, 2024

One of the tests is running into an error: https://github.com/rapidsai/raft/actions/runs/9475150045/job/26109283214?pr=2358#step:7:824

We ran into a similar issue yesterday with cuco when bumping the CCCL version to 2.5.0 (see NVIDIA/cuCollections#504). It turned out that the culprit was a sticky CUDA error that resurfaced during a downstream Thrust/CUB call.

I'm not very familiar with this codebase. Could someone help investigate this issue?
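
For readers unfamiliar with the failure mode mentioned above: a sticky CUDA error (for example, one caused by an illegal memory access in a kernel) keeps being returned by later runtime calls, so it can first become visible inside an unrelated Thrust/CUB call. The following is a minimal sketch of one way to localize such an error by checking for pending errors before the library call. The helper names `pending_cuda_error` and `sum_on_device` are hypothetical and are not part of RAFT, cuco, or CCCL; this is not the fix that was applied in either project.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

#include <cstdio>
#include <stdexcept>

// Hypothetical helper (assumption, not a RAFT/cuco/CCCL API): synchronizes the
// device and reports any error left behind by earlier work.
inline cudaError_t pending_cuda_error(char const* where)
{
  // Blocks until previously launched work finishes; asynchronous kernel
  // failures are reported here at the latest.
  cudaError_t const sync_status = cudaDeviceSynchronize();
  // Returns the last recorded error and resets it if it is not sticky.
  cudaError_t const last_status = cudaGetLastError();
  cudaError_t const status = (sync_status != cudaSuccess) ? sync_status : last_status;
  if (status != cudaSuccess) {
    std::fprintf(stderr, "[%s] pending CUDA error: %s\n", where, cudaGetErrorString(status));
  }
  return status;
}

// Hypothetical wrapper: checks for earlier failures before calling Thrust, so
// that an upstream error is not attributed to the reduction itself.
inline int sum_on_device(thrust::device_vector<int> const& values)
{
  if (pending_cuda_error("before thrust::reduce") != cudaSuccess) {
    throw std::runtime_error("CUDA error pending before thrust::reduce");
  }
  return thrust::reduce(values.begin(), values.end(), 0);
}
```

Placing such a check between suspect kernels and the first Thrust/CUB call narrows down where the error originates. The extra synchronization has a performance cost, so a check like this is better suited to debugging than to production code paths.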

@bdice
Contributor

bdice commented Jun 12, 2024

I can reproduce the issue locally in a devcontainer by running `build-raft-cpp && ~/raft/cpp/build/latest/gtests/CORE_TEST -V`. I won't have much time to look deeper today but at least we have a reproducer outside of CI.

rapids-bot bot pushed a commit that referenced this pull request Jun 13, 2024
This PR fixes a bug uncovered by CCCL version bump #2358

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: #2361
@sleeepyjack
Author

Closing this PR since all checks have passed

rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request Jun 24, 2024
This PR updates the CCCL version to include a fix for `cuda::std::span`, which is required for cuCollections to work properly with CCCL 2.5.0.

Most of the changes between the last CCCL version bump (#607) and this one were related to doc updates and unit test fixes, so I don't expect much functional impact for RAPIDS.

After this PR, we will likely have to bump the cuco version again to include the new changes.

### CCCL PR:

- NVIDIA/cccl#1836

### CUCO PR:

- NVIDIA/cuCollections#502

### RAPIDS PRs:

- [x] rapidsai/cudf#15986
  - [error during docs-build](https://github.com/rapidsai/cudf/actions/runs/9406273871/job/25911619452?pr=15946#step:9:2243) but seems unrelated
- [x] rapidsai/cugraph#4483
  - some CI jobs ran into a network timeout -> rerunning
- [x] rapidsai/cuml#5924
  - successful apart from some optional conda python tests
- [x] rapidsai/raft#2358
  - This one is weird as it [shows an error](https://github.com/rapidsai/raft/actions/runs/9475150045/job/26109283214?pr=2358#step:7:824) that is similar to the one we found in cuco when bumping the CCCL version to 2.5.0. However, we thought the problem was on cuco's end and pushed a fix that resolved the issue (see [Slack thread](https://nvidia.slack.com/archives/CCP05T27R/p1718060955876199))
- [x] rapidsai/rmm#1584

Authors:
  - Daniel Jünger (https://github.com/sleeepyjack)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #631