Build and test with CUDA 13.0.0 #1162
rapids-bot[bot] merged 9 commits into rapidsai:branch-0.46 from jameslamb:cuda-13.0.0
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
Ok, 2 things we need to resolve. Problem 1: building wheels against
|
|
I think this comes up from a place of need, but could we skip CUDA 13 for UCX-Py? The plan is to start archival of it next Monday, see #1160, which I presume will take approximately a week or two at the most, so the timing is a little bit off here. Anyway, I want us to explore if there's a way we can move forward with CUDA 13 support without UCX-Py support, since this work here is likely gonna be removed in about a week. |
If we're not going to release However we DO need to get CUDA 13 support shipped in RAPIDS 25.10 (cc @robertmaynard). There are 18 (U.S.) working days left until burndown begins for the 25.10 release: https://docs.rapids.ai/maintainers/ If we wait for I tried to roughly map out the dependencies in rapidsai/build-planning#208 With that plus the list you provided in #1160, I think we're going to get stuck pretty soon without For example.... Here are some options I see:
Trying to work around |
|
I've also started an internal chat thread about this, if you'd rather talk through the options there. |
|
@jameslamb I recommend skipping/disabling the tests that require cuDF for now. |
|
/ok to test |
|
Just pushed some changes I think will help. Ignoring that one compiler warning from #1162 (comment), I was able to build and run most tests locally like this:
docker run \
    --rm \
    -v "$(pwd)":/opt/work \
    -w /opt/work \
    -it rapidsai/ci-wheel:25.10-cuda13.0.0-rockylinux8-py3.13 \
    ./ci/build_wheel.sh
# NOTE: with test_wheel.sh changed to install the locally built wheel in dist/ instead of downloading it
docker run \
    --rm \
    --gpus all \
    -v "$(pwd)":/opt/work \
    -w /opt/work \
    -it rapidsai/ci-wheel:25.10-cuda13.0.0-rockylinux8-py3.13 \
    ./ci/test_wheel.sh
Just a few |
|
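The NOTE in the commands above mentions changing test_wheel.sh to install the locally built wheel from dist/ instead of downloading it. A minimal sketch of how that could look follows. It is not the actual change made to ci/test_wheel.sh; the helper name `find_local_wheel` is hypothetical, and the `ucx_py_*.whl` file pattern is an assumption about how the built wheel is named.

```shell
# Hypothetical helper (not part of the repository's CI scripts):
# print the path of the single wheel found in a local directory.
# Arguments:
#   $1  directory to search (defaults to "dist")
#   $2  file pattern (defaults to "ucx_py_*.whl", an assumed naming scheme)
find_local_wheel() {
    wheel_dir="${1:-dist}"
    pattern="${2:-ucx_py_*.whl}"
    count=0
    found=""
    # The pattern is left unquoted on purpose so the shell expands the glob.
    for path in "${wheel_dir}"/${pattern}; do
        # When nothing matches, the glob stays unexpanded; skip that case.
        if [ -e "${path}" ]; then
            count=$((count + 1))
            found="${path}"
        fi
    done
    # Refuse to guess when zero or several wheels are present.
    if [ "${count}" -ne 1 ]; then
        echo "expected exactly 1 wheel matching ${pattern} in ${wheel_dir}, found ${count}" >&2
        return 1
    fi
    printf '%s\n' "${found}"
}
```

Inside the container, a test script could then run `python -m pip install "$(find_local_wheel dist)"` before invoking the tests. Failing when more than one wheel is present avoids silently testing a stale build.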
At this point everyone involved is subscribed to notifications anyway via comments, so moved this out of draft. |
|
Some CUDA 13 tests are segfaulting like this: Seeing |
# CUDA 13
conda create -n ucx -c conda-forge -c rapidsai \
    cuda-version=13.0 ucx-py

# CUDA 12
conda create -n ucx -c conda-forge -c rapidsai \
    cuda-version=12.9 ucx-py
Suggested change:
- # CUDA 13
- conda create -n ucx -c conda-forge -c rapidsai \
-     cuda-version=13.0 ucx-py
- # CUDA 12
- conda create -n ucx -c conda-forge -c rapidsai \
-     cuda-version=12.9 ucx-py
+ # CUDA 12
+ conda create -n ucx -c conda-forge -c rapidsai \
+     cuda-version=12.9 ucx-py
+ # CUDA 13
+ conda create -n ucx -c conda-forge -c rapidsai \
+     cuda-version=13.0 ucx-py
Just to keep consistent ordering with text and keep chronological ordering.
@bdice asked me in rapidsai/kvikio#803 (comment) to order these types of things with newer CUDA first.
I'm indifferent about it, will let him comment here and will do whatever you two agree on.
I'd prefer we keep it consistent with the text, it looks very strange to me having 13 come before 12, this is how it always looked in the past (11 then 12).
We want to shift the default (first command offered) to be the newest. The newest CUDA version will be supported for longer and we want to encourage users to adopt new versions.
We can rearrange the text as needed but should prefer 13 over 12 in ordering and in any situations where we only give one example.
I'd prefer if we then rephrase (some of) the text, if we want to encourage that we could start referring to CUDA 12 as "legacy" or something like that to provide some strong encouragement.
I don't think we should hold up this PR over this phrasing in documentation. Since you've both approved the PR (so approved all the other changes), I'm planning to merge this as soon as CI finishes, to keep moving forward with supporting CUDA 13 in RAPIDS.
I'd be happy to review a follow-up PR changing the text around these install instructions.
# CUDA 13
pip install ucx-py-cu13

# CUDA 12
pip install ucx-py-cu12
Suggested change:
- # CUDA 13
- pip install ucx-py-cu13
- # CUDA 12
- pip install ucx-py-cu12
+ # CUDA 12
+ pip install ucx-py-cu12
+ # CUDA 13
+ pip install ucx-py-cu13
# CUDA 13
pip install 'libucx-cu13>=1.19.0,<1.20'

# CUDA 12
Suggested change:
- # CUDA 13
- pip install 'libucx-cu13>=1.19.0,<1.20'
- # CUDA 12
+ # CUDA 12
+ pip install 'libucx-cu12>=1.19.0,<1.20'
+ # CUDA 13
# CUDA 12
- pip install 'libucx-cu12>=1.16.0,<1.17'
+ pip install 'libucx-cu12>=1.19.0,<1.20'
Suggested change:
- pip install 'libucx-cu12>=1.19.0,<1.20'
+ pip install 'libucx-cu13>=1.19.0,<1.20'
bdice left a comment
One tiny suggestion, otherwise LGTM
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
|
/merge |
Contributes to rapidsai/build-planning#208
#1162 temporarily removed the `cudf` test-time dependency here, because there weren't yet CUDA 13 `cudf` packages. Those now exist (rapidsai/cudf#19768), so this restores that dependency.
Authors:
- James Lamb (https://github.com/jameslamb)
Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #1164
Contributes to rapidsai/build-planning#208
cupy: >=13.6.0
Contributes to rapidsai/build-planning#68
dependencies.yaml matrices (i.e., the ones that get written to pyproject.toml in source control)
Notes for Reviewers
This switches GitHub Actions workflows to the cuda13.0 branch from here: rapidsai/shared-workflows#413
A future round of PRs will revert that back to branch-25.10, once all of RAPIDS supports CUDA 13.