Build and test with CUDA 13.0.0 #803

rapids-bot[bot] merged 11 commits into rapidsai:branch-25.10 from jameslamb:cuda-13.0.0
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test
**Problem 1: nvCOMP 4.2.0.11 does not have CUDA 13 packages**

Python conda builds are failing like this:

Luckily #804 will remove the dependency on nvCOMP here, which should resolve that.

**Problem 2: failing Python tests**

There are a bunch of failing Python tests, concentrated on many different parameterizations of 2 test cases.
/ok to test

/ok to test

/ok to test aa320b5
cpp/src/shim/cuda.cpp (outdated)
```cpp
decltype(cuMemcpyBatchAsync)* fp;
get_symbol(fp, lib, KVIKIO_STRINGIFY(cuMemcpyBatchAsync));
MemcpyBatchAsync.set(fp);
if ((CUDA_VERSION >= 12080 && CUDA_VERSION < 13000 && driver_version >= 12080 &&
```
I feel like there must be a way for us to support CUDA 12.8 runtime with CUDA 13.0 driver. Can we investigate this a little further?
Many of our CUDA 12 tests are running with driver 580, which should cover this case.
Thanks.

It looks like, because of backward compatibility, code compiled against CUDA driver 12.9 is able to dynamically load its own version of the batched copy function from libcuda.so.580.65.06 when run on a system with CUDA driver 13.0.

I've simplified the handling.
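For readers following along, here is a minimal sketch of the version-gated dynamic loading being discussed, assuming the code is compiled against CUDA 12.8+ headers (so `cuMemcpyBatchAsync` is declared). The function name `load_memcpy_batch_async` is invented for illustration; this is not KvikIO's actual shim code.

```cpp
// Hypothetical sketch (not the actual KvikIO shim): gate loading of
// cuMemcpyBatchAsync on both the compile-time toolkit version and the
// runtime driver version. Driver backward compatibility means a binary
// built against CUDA 12.8+ can still resolve the symbol from a newer
// driver, e.g. libcuda.so.580.65.06 shipping CUDA 13.0.
#include <cuda.h>
#include <dlfcn.h>

using MemcpyBatchAsyncFn = decltype(&cuMemcpyBatchAsync);

MemcpyBatchAsyncFn load_memcpy_batch_async()
{
  int driver_version = 0;
  if (cuDriverGetVersion(&driver_version) != CUDA_SUCCESS) { return nullptr; }

  // cuMemcpyBatchAsync was introduced in CUDA 12.8. Because the driver is
  // backward compatible, checking ">= 12080" on both sides is enough; no
  // upper bound on the driver version is required.
  if (CUDA_VERSION < 12080 || driver_version < 12080) { return nullptr; }

  void* lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
  if (lib == nullptr) { return nullptr; }
  return reinterpret_cast<MemcpyBatchAsyncFn>(dlsym(lib, "cuMemcpyBatchAsync"));
}
```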
dependencies.yaml (outdated)
```diff
  - output_types: conda
    packages:
-     - cupy>=12.0.0
+     - &cupy_unsuffixed cupy>=12.0.0
```
We need to bump to cupy>=13.6 for CUDA 13. Maybe we should require that same pinning on CUDA 12, too? Historically, I believe we've bumped cupy bounds for all CUDA versions whenever some new CUDA version needs a higher pinning.
I don't think we are testing cupy==12.0.0 in our "oldest" dependencies jobs right now, so I don't have a good idea of whether 12.0 still works. Bumping to 13.6 seems safest.
OK, thanks for that historical context.
On https://anaconda.org/conda-forge/cupy/files?sort=basename&sort_order=desc&version=13.6.0 I do see cupy==13.6.0 packages published for CUDA 11, 12, and 13, so hopefully this should be ok.
Updated in 6e7aff3, and I'll update the other CUDA 13 PRs (including the rapids-reviser example).
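For context, a minimal sketch of how the anchor/alias pattern from the diff above might be used after the bump, assuming the usual `rapids-dependency-file-generator` layout (the matrix keys and suffixed package names below are illustrative, not copied from this PR):

```yaml
# Illustrative sketch only; the real dependencies.yaml has more matrices.
- output_types: conda
  packages:
    # Conda packages are unsuffixed; define the bound once via an anchor.
    - &cupy_unsuffixed cupy>=13.6
- output_types: [requirements, pyproject]
  matrices:
    - matrix: {cuda: "13.*"}
      packages:
        - cupy-cuda13x>=13.6
    - matrix: {cuda: "12.*"}
      packages:
        - cupy-cuda12x>=13.6
    # Fallback matrix reuses the unsuffixed pinning via the alias.
    - matrix:
      packages:
        - *cupy_unsuffixed
```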
At this point everyone involved in the PR is already subscribed to notifications via comments, so I've moved this out of draft so that we don't have to keep manually triggering CI with `/ok to test`.
bdice left a comment:

One small docs suggestion, otherwise LGTM.
@jameslamb Just as a heads-up, make sure the newly created devcontainers include Paul's build cluster changes. I think this looks right, but please verify that for all the repos as you add CUDA 13 support to them.
Good note, thank you. I think we have what's needed here, based on what I see in the diff from #797.
The only changes in the devcontainer.json files themselves were things like this, which this branch has:
"runArgs": [
"--rm",
"--name",
"${localEnv:USER:anon}-rapids-${localWorkspaceFolderBasename}-25.10-cuda13.0-conda",
+ "--ulimit",
+ "nofile=500000"
],Co-authored-by: Bradley Dice <bdice@bradleydice.com>
/merge
Contributes to rapidsai/build-planning#208
Contributes to rapidsai/build-planning#68
- `dependencies.yaml` matrices (i.e., the ones that get written to `pyproject.toml` in source control)

**Notes for Reviewers**
This switches GitHub Actions workflows to the `cuda13.0` branch from rapidsai/shared-workflows#413.

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.
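For readers unfamiliar with how that temporary pin looks, here is a hypothetical one-line change of the kind involved (the workflow file and job name are illustrative, not taken from this PR):

```yaml
# Illustrative sketch: a RAPIDS repo pins its reusable workflows to the
# temporary cuda13.0 branch of shared-workflows; a follow-up PR reverts
# the ref back to branch-25.10 once CUDA 13 support lands everywhere.
jobs:
  checks:
    # was: ...@branch-25.10
    uses: rapidsai/shared-workflows/.github/workflows/checks.yaml@cuda13.0
```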