
Build and test with CUDA 13.0.0 #803

Merged

rapids-bot[bot] merged 11 commits into rapidsai:branch-25.10 from jameslamb:cuda-13.0.0 on Aug 20, 2025
Conversation

@jameslamb (Member) commented Aug 19, 2025:

Contributes to rapidsai/build-planning#208

  • uses CUDA 13.0.0 to build and test

Contributes to rapidsai/build-planning#68

  • updates to CUDA 13 dependencies in fallback entries in dependencies.yaml matrices (i.e., the ones that get written to pyproject.toml in source control)
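The fallback-entry mechanism mentioned above can be sketched roughly as follows. This is a hypothetical dependencies.yaml fragment; the package name and pins are illustrative, not kvikio's actual entries:

```yaml
# Hypothetical dependencies.yaml fragment (illustrative pins only)
- output_types: [pyproject]
  matrices:
    - matrix: {cuda: "13.*"}
      packages:
        - cuda-python>=13.0,<14.0a0
    - matrix: {cuda: "12.*"}
      packages:
        - cuda-python>=12.0,<13.0a0
    - matrix:   # fallback entry: this is what gets written to
      packages: # pyproject.toml in source control
        - cuda-python>=13.0,<14.0a0
```

This PR updates such fallback entries so the checked-in pyproject.toml reflects CUDA 13 dependencies.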

Notes for Reviewers

This switches GitHub Actions workflows to the cuda13.0 branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to branch-25.10, once all of RAPIDS supports CUDA 13.

@copy-pr-bot bot commented Aug 19, 2025:

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@jameslamb added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels on Aug 19, 2025
@jameslamb (Member Author) commented:

/ok to test

@jameslamb (Member Author) commented:

Problem 1: nvCOMP 4.2.0.11 does not have CUDA 13 packages

Python conda builds are failing like this:

Error:   × Failed to resolve dependencies: Cannot solve the request because of:
  │ libnvcomp-dev 4.2.0.11.* cannot be installed because there are no viable
  │ options:
  │ └─ libnvcomp-dev 4.2.0.11 would require
  │    └─ cuda-version >=12,<13.0a0, for which no candidates were found.
  │ 
  ╰─▶ Cannot solve the request because of: libnvcomp-dev 4.2.0.11.* cannot be
      installed because there are no viable options:
      └─ libnvcomp-dev 4.2.0.11 would require
         └─ cuda-version >=12,<13.0a0, for which no candidates were found.

(build link)

Luckily #804 will remove the dependency on nvCOMP here, which should resolve that.

Problem 2: failing Python tests

There are a number of failing Python tests, concentrated in many different parameterizations of two test cases.

FAILED python/kvikio/tests/test_mmap.py::test_read_seq[cupy-0-None] - TypeError: bad any_cast
FAILED python/kvikio/tests/test_mmap.py::test_read_seq[cupy-0-10] - TypeError: bad any_cast
FAILED python/kvikio/tests/test_mmap.py::test_read_seq[cupy-0-9999] - TypeError: bad any_cast
...
FAILED python/kvikio/tests/test_mmap.py::test_read_with_default_arguments[cupy] - TypeError: bad any_cast
FAILED python/kvikio/tests/test_mmap.py::test_read_with_default_arguments[cupy_async] - TypeError: bad any_cast
================= 92 failed, 1404 passed, 13 skipped in 59.70s =================

(build link)

@jameslamb (Member Author) commented:

/ok to test

@jameslamb (Member Author) commented:

/ok to test

@kingcrimsontianyu (Contributor) commented:

/ok to test aa320b5

@bdice (Contributor) left a comment:

Requesting a few related version bumps that should ensure a smoother CUDA 13 experience.

decltype(cuMemcpyBatchAsync)* fp;
get_symbol(fp, lib, KVIKIO_STRINGIFY(cuMemcpyBatchAsync));
MemcpyBatchAsync.set(fp);
if ((CUDA_VERSION >= 12080 && CUDA_VERSION < 13000 && driver_version >= 12080 &&
Contributor:

I feel like there must be a way for us to support CUDA 12.8 runtime with CUDA 13.0 driver. Can we investigate this a little further?

Contributor:

Many of our CUDA 12 tests are running with driver 580, which should cover this case.

Contributor:

Thanks.
It looks like, thanks to driver backward compatibility, code compiled against CUDA driver 12.9 is able to dynamically load its own version of the batched copy function from libcuda.so.580.65.06 when run on a system with a CUDA 13.0 driver.
I've simplified the handling accordingly.
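The runtime symbol-resolution pattern under discussion can be sketched generically. In this hypothetical Python/ctypes stand-in, libm and cos play the roles of libcuda.so and cuMemcpyBatchAsync; the idea is the same as kvikio's get_symbol: resolve the symbol at runtime and only enable the feature if the loaded library actually exports it.

```python
import ctypes

def load_symbol(lib, name):
    """Return the function if `lib` exports `name`, else None."""
    try:
        return getattr(lib, name)
    except AttributeError:
        return None

# Assumption: a glibc-based Linux system where libm.so.6 is loadable.
libm = ctypes.CDLL("libm.so.6")

cos = load_symbol(libm, "cos")  # present: feature can be enabled
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))  # 1.0

# Absent symbol: resolution fails gracefully instead of crashing,
# mirroring how a driver symbol may be missing on older drivers.
print(load_symbol(libm, "cuMemcpyBatchAsync"))  # None
```

The real code additionally gates on compile-time CUDA_VERSION and the runtime driver version, but the load-then-check step is what makes the cross-version case work.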

- output_types: conda
packages:
- cupy>=12.0.0
- &cupy_unsuffixed cupy>=12.0.0
Contributor:

We need to bump to cupy>=13.6 for CUDA 13. Maybe we should require that same pinning on CUDA 12, too? Historically I believe we've bumped cupy bounds for all CUDA versions whenever some new CUDA version needs a higher pinning.

I don't think we are testing cupy==12.0.0 in our "oldest" dependencies jobs right now so I don't have a good idea of whether 12.0 still works. Bumping to 13.6 seems safest.

Member Author:

Ok thanks for that historical context.

On https://anaconda.org/conda-forge/cupy/files?sort=basename&sort_order=desc&version=13.6.0 I do see cupy==13.6.0 packages published for CUDA 11, 12, and 13, so hopefully this should be ok.

Updated in 6e7aff3, and I'll update the other CUDA 13 PRs (including rapids-reviser example).
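The agreed-upon change can be sketched as follows (a hypothetical fragment mirroring the quoted dependencies.yaml snippet above, not the verbatim diff):

```yaml
- output_types: conda
  packages:
    # floor bumped from 12.0.0 to 13.6.0 for all CUDA versions
    - &cupy_unsuffixed cupy>=13.6.0
```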

@jameslamb jameslamb changed the title WIP: Build and test with CUDA 13.0.0 Build and test with CUDA 13.0.0 Aug 20, 2025
@jameslamb jameslamb marked this pull request as ready for review August 20, 2025 14:13
@jameslamb jameslamb requested review from a team as code owners August 20, 2025 14:13
@jameslamb jameslamb requested a review from bdice August 20, 2025 14:13
@jameslamb
Copy link
Member Author

At this point everyone involved in the PR is already subscribed to notifications via comments, so I've moved this out of draft so that we don't have to /ok to test new commits.

@bdice (Contributor) left a comment:

One small docs suggestion, otherwise LGTM

Contributor:

@jameslamb Just as a heads-up, make sure the newly created devcontainers include Paul's build cluster changes. I think this looks right but just verify that for all the repos as you add CUDA 13 support to them.

Member Author:

Good note, thank you. I think we have what's needed here, based on what I see in the diff from #797.

The only changes in the devcontainer.json files themselves were things like this, which this branch has:

  "runArgs": [
    "--rm",
    "--name",
    "${localEnv:USER:anon}-rapids-${localWorkspaceFolderBasename}-25.10-cuda13.0-conda",
+    "--ulimit",
+    "nofile=500000"
  ],

https://github.com/jameslamb/kvikio/blob/3162121ad72d303b150e7e80a0e61c0bac06fd3b/.devcontainer/cuda13.0-conda/devcontainer.json#L15-L16

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@jameslamb (Member Author) commented:

/merge

@rapids-bot rapids-bot bot merged commit c774079 into rapidsai:branch-25.10 Aug 20, 2025
77 checks passed
@jameslamb jameslamb deleted the cuda-13.0.0 branch August 20, 2025 21:02