Build and test with CUDA 13.0.0#7128

Merged
rapids-bot[bot] merged 9 commits into rapidsai:branch-25.10 from jameslamb:cuda-13.0.0
Sep 4, 2025
Conversation

@jameslamb
Member

@jameslamb jameslamb commented Aug 22, 2025

Contributes to rapidsai/build-planning#208

Contributes to rapidsai/build-planning#68

  • updates to CUDA 13 dependencies in fallback entries in dependencies.yaml matrices (i.e., the ones that get written to pyproject.toml in source control)

Notes for Reviewers

This switches GitHub Actions workflows to the cuda13.0 branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to branch-25.10, once all of RAPIDS supports CUDA 13.

@jameslamb jameslamb added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Aug 22, 2025

@robertmaynard
Contributor

#6823 was merged yesterday and added a dependency on pynvml constrained to the 12.x series. We will need to update this PR with the latest 25.10 and update that logic for CTK 12 and 13 @jameslamb

@jameslamb
Member Author

jameslamb commented Sep 2, 2025

Problem 1: missing "treebank" dataset

update: seemed to be a network error, not seen on re-runs.

details (click me)

One CUDA 12.0.1 conda-python-tests-singlegpu job failed like this:

[gw1] linux -- Python 3.12.11 /opt/conda/envs/test/bin/python
Traceback (most recent call last):
  File "/opt/conda/envs/test/lib/python3.12/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/test/lib/python3.12/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource treebank not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk

(conda-python-tests-singlegpu build link)

Hopefully just a temporary issue that'll be resolved by a re-run.
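For context, `nltk` raises `LookupError` when a corpus such as `treebank` is missing locally, and `nltk.download("treebank")` is the usual way to fetch it. Since this failure turned out to be transient network flakiness, a minimal sketch of a generic retry wrapper for flaky downloads might look like this (hypothetical helper, not part of the test suite):

```python
import time

def retry_flaky(fn, attempts=3, delay=1.0, exc_types=(LookupError, OSError)):
    """Re-run a flaky callable (e.g. a corpus download) a few times before giving up."""
    for attempt in range(attempts):
        try:
            return fn()
        except exc_types:
            # Re-raise on the final attempt; otherwise wait and retry.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Hypothetical usage: retry_flaky(lambda: nltk.download("treebank"))
```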

Problem 2: Failing Naive Bayes tests (conda)

update: related to scikit-learn dataset downloads interacting in an unexpected way with tests run in parallel. Fixed by #7169

details (click me)

CUDA 13 conda-python-tests jobs both failed like this:

FAILED test_naive_bayes.py::test_gaussian_parameters[1e-05-balanced] - ValueError: Number of priors must match number of classes.
FAILED test_naive_bayes.py::test_gaussian_parameters[1e-05-unbalanced] - ValueError: Number of priors must match number of classes.
..
FAILED test_naive_bayes.py::test_categorical_partial_fit[True-int32-float32] - assert 0.1452 <= (0.104 + 0.0001)
FAILED test_naive_bayes.py::test_categorical_partial_fit[True-int32-float64] - assert 0.1452 <= (0.104 + 0.0001)
..
FAILED test_naive_bayes.py::test_categorical_parameters[False-False-0.1-balanced] - ValueError: Number of classes must match number of priors
FAILED test_naive_bayes.py::test_categorical_parameters[False-False-0.1-unbalanced] - ValueError: Number of classes must match number of priors
..
= 40 failed, 14374 passed, 6142 skipped, 1225 xfailed, 24 xpassed, 1225 warnings in 4949.54s (1:22:29) =

(conda-python-tests build link)

tracked in #7152
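The root cause here was multiple parallel test workers downloading scikit-learn datasets into the same cache directory at once. One common mitigation is to give each worker its own cache: sklearn's fetchers honor the `SCIKIT_LEARN_DATA` environment variable (or a `data_home` argument), and pytest-xdist exposes the worker id via `PYTEST_XDIST_WORKER`. A sketch under those assumptions (not necessarily what #7169 did):

```python
import os
import tempfile

def per_worker_data_home(base=None):
    """Give each pytest-xdist worker its own scikit-learn dataset cache directory."""
    base = base or os.path.join(tempfile.gettempdir(), "sklearn_data")
    # pytest-xdist sets PYTEST_XDIST_WORKER to the worker id (e.g. "gw1");
    # fall back to "main" when tests run without xdist.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    path = os.path.join(base, worker)
    os.makedirs(path, exist_ok=True)
    # sklearn dataset fetchers read this env var to locate their cache.
    os.environ["SCIKIT_LEARN_DATA"] = path
    return path
```

Wiring this into a session-scoped conftest fixture would keep workers from racing on the same on-disk files.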

Problem 3: Failing Naive Bayes and Logistic Regression tests (wheels)

update: same as above, fixed by #7169

details (click me)

All of the amd64 CUDA 13 wheel tests passed, but on arm64 there were some test failures.

FAILED test_naive_bayes.py::test_multinomial_partial_fit[int32-float32] - assert 0.9238375200427579 >= 0.924
 +  where 0.9238375200427579 = accuracy_score(array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,), dtype=int32), array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,), dtype=int32))
FAILED test_naive_bayes.py::test_multinomial_partial_fit[int32-float64] - assert 0.9238375200427579 >= 0.924
 +  where 0.9238375200427579 = accuracy_score(array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,), dtype=int32), array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,)))
...
FAILED test_naive_bayes.py::test_multinomial[int32-float32] - assert 0.9238375200427579 >= 0.924
 +  where 0.9238375200427579 = accuracy_score(array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,)), array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,), dtype=int32))
FAILED test_naive_bayes.py::test_multinomial[int32-float64] - assert 0.9238375200427579 >= 0.924
 +  where 0.9238375200427579 = accuracy_score(array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,)), array([ 5, 16,  6, ..., 14,  7,  3], shape=(11226,)))
...
FAILED test_naive_bayes.py::test_gaussian_partial_fit - assert 0.988 >= 0.99
 +  where 0.988 = accuracy_score(array([ 5, 16,  6, ...,  8, 13, 11], shape=(1500,), dtype=int32), array([ 5, 16,  6, ...,  8, 13, 11], shape=(1500,), dtype=int32))
...
FAILED test_naive_bayes.py::test_categorical_partial_fit[False-int64-float32] - assert 0.1082 <= (0.104 + 0.0001)
FAILED test_naive_bayes.py::test_categorical_partial_fit[False-int64-float64] - assert 0.1082 <= (0.104 + 0.0001)
FAILED test_linear_model.py::test_logistic_regression_sparse_only - ExceptionGroup: Hypothesis found 2 distinct failures in explicit examples. (2 sub-exceptions)
= 22 failed, 14354 passed, 6150 skipped, 1223 xfailed, 24 xpassed, 1217 warnings in 1611.55s (0:26:51) =

(wheel-tests-cuml build link)

tracked in #7152 and #7162
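Notice that these arm64 failures are hard accuracy cutoffs missed by tiny margins (e.g. `0.9238375... >= 0.924`), i.e. platform-to-platform numeric noise rather than real regressions. A hedged sketch of a tolerance-based check that would absorb such noise (a hypothetical helper, not the fix that landed in #7169):

```python
import math

def close_enough(observed, target, rel_tol=1e-3):
    """Pass if `observed` meets `target`, or falls within a small relative tolerance of it."""
    return observed >= target or math.isclose(observed, target, rel_tol=rel_tol)
```

Whether loosening thresholds is appropriate depends on the test; a genuinely degraded model should still fail.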

@jameslamb
Member Author

/ok to test

rapids-bot bot pushed a commit that referenced this pull request Sep 3, 2025
…cy pins (#7164)

Contributes to rapidsai/build-planning#208 (breaking some changes off of #7128 to help with review and debugging there)

* switches to using `dask-cuda[cu12]` extra for wheels (added in rapidsai/dask-cuda#1536)
* bumps pins on some dependencies to match the rest of RAPIDS
  - `cuda-python`: >=12.9.2 (CUDA 12)
  - `cupy`: >=13.6.0
  - `numba`: >=0.60.0
* adds explicit runtime dependency on `numba-cuda`
  - *`cuml` uses this unconditionally but does not declare runtime dependency on it today*

Contributes to rapidsai/build-infra#293

* replaces dependency on `pynvml` package with `nvidia-ml-py` package (see that issue for details)

## Notes for Reviewers

### These dependency pin changes should be low-risk

All of these pins and requirements are already coming through `cuml`'s dependencies, e.g. `cudf` carries most of them via rapidsai/cudf#19806

So they shouldn't change much about the test environments in CI.

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #7164
@jameslamb
Member Author

/ok to test

@jameslamb jameslamb changed the title WIP: Build and test with CUDA 13.0.0 Build and test with CUDA 13.0.0 Sep 3, 2025
@jameslamb jameslamb marked this pull request as ready for review September 3, 2025 21:44
@jameslamb jameslamb requested review from a team as code owners September 3, 2025 21:44
@jameslamb jameslamb requested review from gforsyth and jcrist September 3, 2025 21:44
Contributor

@csadorf csadorf left a comment

LGTM! Thanks a lot!

@jameslamb
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 0fa871d into rapidsai:branch-25.10 Sep 4, 2025
124 checks passed
@jameslamb jameslamb deleted the cuda-13.0.0 branch September 4, 2025 14:53

Labels

conda (conda issue), Cython / Python (Cython or Python issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)


5 participants