Build and test with CUDA 13.0.0 #366

Merged
rapids-bot[bot] merged 20 commits into NVIDIA:branch-25.10 from jameslamb:cuda-13.0.0
Sep 5, 2025

Conversation

@jameslamb
Member

Description

Contributes to rapidsai/build-planning#208

Contributes to rapidsai/build-planning#68

  • updates to CUDA 13 dependencies in fallback entries in dependencies.yaml matrices (i.e., the ones that get written to pyproject.toml in source control)

Notes for Reviewers

This switches GitHub Actions workflows to the cuda13.0 branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to branch-25.10, once all of RAPIDS supports CUDA 13.

Issue

Closes #294

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@jameslamb jameslamb added the non-breaking and improvement labels Sep 2, 2025
@copy-pr-bot
copy-pr-bot bot commented Sep 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

&set_bounds_changed_node, // "to" nodes
#if CUDA_VER_13_0_UP
nullptr, // edge data
#endif
Member Author

Builds were failing like this:

│ │ $SRC_DIR/cpp/src/mip/presolve/load_balanced_bounds_presolve_helpers.cuh(474): error: argument of type "int" is incompatible with parameter of type "const cudaGraphEdgeData *"
│ │ cudaGraphAddDependencies(act_graph, &act_sub_warp_node, &set_bounds_changed_node, 1);

(conda-cpp-build)

Starting in CUDA 13, cudaGraphAddDependencies() picked up a new argument for optional edge data for the graphs.

ref:

This attempts to patch around that. This code was already not passing any data about edges in the graph, as far as I can tell, so hopefully this will be fine.

Collaborator

@rg20 @chris-maes for viz

Member Author

After that change, there is a new build failure I don't understand:

│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: tmpxft_0000132e_00000000-6_execute_insertion.compute_120.cudafe1.cpp:(.text._ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi[_ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi]+0x52c): undefined reference to `void cuopt::routing::detail::kernel_get_best_insertion_ejection_solution<64, int, float, (cuopt::routing::request_t)1>(cuopt::routing::detail::solution_t<int, float, (cuopt::routing::request_t)1>::view_t, cuopt::routing::detail::request_info_t<int, (cuopt::routing::request_t)1, void> const*, int*, int, int, cuopt::routing::detail::feasible_move_t, long)'
│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: tmpxft_0000132e_00000000-6_execute_insertion.compute_120.cudafe1.cpp:(.text._ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi[_ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi]+0x680): undefined reference to `void cuopt::routing::detail::kernel_get_best_insertion_ejection_solution<64, int, float, (cuopt::routing::request_t)1>(cuopt::routing::detail::solution_t<int, float, (cuopt::routing::request_t)1>::view_t, cuopt::routing::detail::request_info_t<int, (cuopt::routing::request_t)1, void> const*, int*, int, int, cuopt::routing::detail::feasible_move_t, long)'
│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: libcuopt.so: hidden symbol `_ZN5cuopt18linear_programming6detail33update_changed_constraints_kernelIidEEvNS1_4fj_tIT_T0_E14climber_data_t6view_tE' isn't defined

(conda-cpp-build link)

@rgsl888prabhu @rg20 @chris-maes are any of you available to help me with this? You're welcome to push commits directly to my branch here.

Collaborator

I will try to get to this tomorrow. Thank you @jameslamb for the help with CUDA 13.

Collaborator

I am working on this; setting up the build environment now.

Collaborator

@jameslamb I have pushed changes that fix the cpp build. I think the wheels still need to be worked on.

Member Author

Excellent, thanks!

Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

(wheel-build-cuopt-mps-parser link)

I think I can fix this.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
@anandhkb anandhkb added this to the 25.10 milestone Sep 4, 2025
@NVIDIA NVIDIA deleted a comment from rgsl888prabhu Sep 4, 2025
@jameslamb jameslamb changed the title WIP: Build and test with CUDA 13.0.0 Build and test with CUDA 13.0.0 Sep 4, 2025
@jameslamb jameslamb marked this pull request as ready for review September 4, 2025 15:03
@jameslamb jameslamb requested review from a team as code owners September 4, 2025 15:03
append-cuda-suffix: false
pure-wheel: true
# only need 1 build (noarch package): this selects amd64, oldest-supported Python, latest-supported CUDA
matrix_filter: '[map(select(.ARCH == "amd64")) | min_by((.PY_VER | split(".") | map(tonumber)), (.CUDA_VER | split(".") | map(-tonumber)))]'
Member Author

(moved to a corrected version of this filter)

Changes like this fix the errors about conflicting artifact names. For pure-Python packages, we only need to build 1 time... not once per combination of Python version, CUDA version, operating system, and CPU architecture.

This change should save cuopt some CI time by avoiding lots of unnecessary wheel builds 😁
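For readers less familiar with jq, here is a rough Python sketch of what the `matrix_filter` above appears to do: keep only `amd64` entries, then pick the single entry with the lowest Python version, breaking ties by the highest CUDA version. The matrix entries below are made up for illustration, and the function name `select_single_build` is hypothetical. It also assumes at least one `amd64` entry exists.

```python
def select_single_build(matrix):
    """Pick one amd64 entry: oldest Python first, then newest CUDA."""
    amd64_entries = [entry for entry in matrix if entry["ARCH"] == "amd64"]

    def sort_key(entry):
        # Compare versions numerically, so that "3.10" sorts after "3.9".
        py_parts = [int(part) for part in entry["PY_VER"].split(".")]
        # Negating the CUDA parts makes min() prefer the newest CUDA.
        neg_cuda_parts = [-int(part) for part in entry["CUDA_VER"].split(".")]
        return (py_parts, neg_cuda_parts)

    # Like the jq filter, return a one-element list.
    return [min(amd64_entries, key=sort_key)]


# Hypothetical build matrix, for illustration only.
matrix = [
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "12.9.1"},
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "13.0.0"},
    {"ARCH": "arm64", "PY_VER": "3.10", "CUDA_VER": "13.0.0"},
    {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.0.0"},
]
selected = select_single_build(matrix)
print(selected)
```

Splitting on `.` and converting to integers matters here: comparing the raw strings would rank `"3.9"` above `"3.10"`.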

@jameslamb jameslamb changed the title WIP: Build and test with CUDA 13.0.0 Build and test with CUDA 13.0.0 Sep 5, 2025
nvidia-cuda-runtime-cu12==12.9.* \
libcuopt-cu12==25.10.*
'nvidia-cuda-runtime-cu12==12.9.*' \
'libcuopt-cu12==25.10.*'
Member Author

Touching these because they came up in merge conflicts with #367

You have to single-quote specifiers with special shell characters like <, >, or *; otherwise some shells may treat them specially (for example as redirections or glob patterns) and do surprising things.
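To make the quoting hazard concrete, here is a small sketch that runs `echo` through `/bin/sh` from Python, with and without quotes. It assumes a POSIX system where `subprocess.run(..., shell=True)` uses `/bin/sh`. The package name `example-pkg` and the wheel file name are hypothetical, and `echo` stands in for `pip install` so that nothing is actually installed.

```python
import os
import subprocess
import tempfile


def run_in_shell(command, cwd):
    """Run a command line through /bin/sh and return its stdout."""
    result = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as workdir:
    # A file in the working directory that happens to match the glob.
    open(os.path.join(workdir, "libcuopt-cu12==25.10.0.whl"), "w").close()

    # Unquoted '*' is expanded by the shell to the matching file name.
    unquoted_glob = run_in_shell("echo libcuopt-cu12==25.10.*", workdir)
    # Single quotes pass the specifier through unchanged.
    quoted_glob = run_in_shell("echo 'libcuopt-cu12==25.10.*'", workdir)

    # Unquoted '>' is a redirection: output goes to a file named '=1.0'.
    unquoted_redirect = run_in_shell("echo example-pkg>=1.0", workdir)
    redirect_created_file = os.path.exists(os.path.join(workdir, "=1.0"))
    quoted_redirect = run_in_shell("echo 'example-pkg>=1.0'", workdir)

print(repr(unquoted_glob), repr(quoted_glob))
print(repr(unquoted_redirect), redirect_created_file, repr(quoted_redirect))
```

In the unquoted glob case, the shell only rewrites the argument when a matching file exists, which is why this class of bug can go unnoticed for a long time.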

@jameslamb jameslamb removed the do not merge label Sep 5, 2025
Collaborator

@rgsl888prabhu rgsl888prabhu left a comment

@jameslamb Awesome work, and thank you for handling this. I have a minor suggestion for the doc, but the rest looks good with respect to infra.

Contributor

@hlinsen hlinsen left a comment

cpp changes look good. Thanks @jameslamb!

@jameslamb
Member Author

Thank you both!!! I'll merge this once CI passes.

@jameslamb
Member Author

/merge

@rapids-bot rapids-bot bot merged commit af47d49 into NVIDIA:branch-25.10 Sep 5, 2025
102 checks passed
@jameslamb jameslamb deleted the cuda-13.0.0 branch September 5, 2025 18:55
rapids-bot bot pushed a commit to rapidsai/ci-imgs that referenced this pull request Sep 5, 2025
Contributes to rapidsai/build-planning#208

Updates the `:latest` and `:25.10-latest` tags to CUDA 13.0.0.

## Notes for Reviewers

### is this safe to merge?

Once these are in, I think yes:

* [x] NVIDIA/cuopt#366
* [x] rapidsai/cugraph-gnn#286

At that point, the only things it should affect are docs builds across repos that are already supporting CUDA 13 in all their other conda-based tests.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #303
Contributor

@cwilkinson76 cwilkinson76 left a comment

LGTM

rapids-bot bot pushed a commit that referenced this pull request Sep 6, 2025
Contributes to rapidsai/build-planning#208

* updates all GitHub Actions branch references back to `branch-25.10`, now that rapidsai/shared-workflows#413 is merged
* fixes docs mistakes (#366 (comment))
* fixes nightly builds

Nightly wheel builds of `cuopt-mps-parser` are failing like this:

> Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

([wheel-build-cuopt-mps-parser link](https://github.com/NVIDIA/cuopt/actions/runs/17501959410/job/49716771454))

This happened because I forgot to bring over all of the artifact-naming changes made in `pr.yaml` to the corresponding entries in `build.yaml`, sorry 😬
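As a toy illustration of why the 409 error shows up, the sketch below models an artifact store that rejects duplicate names within one workflow run. It is not the GitHub Actions API. The naming scheme, the package name `example-pure-pkg`, and the job matrix are all hypothetical. The idea is only that a platform-independent wheel gets the same artifact name from every matrix job unless the matrix is filtered down to one build.

```python
class ArtifactStore:
    """Toy stand-in for a per-run artifact store that rejects duplicate names."""

    def __init__(self):
        self._names = set()

    def upload(self, name):
        if name in self._names:
            raise ValueError(f"(409) Conflict: artifact {name!r} already exists")
        self._names.add(name)


def artifact_name(package, job, pure_wheel):
    # Hypothetical naming scheme: a pure wheel is platform-independent,
    # so its name carries no Python / CUDA / architecture suffix.
    if pure_wheel:
        return f"{package}_wheel"
    return f"{package}_wheel_py{job['PY_VER']}_cuda{job['CUDA_VER']}_{job['ARCH']}"


def count_conflicts(package, jobs, pure_wheel):
    """Upload one artifact per job and count the rejected uploads."""
    store = ArtifactStore()
    conflicts = 0
    for job in jobs:
        try:
            store.upload(artifact_name(package, job, pure_wheel))
        except ValueError:
            conflicts += 1
    return conflicts


# Hypothetical job matrix, for illustration only.
jobs = [
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "12.9.1"},
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "13.0.0"},
    {"ARCH": "arm64", "PY_VER": "3.13", "CUDA_VER": "13.0.0"},
]

conflicts_full_matrix = count_conflicts("example-pure-pkg", jobs, pure_wheel=True)
conflicts_single_job = count_conflicts("example-pure-pkg", jobs[:1], pure_wheel=True)
print(conflicts_full_matrix, conflicts_single_job)
```

With the full matrix, every job after the first collides with the first upload; with a single selected job there is nothing to collide with.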

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
  - Ramakrishnap (https://github.com/rgsl888prabhu)

URL: #377
aliceb-nv pushed a commit that referenced this pull request Sep 22, 2025
Contributes to rapidsai/build-planning#208

* uses CUDA 13.0.0 to build and test
* moves some dependency pins:
  - `cuda-python`: `>=12.9.2` (CUDA 12), `>=13.0.1` (CUDA 13)
  - `cupy`: `>=13.6.0`
* declares `cuda-python` runtime dependency for wheels ([it was previously only declared for conda packages](https://github.com/NVIDIA/cuopt/blob/c62320447414c47f25ea67bb2570e05c7d0d29ac/conda/recipes/cuopt/recipe.yaml#L70))

Contributes to rapidsai/build-planning#68

* updates to CUDA 13 dependencies in fallback entries in `dependencies.yaml` matrices (i.e., the ones that get written to `pyproject.toml` in source control)

## Notes for Reviewers

This switches GitHub Actions workflows to the `cuda13.0` branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.

## Issue

Closes #294

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
  - Ramakrishnap (https://github.com/rgsl888prabhu)
  - Hugo Linsenmaier (https://github.com/hlinsen)

URL: #366
aliceb-nv pushed a commit that referenced this pull request Sep 22, 2025
URL: #377
Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

[FEA] Support CUDA 13 in 25.10

5 participants