Build and test with CUDA 13.0.0 #366
rapids-bot[bot] merged 20 commits into NVIDIA:branch-25.10 from jameslamb:cuda-13.0.0
&set_bounds_changed_node,  // "to" nodes
#if CUDA_VER_13_0_UP
nullptr,  // edge data
#endif
Builds were failing like this:
│ │ $SRC_DIR/cpp/src/mip/presolve/load_balanced_bounds_presolve_helpers.cuh(474): error: argument of type "int" is incompatible with parameter of type "const cudaGraphEdgeData *"
│ │ cudaGraphAddDependencies(act_graph, &act_sub_warp_node, &set_bounds_changed_node, 1);
Starting in CUDA 13, cudaGraphAddDependencies() picked up a new argument for optional edge data for the graphs.
ref:
- https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH_1g575372c40c161728bfe500ebf676466c
- Update cudaGraphAddDependencies for 13.0 cccl#5691
- [STF] Fix CUDA graph API calls for CUDA 13 cccl#5636
This attempts to patch around that. This code was already not passing any data about edges in the graph, as far as I can tell, so hopefully this will be fine.
After that change, there is a new build failure I don't understand:
│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: tmpxft_0000132e_00000000-6_execute_insertion.compute_120.cudafe1.cpp:(.text._ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi[_ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi]+0x52c): undefined reference to `void cuopt::routing::detail::kernel_get_best_insertion_ejection_solution<64, int, float, (cuopt::routing::request_t)1>(cuopt::routing::detail::solution_t<int, float, (cuopt::routing::request_t)1>::view_t, cuopt::routing::detail::request_info_t<int, (cuopt::routing::request_t)1, void> const*, int*, int, int, cuopt::routing::detail::feasible_move_t, long)'
│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: tmpxft_0000132e_00000000-6_execute_insertion.compute_120.cudafe1.cpp:(.text._ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi[_ZN5cuopt7routing6detail24guided_ejection_search_tIifLNS0_9request_tE1EE40execute_best_insertion_ejection_solutionEPNS1_14request_info_tIiLS3_1EvEERi]+0x680): undefined reference to `void cuopt::routing::detail::kernel_get_best_insertion_ejection_solution<64, int, float, (cuopt::routing::request_t)1>(cuopt::routing::detail::solution_t<int, float, (cuopt::routing::request_t)1>::view_t, cuopt::routing::detail::request_info_t<int, (cuopt::routing::request_t)1, void> const*, int*, int, int, cuopt::routing::detail::feasible_move_t, long)'
│ │ $BUILD_PREFIX/bin/../lib/gcc/aarch64-conda-linux-gnu/14.3.0/../../../../aarch64-conda-linux-gnu/bin/ld: libcuopt.so: hidden symbol `_ZN5cuopt18linear_programming6detail33update_changed_constraints_kernelIidEEvNS1_4fj_tIT_T0_E14climber_data_t6view_tE' isn't defined
@rgsl888prabhu @rg20 @chris-maes are any of you available to help me with this? You're welcome to push commits directly to my branch here.
I will try to get to this tomorrow. Thank you @jameslamb for the help with CUDA 13.
I am working on this, setting up build environment now.
@jameslamb I have pushed changes that fix the cpp build. I think the wheels still need to be worked on.
Excellent, thanks!
Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run
(wheel-build-cuopt-mps-parser link)
I think I can fix this.
Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
.github/workflows/build.yaml
Outdated
append-cuda-suffix: false
pure-wheel: true
# only need 1 build (noarch package): this selects amd64, oldest-supported Python, latest-supported CUDA
matrix_filter: '[map(select(.ARCH == "amd64")) | min_by((.PY_VER | split(".") | map(tonumber)), (.CUDA_VER | split(".") | map(-tonumber)))]'
(moved to a corrected version of this filter)
Changes like this fix the errors about conflicting artifact names. For pure-Python packages, we only need to build once, not once per combination of Python version, CUDA version, operating system, and CPU architecture.
This change should save cuopt some CI time by avoiding lots of unnecessary wheel builds 😁
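To make the intent of the `matrix_filter` expression above concrete, here is a minimal Python sketch of the same selection logic: keep only `amd64` entries, then pick the one with the lowest Python version and, among ties, the highest CUDA version. The matrix entries and the `version_key` / `select_single_build` helpers are hypothetical, made up for illustration; the real matrix is generated by the shared workflows and is not shown in this thread.

```python
def version_key(version: str) -> list[int]:
    """Turn "3.10" into [3, 10] so versions compare numerically, not as strings."""
    return [int(part) for part in version.split(".")]


def select_single_build(matrix: list[dict]) -> list[dict]:
    """Rough equivalent of the jq filter: one amd64 entry, oldest Python, newest CUDA."""
    amd64 = [entry for entry in matrix if entry["ARCH"] == "amd64"]
    best = min(
        amd64,
        key=lambda e: (
            version_key(e["PY_VER"]),
            # Negate the CUDA components so that min() prefers the newest CUDA.
            [-n for n in version_key(e["CUDA_VER"])],
        ),
    )
    return [best]


# Hypothetical matrix entries, for illustration only.
matrix = [
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "12.9.1"},
    {"ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "13.0.0"},
    {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.0.0"},
    {"ARCH": "arm64", "PY_VER": "3.9", "CUDA_VER": "13.0.0"},
]

selected = select_single_build(matrix)
print(selected)
```

Splitting versions into integers matters here: compared as plain strings, `"3.9"` would sort after `"3.10"`, which is presumably why the jq expression uses `split(".") | map(tonumber)`.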
- nvidia-cuda-runtime-cu12==12.9.* \
- libcuopt-cu12==25.10.*
+ 'nvidia-cuda-runtime-cu12==12.9.*' \
+ 'libcuopt-cu12==25.10.*'
Touching these because they came up in merge conflicts with #367.
You have to single-quote specifiers that contain special shell characters like `<`, `>`, or `*`. Otherwise some shells may interpret them (for example as redirection or glob patterns) and do surprising things.
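As a small illustration of the quoting point, Python's standard-library `shlex` module can show which requirement specifiers need quoting before they are pasted into a shell command. This is only a sketch: the `numpy>=1.26` specifier is a made-up example, and the docs touched in this PR quote the arguments by hand rather than generating them.

```python
import shlex

specifiers = [
    "nvidia-cuda-runtime-cu12==12.9.*",
    "libcuopt-cu12==25.10.*",
    "numpy>=1.26",  # hypothetical specifier, only here to show the ">" case
]

# shlex.join() quotes any argument containing characters the shell treats
# specially (such as "*" or ">") and leaves plain words like "pip" untouched.
command = shlex.join(["pip", "install", *specifiers])
print(command)

# Round trip: a POSIX shell would split the command back into the same arguments.
assert shlex.split(command) == ["pip", "install", *specifiers]
```

Each specifier comes out wrapped in single quotes, matching the hand-quoted form used in the diff above.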
rgsl888prabhu left a comment
@jameslamb Awesome work, and thank you for handling this. I have a minor suggestion for the docs, but the rest looks good with respect to infra.
hlinsen left a comment
cpp changes look good. Thanks @jameslamb!
Thank you both!!! I'll merge this once CI passes.
/merge
Contributes to rapidsai/build-planning#208

Updates the `:latest` and `:25.10-latest` tags to CUDA 13.0.0.

## Notes for Reviewers

### is this safe to merge?

Once these are in, I think yes:

* [x] NVIDIA/cuopt#366
* [x] rapidsai/cugraph-gnn#286

At that point, the only thing it should affect are docs builds across repos that are already supporting CUDA 13 in all their other conda-based tests.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #303
Contributes to rapidsai/build-planning#208

* updates all GitHub Actions branch references back to `branch-25.10`, now that rapidsai/shared-workflows#413 is merged
* fixes docs mistakes (#366 (comment))
* fixes nightly builds

Nightly wheel builds of `cuopt-mps-parser` are failing like this:

> Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

([wheel-build-cuopt-mps-parser link](https://github.com/NVIDIA/cuopt/actions/runs/17501959410/job/49716771454))

Because I forgot to bring over all of the artifact-naming changes made in `pr.yaml` to the corresponding entries on `build.yaml`, sorry 😬

Authors:
- James Lamb (https://github.com/jameslamb)
- Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
- Ramakrishnap (https://github.com/rgsl888prabhu)

URL: #377
Contributes to rapidsai/build-planning#208

* uses CUDA 13.0.0 to build and test
* moves some dependency pins:
  - `cuda-python`: `>=12.9.2` (CUDA 12), `>=13.0.1` (CUDA 13)
  - `cupy`: `>=13.6.0`
* declares `cuda-python` runtime dependency for wheels ([it was previously only declared for conda packages](https://github.com/NVIDIA/cuopt/blob/c62320447414c47f25ea67bb2570e05c7d0d29ac/conda/recipes/cuopt/recipe.yaml#L70))

Contributes to rapidsai/build-planning#68

* updates to CUDA 13 dependencies in fallback entries in `dependencies.yaml` matrices (i.e., the ones that get written to `pyproject.toml` in source control)

## Notes for Reviewers

This switches GitHub Actions workflows to the `cuda13.0` branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.

## Issue

Closes #294

Authors:
- James Lamb (https://github.com/jameslamb)
- Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
- Ramakrishnap (https://github.com/rgsl888prabhu)
- Hugo Linsenmaier (https://github.com/hlinsen)

URL: #366
Description

Contributes to rapidsai/build-planning#208

* uses CUDA 13.0.0 to build and test
* moves some dependency pins:
  - `cuda-python`: `>=12.9.2` (CUDA 12), `>=13.0.1` (CUDA 13)
  - `cupy`: `>=13.6.0`
* declares `cuda-python` runtime dependency for wheels (it was previously only declared for conda packages)

Contributes to rapidsai/build-planning#68

* updates to CUDA 13 dependencies in fallback entries in `dependencies.yaml` matrices (i.e., the ones that get written to `pyproject.toml` in source control)

## Notes for Reviewers

This switches GitHub Actions workflows to the `cuda13.0` branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.

## Issue

Closes #294