Set up a new VM-based CI infrastructure #604

leofang · 2025-11-20T17:02:19Z

~~TODO: Rebase and write up descriptions~~

See WIP report here #604 (comment) and here #604 (comment).

UPDATE: This PR brings in a new CI infra that is a clone of what cuda-python uses today. The new CI is fully VM-based instead of container-based, with two exceptions:

- cibuildwheel launches a manylinux container
- we launch a vanilla Ubuntu container for testing, as required by nv-gha-runners

This is desirable because containers are the major bottleneck, and we should seriously consider moving away from them:

- they take time to pull on a per-PR basis
- they block us from performing Day 1 rollout (to support new CUDA or new Python versions), which is now a requirement for CUDA Python
  - related: maintaining our own containers takes a nontrivial amount of effort

Furthermore, my opinion is that we really need to make sure the CUDA Python CI infrastructure is "copy-paste-able" (with some caveats discussed internally, which I am not going to repeat here). It was designed with future application to CuPy and numba-cuda in mind: at the Python level many of our projects have similar or identical support matrices, and there is no reason for each project to rebuild the wheel from scratch and suffer from maintenance issues.

Currently the new CI in this PR runs in parallel with the old CI. If we decide to move forward, a separate follow-up PR can hook more of the old CI pieces into the new CI.

Below is a detailed breakdown of what this PR entails.

- c1f1cec: copy/paste minimal CI infra from cuda-python, with zero changes
- 85566c7: CI changes needed to tailor the infra to numba-cuda's needs
- 1b939c3: Enable the cibuildwheel GHA
- 950078e: Disable Python 3.14 for now
  - This shows how easy it is to add support when a new Python version is out. For example, once "Support python 3.14" (#599) is merged, we can revert this commit to start testing.
- 6fa44fc: a drive-by fix to update the warning shown when cuda-bindings is not installed
- 593fcf3: Ensure Linux executables can be found when installed by our custom fetch_ctk action
- f183ebe: Ensure cuobjdump is installed by fetch_ctk (whose default does not include it)
- 72da422: Fixes to ensure the Makefile can be run in the new CI env (fully Bash-based on both Linux and Windows)
- a973726: Suppress NVRTC warnings on V100 + CUDA 12 (otherwise pytest turns them into errors)
- df8f583: Fix to ensure libcudadevrt.a installed by fetch_ctk can be found
- 861eede: Ensure the tests that need cuobjdump can be skipped in a pure-wheel test env
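
To illustrate the last item, here is a minimal sketch of how a test that needs cuobjdump could be skipped when the tool is absent, as in a pure-wheel environment. It is not the code from 861eede; the helper `cuobjdump_available`, the decorator name, and the test class are hypothetical, and the only assumption is that the tool is discovered through `PATH`.

```python
import shutil
import unittest


def cuobjdump_available() -> bool:
    """Hypothetical helper: True when a cuobjdump executable is on PATH."""
    return shutil.which("cuobjdump") is not None


# Hypothetical decorator name; skips the test instead of failing it
# when the tool cannot be found (e.g. in a wheel-only environment).
skip_without_cuobjdump = unittest.skipUnless(
    cuobjdump_available(), "cuobjdump not found on PATH"
)


class TestNeedsCuobjdump(unittest.TestCase):
    @skip_without_cuobjdump
    def test_tool_is_resolvable(self):
        # Runs only when cuobjdump was found above.
        self.assertIsNotNone(shutil.which("cuobjdump"))
```

With this shape the suite reports a skip rather than an error on machines without the full toolkit, so the same test file can be used in both toolkit and wheel-only jobs.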

Commits 593fcf3 and df8f583 are bug fixes to fetch_ctk that we should backport to cuda-python.

brandon-b-miller · 2025-12-17T14:41:35Z

/ok to test 8e96a09

brandon-b-miller · 2025-12-17T14:53:51Z

/ok to test 64068b7

brandon-b-miller · 2025-12-17T15:20:04Z

/ok to test 48cca14

brandon-b-miller · 2025-12-17T15:53:37Z

/ok to test ccd8382

brandon-b-miller · 2025-12-17T16:11:18Z

/ok to test 476082b

brandon-b-miller · 2025-12-17T17:10:48Z

/ok to test 822b37d

brandon-b-miller · 2025-12-17T18:21:43Z

/ok to test cafdf68

greptile-apps · 2025-12-17T18:24:33Z

Greptile Summary

This PR introduces a new VM-based CI infrastructure cloned from cuda-python, replacing the container-based approach to enable faster builds and Day 1 rollout support for new CUDA/Python versions. The new CI runs in parallel with the existing infrastructure.

Key Changes:

- Added new GitHub Actions workflows (ci-new.yaml, build-wheel.yml, test-wheel-linux.yml, test-wheel-windows.yml) with VM-based runners
- Implemented test matrix configuration in JSON for flexible platform/Python/CUDA version combinations
- Created custom actions for fetching CUDA toolkit, installing dependencies, and extracting PR numbers
- Added CI utility scripts for environment setup, test execution, and artifact management
- Updated pyproject.toml to use cuda-toolkit metapackage instead of individual CUDA components
- Fixed test skips for wheel-only environments and NVVM bugs on newer compute capabilities
- Updated error messages to reference correct package names (cuda-bindings)
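
As a rough illustration of the JSON-driven test matrix mentioned above, the sketch below selects entries from a config and emits the `{"include": [...]}` shape that a GitHub Actions job can consume via `fromJSON`. The schema, keys, and version strings in `EXAMPLE_CONFIG` are hypothetical and may not match the real ci/test-matrix.json; `select_matrix` and `to_github_matrix` are illustrative names, not functions from this PR.

```python
import json

# Hypothetical config; the real ci/test-matrix.json schema may differ.
EXAMPLE_CONFIG = """
{
  "linux": {
    "pull-request": [
      {"arch": "amd64", "python": "3.12", "cuda": "12.9.0"},
      {"arch": "arm64", "python": "3.13", "cuda": "13.0.0"}
    ],
    "nightly": [
      {"arch": "amd64", "python": "3.10", "cuda": "12.9.0"}
    ]
  }
}
"""


def select_matrix(config_text, platform, event, arch):
    """Return the matrix entries for one platform/event/arch combination."""
    config = json.loads(config_text)
    entries = config.get(platform, {}).get(event, [])
    return [entry for entry in entries if entry["arch"] == arch]


def to_github_matrix(entries):
    """Serialize entries as a compact one-line JSON matrix."""
    return json.dumps({"include": entries}, separators=(",", ":"))


if __name__ == "__main__":
    selected = select_matrix(EXAMPLE_CONFIG, "linux", "pull-request", "amd64")
    print(to_github_matrix(selected))
```

Keeping the combinations in data rather than in workflow YAML means a new Python or CUDA version is a one-line config change, which is the flexibility the summary refers to.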

Issues Found:

Line 202 in ci-new.yaml references non-existent needs.doc job that will cause workflow failure

Confidence Score: 4/5

- Safe to merge after fixing the needs.doc reference bug in the checks job
- The PR is well-structured with comprehensive CI infrastructure changes. However, there's a critical bug on line 202 of ci-new.yaml that references a non-existent job, which will cause the checks job to fail. Once fixed, the infrastructure appears solid with proper error handling, caching, and test coverage.
- .github/workflows/ci-new.yaml requires immediate attention to fix the needs.doc reference

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/ci-new.yaml | New main CI workflow added - references non-existent `needs.doc` job on line 202 |
| .github/workflows/build-wheel.yml | Reusable workflow for building wheels across platforms with sccache support |
| .github/workflows/test-wheel-linux.yml | Linux test workflow with dynamic matrix computation from JSON config |
| .github/workflows/test-wheel-windows.yml | Windows test workflow with GPU driver mode switching and verification |
| .github/actions/fetch_ctk/action.yml | Action for fetching mini CUDA toolkit with caching and PATH setup |
| ci/test-matrix.json | Test matrix configuration defining Python/CUDA/GPU combinations for PR and nightly tests |
| pyproject.toml | Switched to `cuda-toolkit` metapackage and added cibuildwheel configuration |

greptile-apps

Additional Comments (1)

.github/workflows/ci-new.yaml, line 202 (link)

logic: references non-existent job needs.doc
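
A check for this class of issue could look like the sketch below. It assumes the workflow has already been parsed into a plain dict (for example with a YAML library) and reports `needs.<job>` references that a job does not list under its own `needs`. The function names and the example workflow are hypothetical, and the regex is a simplification of GitHub's expression syntax.

```python
import re

# Matches `needs.<job_id>` inside expressions such as ${{ needs.build.result }}.
NEEDS_REF = re.compile(r"\bneeds\.([A-Za-z_][A-Za-z0-9_-]*)")


def _strings(node):
    """Yield every string value nested inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from _strings(value)


def undeclared_needs(workflow):
    """Map each job name to the jobs it references but does not declare."""
    problems = {}
    for name, job in workflow.get("jobs", {}).items():
        declared = job.get("needs", [])
        if isinstance(declared, str):  # `needs: build` is also valid YAML
            declared = [declared]
        referenced = set()
        for text in _strings(job):
            referenced.update(NEEDS_REF.findall(text))
        missing = sorted(referenced - set(declared))
        if missing:
            problems[name] = missing
    return problems


# Hypothetical workflow reproducing the shape of the reported issue.
EXAMPLE_WORKFLOW = {
    "jobs": {
        "build": {"runs-on": "ubuntu-latest"},
        "checks": {
            "needs": ["build"],
            "steps": [
                {"run": "echo ${{ needs.build.result }} ${{ needs.doc.result }}"}
            ],
        },
    }
}
```

Running `undeclared_needs(EXAMPLE_WORKFLOW)` flags `doc` under the `checks` job, since only `build` is declared there.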

23 files reviewed, 1 comment


- Add arch specific target support (NVIDIA#549)
- chore: disable `locked` flag to bypass prefix-dev/pixi#5256 (NVIDIA#714)
- ci: relock pixi (NVIDIA#712)
- ci: remove redundant conda build in ci (NVIDIA#711)
- chore(deps): bump numba-cuda version and relock pixi (NVIDIA#707)
- Dropping bits in the old CI & Propagating recent changes from cuda-python (NVIDIA#683)
- Fix `test_wheel_deps_wheels.sh` to actually uninstall `nvvm` and `nvrtc` packages for CUDA 13 (NVIDIA#701)
- perf: remove some exception control flow and buffer-exception penalization for arrays (NVIDIA#700)
- perf: let CAI fall through instead of calling from_cuda_array_interface (NVIDIA#694)
- chore: perf lint (NVIDIA#697)
- chore(deps): bump deps in pixi lockfile (NVIDIA#693)
- fix: use freethreading-supported `_PySet_NextItemRef` where possible (NVIDIA#682)
- Support python `3.14` (NVIDIA#599)
- Remove customized address space tracking and address class emission in debug info (NVIDIA#669)
- Drop `experimental` from cuda.core namespace imports (NVIDIA#676)
- Remove dangling references to NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY (NVIDIA#675)
- Use `rapidsai/sccache` in CI (NVIDIA#674)
- chore(dev-deps): remove ipython and pyinstrument (NVIDIA#670)
- Set up a new VM-based CI infrastructure (NVIDIA#604)


copy/paste minimal CI infra from cuda-python

c1f1cec