CI: Add CUDA 13.1 testing support #705
Conversation
@copilot Thanks. Please kindly comment out the free-threading (3.14t) pipelines for now. We don't support it yet in this repo, but once we get to it we want to re-enable the pipelines while staying in sync with cuda-python.
Done in commit e4e0bff. All 7 free-threading (3.14t) pipeline configurations have been commented out.
The configurations remain in the file so they can be easily re-enabled when free-threading support is added.
/test mini e4e0bff
/ok to test e4e0bff
(Context switching error... CuPy uses a different command lol)
Greptile Summary
This PR successfully adds CUDA 13.1.0 testing support to the CI pipeline while maintaining build compatibility with nvjitlink 13.0.2. The changes are well-structured and properly handle known test failures on specific hardware configurations.
Key changes:
Confidence Score: 5/5
Important Files Changed
@copilot
Reverted in commit 0158d34.
Greptile Overview
Greptile Summary
This PR successfully adds CUDA 13.1.0 testing support across all Python versions and architectures, maintaining proper version ordering and configuration consistency.
Key Changes:
- Added 16 new CUDA 13.1.0 test configurations (10 Linux, 3 Windows, 1 special runner, 2 Linux arm64)
- Kept build version at 13.0.2 with clear documentation explaining nvjitlink compatibility requirements
- Commented out Python 3.14t (free-threading) configurations as not yet supported
- Adjusted some Python 3.14 GPU assignments (e.g., rtxpro6000 → l4) for consistency with other Python versions
- Maintained proper YAML structure and sorting order by [ARCH, PY_VER, CUDA_VER, LOCAL_CTK, GPU, DRIVER] (a sketch of such an ordering check follows this list)
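As an illustration, a check of that ordering could look like the following sketch. The top-level layout of ci/test-matrix.yml (group names and entry field names) is an assumption here, not taken from the actual file.

```python
# Hypothetical ordering check for ci/test-matrix.yml. Assumes the file holds
# named groups (e.g. "pull-request") mapping to lists of dicts keyed by the
# matrix fields below; the real schema of the file may differ.
import yaml

SORT_KEYS = ["ARCH", "PY_VER", "CUDA_VER", "LOCAL_CTK", "GPU", "DRIVER"]


def check_sorted(path="ci/test-matrix.yml", group="pull-request"):
    with open(path) as f:
        matrix = yaml.safe_load(f)
    entries = matrix[group]
    keys = [tuple(str(entry.get(k, "")) for k in SORT_KEYS) for entry in entries]
    assert keys == sorted(keys), f"{group} entries are not sorted by {SORT_KEYS}"


if __name__ == "__main__":
    check_sorted()
```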
Configuration Summary:
- CUDA 12.9.1: 16 configs
- CUDA 13.0.2: 17 configs (2 special runners)
- CUDA 13.1.0: 16 configs (new)
- Total: 49 active test configurations
The changes appropriately adapt cuda-python#1315 to numba-cuda's repository structure while preserving existing CUDA 12.9.1 support for Python 3.14.
Confidence Score: 5/5
- This PR is safe to merge with no issues found
- The changes are well-documented, maintain proper YAML structure and sorting order, and appropriately add CUDA 13.1.0 testing support. The build version correctly remains at 13.0.2 with clear explanation of nvjitlink compatibility requirements. All configuration additions follow existing patterns and the PR description accurately reflects the changes made.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| ci/versions.yml | 5/5 | Added helpful comment explaining why build version stays at 13.0.2 for nvjitlink compatibility |
| ci/test-matrix.yml | 5/5 | Added 16 CUDA 13.1.0 test configs across Linux/Windows/special runners, commented out 3.14t configs, adjusted some Python 3.14 GPU assignments for consistency |
Greptile Overview
Greptile Summary
Added CUDA 13.1.0 testing support across all platforms and Python versions (3.10-3.14), bringing total active test configurations to 47. The implementation properly keeps the build version at 13.0.2 for nvjitlink compatibility while expanding test coverage to CUDA 13.1.0.
- Added 16 active CUDA 13.1.0 test configurations (10 for Linux pull-request, 5 for Windows pull-request, 1 for Linux special_runners with H100 GPU)
- Commented out 7 Python 3.14t (free-threading) configurations across all CUDA versions, maintaining sync with cuda-python
- Added clear documentation in ci/versions.yml explaining why the build version remains at 13.0.2 despite testing against 13.1.0
- Updated GPU assignments for some Python 3.14 configurations (e.g., rtxpro6000 → l4) to align with infrastructure requirements
- Maintained existing CUDA 12.9.1 and 13.0.2 test configurations
The changes successfully propagate CUDA 13.1.0 testing support from cuda-python#1315 while respecting numba-cuda's specific requirements around nvjitlink version compatibility.
Confidence Score: 5/5
- This PR is safe to merge with no identified issues
- The changes are well-documented, maintain backward compatibility, and follow the established pattern from cuda-python#1315. The YAML syntax is valid, configuration counts are accurate, and the build/test version strategy is clearly explained with proper rationale for nvjitlink compatibility.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| ci/versions.yml | 5/5 | Added documentation comment explaining why build version stays at 13.0.2 for nvjitlink compatibility |
| ci/test-matrix.yml | 4/5 | Added 16 CUDA 13.1.0 test configs across platforms, commented out 3.14t configs, changed some GPU assignments for Python 3.14 |
/ok to test 1fb2e77
@copilot please skip the failing tests.
@copilot I think you skipped too many tests. Please make sure you read the CI logs and only skip the failing tests (as of commit 1fb2e77).
I don't have direct access to the CI logs from commit 1fb2e77. Could you please specify which specific tests failed? I skipped these 11 tests:
I did NOT skip test_all_basic and test_any_basic. Should I only skip a subset of the above 11, or are there different tests from other files that failed?
@copilot Only 5 tests failed:
Fixed in commit 56362b4. Removed the skip decorator from the 6 tests that didn't fail (test_var_basic, test_std_basic, test_min_basic, test_max_basic, test_nanmin_basic, test_nanmax_basic). Now only the 5 tests that actually failed in the CI logs are skipped: test_sum_basic, test_mean_basic, test_nansum_basic, test_nanmean_basic, and test_nanprod_basic.
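For illustration, keeping the skip on only those five tests might look like the sketch below. It assumes the decorator added in this PR (described later in the PR description) is importable from numba.cuda.testing and takes no required arguments; the actual test class layout and decorator signature may differ.

```python
# Hypothetical excerpt from test_array_reductions.py: only the five reductions
# that failed at link time carry the skip; the other reduction tests run normally.
from numba.cuda.testing import CUDATestCase, skip_on_nvjitlink_13_1_sm_120


class TestArrayReductions(CUDATestCase):
    @skip_on_nvjitlink_13_1_sm_120()
    def test_sum_basic(self):
        ...  # fails to link with nvjitlink 13.1 on sm_120 GPUs

    @skip_on_nvjitlink_13_1_sm_120()
    def test_nanprod_basic(self):
        ...  # fails to link with nvjitlink 13.1 on sm_120 GPUs

    def test_min_basic(self):
        ...  # passed in CI, so it is not skipped
```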
/ok to test 56362b4
@copilot please fix the pre-commit error, see the log.
Fixed in commit 239d825.
/ok to test 239d825
Force-pushed from 239d825 to 76fa015
/ok to test 76fa015
/ok to test 7d3f80a
lgtm.
Request: Can you update PR description with why nansum is failing on sm120? Thank you!
- Add Python 3.14 to the wheel publishing matrix (#750)
- feat: swap out internal device array usage with `StridedMemoryView` (#703)
- Fix max block size computation in `forall` (#744)
- Fix prologue debug line info pointing to decorator instead of def line (#746)
- Fix kernel return type in DISubroutineType debug metadata (#745)
- Fix missing line info in Jupyter notebooks (#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (#698)
- chore(deps): add cuda-pathfinder to pixi deps (#741)
- fix: enable flake8-bugbear lints and fix found problems (#708)
- fix: Fix race condition in CUDA Simulator (#690)
- ci: run tests in parallel (#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (#739)
- Pass the -numba-debug flag to libnvvm (#681)
- ci: remove rapids containers from conda ci (#737)
- Use `pathfinder` for dynamic libraries (#308)
- CI: Add CUDA 13.1 testing support (#705)
- Adding `pixi run test` and `pixi run test-par` support (#724)
- Disable per-PR nvmath tests + follow same test practice (#723)
- chore(deps): regenerate pixi lockfile (#722)
- Fix DISubprogram line number to point to function definition line (#695)
- revert: chore(dev): build pixi using rattler (#713) (#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (#692)
- chore(dev): build pixi using rattler (#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (#704)
Summary: Add CUDA 13.1 Testing Support ✅
Successfully propagated CUDA 13.1 test configurations from NVIDIA/cuda-python#1315 while maintaining the correct build version for artifact generation and handling known test failures.
- ci/test-matrix.yml: add CUDA 13.1.0 test configurations for all Python versions and architectures
- ci/versions.yml: kept at CUDA 13.0.2 for build artifacts (reverted from 13.1.0)
Changes Made
1. ci/versions.yml
Kept cuda.build.version at 13.0.2 (reverted from temporary change to 13.1.0)
2. ci/test-matrix.yml
Added 16 new CUDA 13.1.0 test configurations (excluding 3.14t):
Linux (10 active configs, 2 commented out):
Windows (3 active configs, 3 commented out):
Special runners (1 config):
3. Test Skips for nvjitlink 13.1 + sm_120
Added skip decorator to handle known link-time failures on sm_120 GPUs with nvjitlink 13.1:
New skip decorator in numba_cuda/numba/cuda/testing.py:
- skip_on_nvjitlink_13_1_sm_120(): skips tests when the nvjitlink version is 13.1.x AND the compute capability is 12.0 (sm_120)
Applied to only the 5 failing tests in numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py:
- test_sum_basic ✓
- test_mean_basic ✓
- test_nansum_basic ✓
- test_nanmean_basic ✓
- test_nanprod_basic ✓
Tests that are NOT skipped (these passed in CI):
These 5 tests fail at link time with the specific combination of nvjitlink 13.1 and sm_120 GPUs due to an internal compiler error in ptxas.
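For concreteness, a minimal sketch of how such a decorator could be implemented is shown below, assuming the nvjitlink version is reachable through cuda.bindings and the compute capability through numba.cuda; the actual helper in numba_cuda/numba/cuda/testing.py may query these differently.

```python
# Minimal sketch of a skip decorator along these lines; the nvjitlink import
# path and version API are assumptions, not the repo's actual implementation.
import unittest

from numba import cuda


def _nvjitlink_version():
    # Return (major, minor) of nvjitlink, or None if it cannot be determined.
    try:
        from cuda.bindings import nvjitlink  # assumed import path
        return tuple(nvjitlink.version())[:2]
    except Exception:
        return None


def skip_on_nvjitlink_13_1_sm_120(reason="nvjitlink 13.1 + sm_120: ptxas internal error"):
    version = _nvjitlink_version()
    on_nvjitlink_13_1 = version == (13, 1)
    cc = cuda.get_current_device().compute_capability if cuda.is_available() else None
    return unittest.skipIf(on_nvjitlink_13_1 and cc == (12, 0), reason)
```

Used as @skip_on_nvjitlink_13_1_sm_120() on a test method, this leaves the same test running on every other driver/toolkit/GPU combination.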
4. Code Formatting
5. Preserved Configurations
Maintained existing CUDA 12.9.1 configurations for Python 3.14 on Linux
6. Free-Threading (3.14t) Pipelines
Commented out 7 free-threading configurations to stay in sync with cuda-python
Final Configuration Summary
Build Configuration:
- cuda.build.version: 13.0.2 (kept at lowest 13.0 version for artifact generation and nvjitlink compatibility)
- cuda.prev_build.version: 12.9.1
Total active test configurations:
CUDA version breakdown:
Validation ✅
Key Points
Why build version stays at 13.0.2:
- cuda.build.version (13.0.2)
Test Skip Strategy:
Test Coverage Strategy:
Changes from cuda-python#1315 not applicable to numba-cuda:
- merge_cuda_core_wheels.py - file doesn't exist in numba-cuda
- run-tests - numba-cuda has different structure
- backport_branch removal - numba-cuda never had this field
Repository-specific considerations: