Add more thirdparty tests by gmarkall · Pull Request #586 · NVIDIA/numba-cuda

gmarkall · 2025-11-10T15:55:23Z

This adds tests of additional third-party libraries and fixes issues found while testing them. Notes on the changes:

- The latest release of nvmath-python, 0.6.0, is used, and its device API tests are run.
- nvmath-python patches types.number_domain, which was not handled by our vendoring. The changes in cudaimpl.py resolve this issue.
- nvmath-python also creates extensions that can lead to a Numba-CUDA Signature class leaking into Numba, which checks for an instance of a Numba Signature only. The change in templates.py mitigates this by always constructing core Signature instances if Numba is present.
- I tested the code with my "Extending Numba CUDA" tutorials and noticed that Dim3 and GridGroup were not in our types package, so I've added these back. I haven't added these examples to our CI, as converting them to tests would involve some extra work, but I may add them in a future PR.
- Awkward 2.8.10 is used for tests, as outlined in "Running Awkward test suite as part of Numba-CUDA CI" (scikit-hep/awkward#3587).
- The nvmath-python tests take 45 minutes, so I've set them to run only on pushes to main rather than slowing down CI for PR testing. This is less than ideal, but better than increasing the iteration time for testing in CI. The commit e85e8d7 demonstrates that they are successful (green tick from CI) before the commit fa92cb2 sets them to run only on main.
- It may be possible to run a reduced set of nvmath-python tests for every PR, but I need to work with the nvmath-python team to understand how to do that without sacrificing too much coverage.
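
To illustrate the Signature note above, here is a minimal sketch of why an `isinstance` check against one class rejects instances of a parallel vendored class, and how always constructing the core class sidesteps that. All class and function names here are hypothetical stand-ins, not the real Numba or Numba-CUDA code, and the actual change in templates.py may differ.

```python
# Hypothetical stand-ins: these classes only mimic the situation described
# in the PR notes and are not the real Numba or Numba-CUDA classes.


class CoreSignature:
    """Stands in for the Signature class defined by Numba itself."""

    def __init__(self, return_type, args):
        self.return_type = return_type
        self.args = tuple(args)


class VendoredSignature:
    """Stands in for the vendored copy shipped with Numba-CUDA."""

    def __init__(self, return_type, args):
        self.return_type = return_type
        self.args = tuple(args)


def core_only_check(sig):
    """Mimics code that accepts only instances of the core Signature class."""
    return isinstance(sig, CoreSignature)


def make_signature(return_type, args, core_available=True):
    """Build the core class whenever it is available, else the vendored one."""
    cls = CoreSignature if core_available else VendoredSignature
    return cls(return_type, args)


# A vendored instance that "leaks" into core-only code is rejected, even
# though it carries exactly the same data.
leaked = VendoredSignature("int32", ["int32", "int32"])
fixed = make_signature("int32", ["int32", "int32"])

print(core_only_check(leaked))  # False
print(core_only_check(fixed))   # True
```

The sketch assumes the two classes are structurally identical, which is why choosing the class at construction time is enough for the core-only check to pass.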

copy-pr-bot · 2025-11-10T15:55:27Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

gmarkall · 2025-11-10T15:55:30Z

/ok to test

cpcloud · 2025-11-10T16:02:39Z

Can we exercise the pixi build infrastructure for this? It looks like nvmath-python@0.6.0 is available on conda-forge.

gmarkall · 2025-11-10T17:49:37Z

/ok to test

gmarkall · 2025-11-10T17:50:31Z

/ok to test

gmarkall · 2025-11-10T17:56:34Z

> Can we exercise the pixi build infrastructure for this? It looks like nvmath-python@0.6.0 is available on conda-forge.

I'm not sure. What does it mean for us to "exercise the pixi build infrastructure"? Where should I look to get an understanding? (Unfortunately I've not kept up with developments in this area, so I'm not sure what to focus on to begin understanding the question.)

gmarkall · 2025-11-10T18:08:12Z

/ok to test

gmarkall · 2025-11-11T09:47:04Z

/ok to test

gmarkall · 2025-11-11T10:28:59Z

/ok to test

gmarkall · 2025-11-12T08:53:14Z

/ok to test

gmarkall · 2025-11-12T10:11:58Z

/ok to test

gmarkall · 2025-11-12T10:23:16Z

/ok to test

gmarkall · 2025-11-12T10:41:58Z

/ok to test

gmarkall · 2025-11-12T15:29:07Z

/ok to test

gmarkall · 2025-11-12T15:51:30Z

/ok to test

gmarkall · 2025-11-12T16:37:14Z

/ok to test

gmarkall · 2025-11-12T17:50:31Z

/ok to test

gmarkall · 2025-11-12T19:17:07Z

/ok to test

gmarkall · 2025-11-17T20:22:54Z

/ok to test

copy-pr-bot · 2025-11-17T20:44:43Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

brandon-b-miller · 2025-11-18T02:50:40Z

ci/test_thirdparty_nvmath.sh

+# Required for nvmath-python to locate pip-install MathDx
+export SYS_PREFIX=`python -c "import sys; print(sys.prefix)"`
+export MATHDX_HOME=${SYS_PREFIX}/lib/python3.13/site-packages/nvidia/mathdx
+python -m pytest nvmath_tests/device --tb=native -x
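
The `MATHDX_HOME` export above hardcodes `python3.13` in the path. As a hedged alternative sketch (not what the PR does), the site-packages directory could be derived from the running interpreter with the standard-library `sysconfig` module. The `nvidia/mathdx` layout is taken from the script; the helper name `mathdx_home` is hypothetical.

```python
import sysconfig
from pathlib import Path


def mathdx_home() -> Path:
    """Return the expected MathDx directory for the running interpreter.

    Assumption: MathDx is pip-installed under <site-packages>/nvidia/mathdx,
    as the export in the script implies.
    """
    # "purelib" is the site-packages directory for pure-Python packages in
    # the interpreter's default install scheme.
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages / "nvidia" / "mathdx"


if __name__ == "__main__":
    # A shell script could capture this output into MATHDX_HOME.
    print(mathdx_home())
```

Because the path follows the interpreter, it would not need updating when the CI Python version changes, provided the package really lives under that site-packages directory.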


brandon-b-miller: Re: clogging up CI, did you try scaling out with multiple pytest workers?

gmarkall: I did, but the tests use a lot of memory so it's not easy to find a sweet spot that speeds things up and doesn't OOM.

brandon-b-miller (review summary): One non-blocking Q
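
The trade-off in this exchange (more workers speed things up, but each one needs memory) can be sketched as a simple cap on the worker count. This is only an illustration: the function name and the memory figures are hypothetical and were not measured from the nvmath-python tests. The result could be passed to the `-n` option of pytest-xdist.

```python
def pick_worker_count(cpu_count: int, free_memory_gib: float,
                      per_worker_gib: float) -> int:
    """Cap the number of test workers by both CPU count and memory.

    All inputs are supplied by the caller; per_worker_gib is an estimate
    of the peak memory a single worker needs.
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    # How many workers fit in memory without exceeding the budget.
    by_memory = int(free_memory_gib // per_worker_gib)
    # Always run at least one worker, never more than the CPU count.
    return max(1, min(cpu_count, by_memory))


# Hypothetical figures: 16 CPUs, 40 GiB free, 12 GiB peak per worker.
print(pick_worker_count(16, 40.0, 12.0))  # 3
```

With memory-hungry tests the memory bound dominates, which matches the difficulty described above: the usable worker count can end up far below the CPU count.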

gmarkall · 2025-11-18T10:19:50Z

/ok to test

gmarkall · 2025-11-19T12:11:26Z

Thanks for the reviews! Before merging this, I still need to resolve the issue with pr-builder / run being considered skipped.

gmarkall · 2025-11-20T08:34:59Z

/ok to test

- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)

gmarkall added 2 commits November 10, 2025 15:33

Rename test-thirdparty to test-thirdparty-cudf

7748ef6

Add nvmath-python tests

307e23a

gmarkall added the "2 - In Progress" label Nov 10, 2025

Attempt to fix nvmath test script

9924815

There is no conda prefix when there's no conda

ce12ec6

Try installing nvidia-cutlass

78b8afe

gmarkall added 2 commits November 11, 2025 10:16

Correct typo in quotes

e23de8d

Fail fast on nvmath tests and use native traceback while debuging

a6899c5

Generate Numba signature objects when Numba is in use

f28dfe2

Consider core Numba number domain and struct model in array type check

4b513c3

Install dx tests requirements too

cfc81d1

gmarkall added 2 commits November 12, 2025 10:33

Add Dim3 and GridGroup to types module

dc1ae2b

Try only installing nvidia-mthdx directly

6ab470b

Add thirdparty tests of Awkward Array

f9e0af7

Use correct wheel for CuPy

f7f7164

Correct typo in awkward Github URL

baf15cd

Disable benchmarks for awkward tests

7897334

Don't run awkward tests with -v because it makes the log huge

e85e8d7

gmarkall added 2 commits November 17, 2025 20:17

Merge remote-tracking branch 'NVIDIA/main' into thirdparty-tests

06e3eac

Only run nvmath tests on push to main

fa92cb2

gmarkall added the "3 - Ready for Review" label and removed the "2 - In Progress" label Nov 17, 2025

gmarkall marked this pull request as ready for review November 17, 2025 20:44

brandon-b-miller reviewed Nov 18, 2025

brandon-b-miller approved these changes Nov 18, 2025

Attempt to make pr job succeed when nvmath-python tests skipped

9e5c87e

rparolin self-requested a review November 19, 2025 00:29

rparolin approved these changes Nov 19, 2025

cryos and others added 2 commits November 20, 2025 08:34

Add in a dedicated workflow for merged commits

ed99309

Merge remote-tracking branch 'NVIDIA/main' into thirdparty-tests

ae4f2d1

gmarkall mentioned this pull request Nov 20, 2025

Thirdparty tests #601

Closed

gmarkall merged commit 9862f7e into NVIDIA:main Nov 20, 2025
71 checks passed

gmarkall added the "5 - Ready to merge" label and removed the "3 - Ready for Review" label Nov 20, 2025

gmarkall mentioned this pull request Nov 20, 2025

Bump version to 0.21.0 #602

Merged

5 participants
