[test] Remove dependency on cpu_target #490

ashermancinelli · 2025-09-25T20:40:53Z

Parts of test_ir_utils were vendored in from numba and repurposed for the CUDA target, but they still used the CPU target to set up context handlers. We may not need the CPU target at all, so this PR tries to remove the dependency on it.

copy-pr-bot · 2025-09-25T20:40:57Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

ashermancinelli · 2025-09-25T20:41:04Z

/ok to test 0688e8e

ashermancinelli · 2025-09-25T20:49:54Z

/ok to test 3462618

VijayKandiah · 2025-09-25T21:11:05Z

LGTM!

brandon-b-miller · 2025-09-26T12:40:00Z

numba_cuda/numba/cuda/tests/cudapy/test_ir_utils.py

                if not flags:
                    flags = Flags()
-                flags.nrt = True
+                # flags.nrt = True


what happens if this is enabled?

Thanks, I've been chatting with Atmn about this offline. There's an issue with our configuration flags, but I think I can fix it in this PR.

Okay, I confirmed with Atmn and Vijay. Earlier, Graham left this comment with respect to NRT:

GM: We should not be vendoring this, references to it in the code should be removed

So I think it's safe to remove this commented-out line.

To answer your specific question ("what happens if this is enabled?"): we get a runtime error, because flags.nrt is immutably inherited from Numba's configuration.

How does this sound to you?

ashermancinelli · 2025-09-26T16:43:55Z

/ok to test 4c168b6

ashermancinelli · 2025-09-26T17:06:07Z

Test failure unrelated:

FAILED ../numba_cuda/numba/cuda/tests/cudapy/test_intrinsics.py::TestCudaIntrinsic::test_fp16_intrinsics_common - AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 4 / 32 (12.5%)
Max absolute difference among violations: 0.0004883
Max relative difference among violations: 0.00068
 ACTUAL: array([-0.293 , -0.8447, -0.7266,  0.8687,  0.3357,  0.654 , -0.1199,
       -0.2788,  0.809 ,  0.98  , -0.823 , -0.75  , -0.5767, -0.9927,
        0.4475, -0.9897,  0.991 ,  0.316 ,  0.6543,  0.0396,  0.971 ,...
 DESIRED: array([-0.293 , -0.8447, -0.7266,  0.8687,  0.3357,  0.654 , -0.1199,
       -0.2788,  0.809 ,  0.98  , -0.8237, -0.75  , -0.5767, -0.9927,
        0.4475, -0.9897,  0.991 ,  0.316 ,  0.6543,  0.0396,  0.971 ,...
==== 1 failed, 723 passed, 1017 skipped, 6 xfailed, 103 warnings in 48.60s =====

Some CI runs were spuriously failing due to an overly stringent tolerance for fp16 tests (#490). Use numpy's epsilon for 16-bit floats instead of the default relative tolerance for np.testing.assert_allclose(). This PR passes `CUDATestCase.FLOAT16_RTOL` to relevant calls to `np.testing.assert_allclose`. Alternatively, we could add `assertAllClose` to `CUDATestCase` that infers the appropriate relative tolerance from the data types passed in. Please let me know if we'd like to go that route instead. Thanks in advance!
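For reference, a minimal sketch of why the default tolerance trips on fp16 data and how an epsilon-based `rtol` avoids it. This is an illustration only: the module-level `FLOAT16_RTOL` below is assumed to equal float16 machine epsilon, which may not match the actual value of `CUDATestCase.FLOAT16_RTOL`, and the `passes` helper is hypothetical. The two values mimic the -0.823 / -0.8237 pair from the failure log above.

```python
import numpy as np

# Assumption: the tolerance is float16 machine epsilon (2**-10). The real
# value of CUDATestCase.FLOAT16_RTOL is not shown in this thread.
FLOAT16_RTOL = float(np.finfo(np.float16).eps)

# Two neighbouring float16 values, one ULP (2**-11) apart, similar to the
# mismatched elements reported in the failing test.
actual = np.array([-1686 / 2048], dtype=np.float16)
desired = np.array([-1687 / 2048], dtype=np.float16)


def passes(rtol):
    """Return True if assert_allclose accepts the pair at this rtol."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol)
    except AssertionError:
        return False
    return True


default_ok = passes(1e-07)         # default rtol, as reported in the log
relaxed_ok = passes(FLOAT16_RTOL)  # epsilon-based rtol for fp16 data
```

With the default `rtol`, a single-ULP difference in float16 is already far outside the tolerance, so any rounding difference between the device result and the reference can fail the test; scaling `rtol` to the precision of the data type avoids that.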

ashermancinelli · 2025-09-29T18:38:30Z

/ok to test dd9741b

ashermancinelli · 2025-09-29T19:13:48Z

/ok to test 4602903

- Add support for cache-hinted load and store operations (#587)
- Add more thirdparty tests (#586)
- Add sphinx-lint to pre-commit and fix errors (#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (#544)
- chore: clean up dead workaround for unavailable `lru_cache` (#598)
- chore(docs): format types docs (#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (#579)
- Fix freezing in of constant arrays with negative strides (#589)
- Update tests to accept variants of generated PTX (#585)
- refactor: replace device functionality with `cuda.core` APIs (#581)
- Move frontend tests to `cudapy` namespace (#558)
- Generalize the concurrency group for main merges (#582)
- ci: move pre-commit checks to pre commit action (#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (#574)
- ci: ensure that python version in ci matches matrix (#575)
- Fix the `cuda.is_supported_version()` API (#571)
- Fix checks on main (#576)
- feat: add `math.nextafter` (#543)
- ci: replace conda testing with pixi (#554)
- [CI] Run PR workflow on merge to main (#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (#569)
- test: enable fail-on-warn and clean up resulting failures (#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (#550)
- test: revert back to ipc futures that await each iteration (#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (#534)
- Remove dependencies on target_extension for CUDA target (#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (#559)
- [WIP] Port numpy reduction tests to CUDA (#523)
- ci: add timeout to avoid blocking the job queue (#556)
- Handle `cuda.core.Stream` in driver operations (#401)
- feat: add support for `math.exp2` (#541)
- Vendor in types and datamodel for CUDA-specific changes (#533)
- refactor: cleanup device constructor (#548)
- bench: add cupy to array constructor kernel launch benchmarks (#547)
- perf: cache dimension computations (#542)
- perf: remove duplicated size computation (#537)
- chore(perf): add torch to benchmark (#539)
- test: speed up ipc tests by ~6.5x (#527)
- perf: speed up kernel launch (#510)
- perf: remove context threading in various pointer abstractions (#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (#538)
- refactor: remove unnecessary custom map and set implementations (#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (#513)
- test: add benchmarks for kernel launch for reproducibility (#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (#522)
- refactor: fully remove `USE_NV_BINDING` (#525)
- Draft: Vendor in the IR module (#439)
- pyproject.toml: add search path for Pyrefly (#524)
- Vendor in numba.core.typing for CUDA-specific changes (#473)
- Use numba.config when available, otherwise use numba.cuda.config (#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (#502)
- build: allow parallelization of nvcc testing builds (#521)
- chore(dev-deps): add pixi (#505)
- Vendor the imputils module for CUDA refactoring (#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (#519)
- Switch back to stable cuDF release in thirdparty tests (#518)
- Updating .gitignore with binaries in the `testing` folder (#516)
- Remove some unnecessary uses of ContextResettingTestCase (#507)
- Vendor in _helperlib cext for CUDA-specific changes (#512)
- Vendor in typeconv for future CUDA-specific changes (#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (#494)
- Make the CUDA target the default for CUDA overload decorators (#511)
- Remove C extension loading hacks (#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (#433)
- Fix Bf16 Test OB Error (#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (#488)
- Improve debug value range coverage (#461)
- Add `compile_all` API (#484)
- Vendor in core.registry for CUDA-specific changes (#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (#476)
- [test] Remove dependency on cpu_target (#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (#475)
- [test] Use numpy's tolerance for float16 (#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (#478)

[test] Remove dep on cpu_target

0688e8e

Parts of test_ir_utils were vendored in from numba and repurposed for the cuda target, however it was still using the CPU target. We may not need the CPU target at all; try to remove dependency on it.

ashermancinelli requested review from VijayKandiah, atmnp, brandon-b-miller and rparolin September 25, 2025 20:40

ashermancinelli self-assigned this Sep 25, 2025

ashermancinelli added the "3 - Ready for Review" label Sep 25, 2025

Skip ir utils test on simulator

3462618

VijayKandiah previously approved these changes Sep 25, 2025

brandon-b-miller reviewed Sep 26, 2025

Remove commented-out line

742e2fc

ashermancinelli dismissed VijayKandiah’s stale review via 742e2fc September 26, 2025 16:43

Merge branch 'main' of github.com:NVIDIA/numba-cuda into ajm/test-ir-…

4c168b6

…utils-cpu-target

ashermancinelli mentioned this pull request Sep 29, 2025

[test] Use numpy's tolerance for float16 #491

Merged

Merge branch 'main' into ajm/test-ir-utils-cpu-target

dd9741b

brandon-b-miller approved these changes Sep 29, 2025

brandon-b-miller added the "5 - Ready to merge" label and removed the "3 - Ready for Review" label Sep 29, 2025

Merge branch 'main' into ajm/test-ir-utils-cpu-target

4602903

ashermancinelli enabled auto-merge (squash) September 29, 2025 19:17

ashermancinelli merged commit f7903e3 into NVIDIA:main Sep 29, 2025
56 checks passed

gmarkall mentioned this pull request Nov 20, 2025

Bump version to 0.21.0 #602

Merged
