Conversation

@VijayKandiah
Contributor

This PR removes the dependency on `numba.core.target_extension`. This import was primarily used to get the local target, which is CUDA in our case unless the CPU dispatcher is used. There are a couple of `numba.core` APIs that I had to monkey patch in `descriptor.py` to handle our default being the CUDA target: `numba.core.utils.order_by_target_specificity` and `numba.core.target_extension.get_local_target`. @gmarkall please take a look and let me know if there's a way around doing this sort of monkey patching.
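For context, a minimal sketch of the kind of monkey patch this describes; this is illustrative only, not the actual `descriptor.py` code:

```python
# Illustrative sketch only: wrap get_local_target so that the CUDA
# target, rather than the CPU target, becomes the default.
from numba.core import target_extension

_original_get_local_target = target_extension.get_local_target

def _get_local_target(context):
    try:
        return _original_get_local_target(context)
    except Exception:
        # Fall back to the CUDA target hierarchy class instead of CPU.
        return target_extension.CUDA

target_extension.get_local_target = _get_local_target
```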

@VijayKandiah VijayKandiah requested a review from gmarkall October 27, 2025 16:03
@VijayKandiah VijayKandiah self-assigned this Oct 27, 2025
@VijayKandiah VijayKandiah added the 3 - Ready for Review Ready for review by team label Oct 27, 2025
@copy-pr-bot

copy-pr-bot bot commented Oct 27, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@VijayKandiah
Contributor Author

/ok to test

@VijayKandiah
Contributor Author

/ok to test

@gmarkall
Contributor

> @gmarkall please take a look and let me know if there's a way around doing this sort of monkey patching.

Having a look at this now.

I think there's a mix-up between the `CUDATarget` class and the `cuda_target` instance somewhere, which is confusing Numba's `order_by_target_specificity` and may be driving the need for a monkey patch, but I'm still a little fuzzy about what's going on here. Will update once I've worked out a bit more.
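To illustrate the distinction I mean (assuming the usual locations of these names):

```python
# Two easily-confused things: the target hierarchy class that Numba's
# specificity ordering walks, and the target descriptor instance.
from numba.core.target_extension import CUDA      # hierarchy class
from numba.cuda.descriptor import cuda_target     # CUDATarget instance

# order_by_target_specificity reasons about subclasses of
# numba.core.target_extension.Target, so it expects CUDA (the class),
# not cuda_target (the descriptor instance).
```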

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels Oct 28, 2025
@gmarkall
Contributor

(Other than that, everything looks good with the PR)

gmarkall and others added 4 commits October 29, 2025 16:31
The Numba monkey patch was required to work around some previous changes,
but is likely to break Numba for other external targets.

This commit removes the monkey patch and restores functionality without
a hard dependency on `numba.core.target_extension` with two changes:

- The target override to set the target to CUDA is required. If this is
  not set, Numba always assumes the CPU target and fails to compile the
  implementations of various operations. We override the target
  conditionally, based on the presence of Numba.
- The `target_context.target` should not be an instance of the
  `CUDATarget` class, but instead a `numba.core.target_extension.Target`
  class. This class is separate from the other target classes, and is used
  to evaluate the hierarchy of target implementations. If Numba is
  present, we set the `target` member of the target context; otherwise it
  is unused and not needed in Numba-CUDA. A rough sketch of both changes
  follows this list.
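
A hedged sketch of the shape of these two changes; `configure_target_context` and `compile_for_cuda` are hypothetical names for illustration, not the actual numba-cuda code:

```python
# Illustrative sketch of the two changes above; the real code in
# numba-cuda's descriptor and compiler modules may differ.
try:
    from numba.core.target_extension import CUDA, target_override
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

def configure_target_context(target_context):
    # Hypothetical helper: set the `target` member only when Numba is
    # present, using the hierarchy class CUDA rather than an instance
    # of the CUDATarget descriptor class.
    if HAVE_NUMBA:
        target_context.target = CUDA

def compile_for_cuda(compile_fn, *args, **kwargs):
    # Hypothetical helper: without the override, Numba assumes the CPU
    # target and fails to compile the implementations of various
    # operations.
    if not HAVE_NUMBA:
        return compile_fn(*args, **kwargs)
    with target_override("cuda"):
        return compile_fn(*args, **kwargs)
```
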
It is not used - this makes the difference between this version and the
upstream more stark, and should prevent them from being seen as
interchangeable.
@VijayKandiah
Contributor Author

/ok to test

@gmarkall gmarkall merged commit 5aeb63c into NVIDIA:main Oct 30, 2025
70 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Oct 31, 2025
PR NVIDIA#555 removed dependencies on the target extension, but it also
removed registration in Numba's target registry. This prevents handling
of closures, because closure handling uses a process that is equivalent
to wrapping closure functions in a `@jit` decorator, where the
appropriate `@jit` decorator for the target is looked up in the jit
registry.

This commit restores the registration when Numba is present, and vendors
/ ports `test_make_function_to_jit_function`. This ensures that the
target registration continues to work, and also exercises the
`MakeFunctionToJITFunction` pass, which previously had no effect on any
code in the test suite.
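
The restored registration is roughly of this shape (a sketch assuming Numba's `jit_registry` in `numba.core.target_extension`; the actual commit may differ):

```python
# Sketch of the registration being restored; illustrative only.
from numba import cuda
from numba.core.target_extension import CUDA, jit_registry

# Closure handling looks up the @jit decorator for the active target in
# this registry, so the CUDA target needs an entry pointing at cuda.jit.
jit_registry[CUDA] = cuda.jit
```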

In order to preserve the form of the original tests, the original test
code has the `@njit` decorators replaced with a wrapper function that
generates an appropriate kernel, temporary storage, and launch code (a
sketch of this pattern follows below).

Some changes had to be made to avoid refcounting where arrays would
have been used. The arrays have been replaced by (in different cases):

- Summation of all the values that would have been array elements, or
- Carefully constructing tuples to return and then packing / unpacking
  them at function boundaries.
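
For illustration, a hypothetical sketch of the wrapper pattern described above; `run_on_device` and `add_one` are invented names, not the vendored tests' actual helpers:

```python
# Hypothetical wrapper replacing an @njit function: generate a kernel,
# temporary storage for the scalar result, and launch code.
import numpy as np
from numba import cuda

def run_on_device(device_fn, arg):
    out = np.zeros(1, dtype=np.int64)   # temporary storage for the result

    @cuda.jit
    def kernel(result, x):
        result[0] = device_fn(x)

    kernel[1, 1](out, arg)              # launch with a single thread
    return out[0]

@cuda.jit(device=True)
def add_one(x):
    return x + 1

assert run_on_device(add_one, 41) == 42
```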
gmarkall added a commit that referenced this pull request Oct 31, 2025
…566)

gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025
