[Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules by atmnp · Pull Request #561 · NVIDIA/numba-cuda

atmnp · 2025-10-29T17:57:27Z

This PR updates the vast majority of imports of upstream numba modules to use the numba.cuda equivalents that had already been vendored in for future CUDA-specific changes. After this PR, there should be exactly 3 (not strictly necessary) upstream numba imports remaining: one for testing vectorization in test_math, one in random.py, and the global compiler_lock.

It seems from the existing comments that removing the CPU jit in random.py would result in considerable performance drops, so I've just added a guard for numba being available.

There are modules such as numba.core.misc.{mergesort,quicksort} that had not yet been vendored in, so this PR also vendors those modules in. They had to be modified to not use the CPU jit, since these are not necessarily supported within numba-cuda. It's still unclear to me if this is necessary or is done correctly, since we don't have tests for these yet. My impression is that these are to support jitting of mergesort/quicksort on arrays, which I am fairly confident we cannot support on the CUDA target until more development is done on the Numba-CUDA runtime. I am unsure if we can remove these uses but I would prefer if we did.

Many tests rely on CPU jitting; these tests are now guarded so that they do not run in environments where the numba package is not available. This is done using a global importlib check in numba-cuda's __init__.py. The check cannot be moved into numba.cuda.config, since upstream numba doesn't have this flag and we intend to forward calls to upstream numba when it is available. All previously sporadic guards for this should now use this global flag.

Support for prange, pndindex, and stencils has been removed, as has support for typed.Dict and typed.List. One pass relied on this: the MakeFunctionToJitFunction pass, which implicitly converts functions to jitted functions, a use case that is also not currently tested. This PR also removes all mention of the legacy type system flag, since we implicitly assume the legacy type system is in place.

copy-pr-bot · 2025-10-29T17:57:31Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

copy-pr-bot · 2025-10-29T17:57:38Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

atmnp · 2025-10-29T17:57:58Z

/ok to test

atmnp · 2025-10-29T18:07:56Z

/ok to test

atmnp · 2025-10-30T17:43:43Z

/ok to test

atmnp · 2025-10-30T18:11:31Z

/ok to test

numba_cuda/numba/cuda/__init__.py

gmarkall

We can't delete the MakeFunctionToJitFunction pass - it is load-bearing, but our testsuite has insufficient coverage (as noted in previous situations).

An example that removing the pass breaks is:

from numba import cuda
import numpy as np

N = 10

def factory(consumer_func):
    @cuda.jit
    def func():
        def inner():
            return N
        return consumer_func(inner)
    return func


@cuda.jit
def consumer(func):
    return func()

jitted_func = factory(consumer)

@cuda.jit
def kernel(out):
    out[0] = jitted_func()


out = np.zeros(1, dtype=np.uint64)
kernel[1, 1](out)
assert out[0] == N

which runs fine on main, and here results in:

Failed in cuda mode pipeline (step: nopython frontend)
Invalid use of Literal[Expr](make_function(name=None, code=<code object inner at 0x79a49fae2800, file "/home/gmarkall/numbadev/issues/numba-cuda-561/example.py", line 9>, closure=None, defaults=None)) with parameters ()
No type info available for Literal[Expr](make_function(name=None, code=<code object inner at 0x79a49fae2800, file "/home/gmarkall/numbadev/issues/numba-cuda-561/example.py", line 9>, closure=None, defaults=None)) as a callable.
During: resolving callee type: Literal[Expr](make_function(name=None, code=<code object inner at 0x79a49fae2800, file "/home/gmarkall/numbadev/issues/numba-cuda-561/example.py", line 9>, closure=None, defaults=None))
During: typing of call at /home/gmarkall/numbadev/issues/numba-cuda-561/example.py (17)


File "example.py", line 17:
def consumer(func):
    return func()
    ^

During: Pass nopython_type_inference
During: resolving callee type: type(CUDADispatcher(<function consumer at 0x79a49fb03240>))
During: typing of call at /home/gmarkall/numbadev/issues/numba-cuda-561/example.py (11)


File "example.py", line 11:
        def inner():
            <source elided>
            return N
        return consumer_func(inner)
        ^

During: Pass nopython_type_inference
During: resolving callee type: type(CUDADispatcher(<function factory.<locals>.func at 0x79a3e1e44860>))
During: typing of call at /home/gmarkall/numbadev/issues/numba-cuda-561/example.py (23)


File "example.py", line 23:
def kernel(out):
    out[0] = jitted_func()
    ^

During: Pass nopython_type_inference

There is a test, test_make_function_to_jitted_function, that I have half-finished porting to Numba-CUDA and will add in another PR.

numba_cuda/numba/cuda/core/errors.py

numba_cuda/numba/cuda/core/ir.py

numba_cuda/numba/cuda/core/untyped_passes.py

numba_cuda/numba/cuda/random.py

numba_cuda/numba/cuda/tests/core/serialize_usecases.py

numba_cuda/numba/cuda/tests/core/test_serialize.py

numba_cuda/numba/cuda/typing/templates.py

…njit, not a use case we intend to support" This reverts commit 0a60d27.

… the directory chain

gmarkall · 2025-10-31T17:54:45Z

Whilst looking at the impact of moving MakeFunctionToJitFunction, I noticed that the following:

from numba import cuda
 # ...
kernel[1, 1]((1, 2, 3))

cuda.synchronize()

results in

...
  File "/home/gmarkall/numbadev/numba/numba/core/typing/templates.py", line 749, in _get_jit_decorator
    jitter = jit_registry[target_hw]
             ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/gmarkall/numbadev/numba/numba/core/registry.py", line 71, in __getitem__
    return super(DelayedRegistry, self).__getitem__(item)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: <class 'numba.core.target_extension.CUDA'>

so there's a bug impacting some of the code I believe this PR touches that needs fixing. I'm looking into a fix for this.

Fix for the target registration is in #566
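The traceback above ends in a plain registry lookup keyed by a target class. As a toy model only (the classes and functions below are stand-ins written for this example, not Numba's actual registry code), the failure mode looks like this:

```python
class CPU:
    """Stand-in for a CPU target class."""


class CUDA:
    """Stand-in for a CUDA target class."""


# Toy registry mapping target classes to jit decorators.
jit_registry = {}


def register_jit(target_cls, decorator):
    """Record the decorator to use for a given target class."""
    jit_registry[target_cls] = decorator


def get_jit_decorator(target_cls):
    """Plain lookup: raises KeyError if the target was never registered."""
    return jit_registry[target_cls]


# Only the CPU target is registered, so looking up CUDA fails with KeyError.
register_jit(CPU, lambda func: func)
```

In this model, the lookup succeeds only once something has registered an entry for the CUDA class, which is the kind of missing registration the comment describes.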

gmarkall · 2025-10-31T18:38:58Z

/ok to test

atmnp · 2025-10-31T18:48:34Z

/ok to test

- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)

atmnp self-assigned this Oct 29, 2025

atmnp added the 1 - On Deck To be worked on next label Oct 29, 2025

atmnp marked this pull request as draft October 29, 2025 17:57

atmnp added 16 commits October 30, 2025 09:36

[Refactor][NFC] Remove references to numba.__file__ and numba.__path__

94014ce

[Refactor][NFC] Remove references to pndindex, literal_unroll, stencils

ab627e8

[Refactor][NFC] Remove reference to numba.__version__

679d26f

[Refactor][NFC] Vendor-in mergesort and quicksort for future CUDA-specific changes

f99f206

[Refactor][NFC] Remove references to prange

c6ca436

[Refactor][NFC] Remove references to numba.misc, update to numba.cuda.misc

b1f91ab

[Refactor][NFC] Remove more prange references

eb1accd

[Refactor][NFC] Update references to numba.errors to now reference numba.cuda.errors

0782fba

[NFC][Refactor] Replace some dangling imports, add proper guards around numba not being alongside Numba-CUDA

5137261

[Refactor][NFC] Introduce version_info structure and version generation function; update imports to use numba.cuda (berzerk cleaning of imports, introduced # compat-ignore as a way for my scripts to ignore necessary numba imports)

3c85fde

update numba.cuda imports to fix test errors

37692da

remove legacy_type_system usage

c7239fd

remove LEGACY_TYPE_SYSTEM in numba.cuda.core.config, skip test on sim

c28b063

Remove MakeFunctionToJitFunction pass, which implicitly uses njit, not a use case we intend to support

0a60d27

bunch of sporadic import fixes, cleanup soon

5179b84

cleaning up some of the guards, removing some of the standalone_numba_cuda reasons (not needed), cleaning up some imports

caaa5cc

atmnp force-pushed the atmn/cleanups branch from a2c4171 to caaa5cc Compare October 30, 2025 17:43

atmnp added 2 commits October 30, 2025 11:05

skip array reductions test on sim

31af244

add an extra guard in test_overload for cpu tests

15dc22a

atmnp changed the title ~~[WIP][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules~~ [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules Oct 30, 2025

atmnp marked this pull request as ready for review October 30, 2025 19:11