
Vendor the imputils module for CUDA refactoring#448

Merged
gmarkall merged 3 commits into NVIDIA:main from ashermancinelli:ajm/vendor-imputils
Oct 13, 2025

Conversation

@ashermancinelli
Contributor

@ashermancinelli ashermancinelli commented Aug 29, 2025

No modules in numba-cuda import this module yet, but other modules that have yet to be vendored will use it. I see no standalone tests for this module. It would be nice to vendor it now so that it is already in the repo in case we want to vendor any of the following modules for CUDA-specific refactoring.

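For context, a minimal sketch of the pattern those modules rely on, assuming upstream numba's public imputils API (Registry, its lower decorator, and impl_ret_untracked); the registry name and lowering function below are made up for illustration and are not code from this PR:

# Minimal sketch: create a Registry, decorate lowering functions with
# registry.lower, and let a target context install the registry so
# get_function() can find the definitions during lowering.
from llvmlite import ir

from numba.core import types
from numba.core.imputils import Registry, impl_ret_untracked

registry = Registry("example_registry")
lower = registry.lower


@lower(abs, types.int64)
def int64_abs_impl(context, builder, sig, args):
    # Lowering callback invoked when abs(int64) is compiled; emits LLVM IR.
    (val,) = args
    zero = ir.Constant(val.type, 0)
    is_negative = builder.icmp_signed("<", val, zero)
    result = builder.select(is_negative, builder.neg(val), val)
    return impl_ret_untracked(context, builder, sig.return_type, result)
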
These are the places it is being imported in upstream numba:

numba/typed/dictobject.py
numba/typed/listobject.py
numba/typed/typeddict.py
numba/typed/typedlist.py
numba/typed/dictimpl.py
numba/stencils/stencil.py
numba/parfors/parfor.py
numba/cpython/slicing.py
numba/cuda/vector_types.py
numba/cpython/new_numbers.py
numba/cpython/new_mathimpl.py
numba/cpython/old_tupleobj.py
numba/cpython/new_tupleobj.py
numba/cuda/mathimpl.py
numba/cpython/randomimpl.py
numba/cpython/old_numbers.py
numba/cpython/printimpl.py
numba/cpython/old_mathimpl.py
numba/cpython/cmathimpl.py
numba/misc/cffiimpl.py
numba/cpython/iterators.py
numba/cpython/setobj.py
numba/cuda/libdeviceimpl.py
numba/cpython/old_builtins.py
numba/cpython/enumimpl.py
numba/cuda/cudaimpl.py
numba/cpython/new_builtins.py
numba/cpython/unicode_support.py
numba/tests/test_target_extension.py
numba/cpython/listobj.py
numba/cpython/rangeobj.py
numba/cuda/printimpl.py
numba/cpython/unicode.py
numba/experimental/structref.py
numba/experimental/function_type.py
numba/experimental/jitclass/base.py
numba/core/base.py
numba/core/optional.py
numba/core/inline_closurecall.py
numba/tests/pdlike_usecase.py
numba/core/extending.py
numba/tests/test_extending.py
numba/np/npyfuncs.py
numba/np/linalg.py
numba/np/old_arraymath.py
numba/np/unsafe/ndarray.py
numba/np/arrayobj.py
numba/np/new_arraymath.py
numba/np/npyimpl.py
numba/tests/test_practical_lowering_issues.py
numba/np/npdatetime.py
numba/np/math/mathimpl.py
numba/np/math/cmathimpl.py
numba/np/math/numbers.py
numba/tests/test_nrt.py
numba/tests/test_target_overloadselector.py

Depends on at least the following modules:

@ashermancinelli ashermancinelli self-assigned this Aug 29, 2025
@ashermancinelli ashermancinelli added the 2 - In Progress Currently a work in progress label Aug 29, 2025
@copy-pr-bot

copy-pr-bot bot commented Aug 29, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ashermancinelli ashermancinelli added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Aug 29, 2025
@ashermancinelli
Contributor Author

/ok to test e07ff46

@VijayKandiah
Contributor

No modules import this module in numba-cuda yet, but other modules that have yet to be vendored will use it. I see no standalone tests for this module. It would be nice to vendor this module so it is already in the repo in case we would like to vendor any of the following modules for CUDA-specific refactoring.

I do see some places this module is currently being imported:

$ rg -i imputils
numba/cuda/cudaimpl.py
9:from numba.core.imputils import Registry

numba/cuda/extending.py
25:    from numba.core.imputils import impl_ret_borrowed

numba/cuda/_internal/cuda_fp16.py
26:from numba.core.imputils import Registry as TargetRegistry, lower_cast

numba/cuda/libdeviceimpl.py
4:from numba.core.imputils import Registry

numba/cuda/mathimpl.py
6:from numba.core.imputils import Registry

numba/cuda/printimpl.py
6:from numba.core.imputils import Registry

numba/cuda/vector_types.py
9:from numba.core.imputils import Registry as ImplRegistry

@ashermancinelli
Contributor Author

Thanks! Not sure how I missed those.

@ashermancinelli ashermancinelli added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels Aug 29, 2025
@ashermancinelli
Contributor Author

/ok to test 38a84c0

@ashermancinelli ashermancinelli added 0 - Blocked Cannot progress due to external reasons and removed 2 - In Progress Currently a work in progress labels Aug 29, 2025
@gmarkall gmarkall added 2 - In Progress Currently a work in progress and removed 0 - Blocked Cannot progress due to external reasons labels Sep 3, 2025
@gmarkall
Contributor

gmarkall commented Sep 3, 2025

The two PRs highlighted as dependencies of this PR have now been merged - does this unblock it, or are there new dependencies that will need to be covered?

@gmarkall gmarkall closed this Sep 3, 2025
@gmarkall gmarkall reopened this Sep 3, 2025
@ashermancinelli
Contributor Author

After updating, I found additional dependencies, so this will have to remain blocked for now. Thanks Graham!

@ashermancinelli ashermancinelli added 0 - Blocked Cannot progress due to external reasons and removed 2 - In Progress Currently a work in progress labels Sep 3, 2025
@ashermancinelli
Contributor Author

/ok to test f40b4bd

@ashermancinelli ashermancinelli added 3 - Ready for Review Ready for review by team and removed 0 - Blocked Cannot progress due to external reasons labels Oct 9, 2025
@ashermancinelli
Contributor Author

The failing test is from the third-party cuDF tests:

NotImplementedError: No definition for lowering <class 'cudf.core.udf.api.Masked'>(int64, bool) -> Masked(int64)

Full traceback:
____________ test_masked_udf_scalar_args_binops_multiple[ge-data0] _____________
Traceback (most recent call last):
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cudf/core/indexed_frame.py", line 3515, in _apply
    kernel, retty = kr.get_kernel()
                    ^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cudf/core/udf/udf_kernel_base.py", line 167, in get_kernel
    return self._compile_or_get_kernel()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cudf/core/udf/udf_kernel_base.py", line 184, in _compile_or_get_kernel
    kernel, scalar_return_type = self.compile_kernel()
                                 ^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cudf/core/udf/udf_kernel_base.py", line 145, in compile_kernel
    kernel = self.compile_kernel_string(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/cudf/core/udf/udf_kernel_base.py", line 159, in compile_kernel_string
    kernel = cuda.jit(
             ^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/decorators.py", line 207, in _jit
    disp.compile(argtypes)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 1894, in compile
    kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/dispatcher.py", line 148, in __init__
    cres = compile_cuda(
           ^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/compiler.py", line 750, in compile_cuda
    cres = compile_extra(
           ^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/compiler.py", line 591, in compile_extra
    return pipeline.compile_extra(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 124, in compile_extra
    return self._compile_bytecode()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 192, in _compile_bytecode
    return self._compile_core()
           ^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 168, in _compile_core
    raise e
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler.py", line 160, in _compile_core
    pm.run(self.state)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 393, in run
    raise e
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 386, in run
    self._runPass(idx, pass_inst, state)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 338, in _runPass
    mutated |= check(pss.run_pass, internal_state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/compiler_machinery.py", line 292, in check
    mangled = func(compiler_state)
              ^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/typed_passes.py", line 355, in run_pass
    lower.lower()
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 230, in lower
    self.lower_normal_function(self.fndesc)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 269, in lower_normal_function
    entry_block_tail = self.lower_function_body()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 300, in lower_function_body
    self.lower_block(block)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 318, in lower_block
    self.lower_inst(inst)
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 518, in lower_inst
    val = self.lower_assign(ty, inst)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 746, in lower_assign
    return self.lower_expr(ty, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 1285, in lower_expr
    res = self.lower_call(resty, expr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 1029, in lower_call
    res = self._lower_call_normal(fnty, expr, signature)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/lowering.py", line 1247, in _lower_call_normal
    impl = self.context.get_function(fnty, signature)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/base.py", line 575, in get_function
    return self.get_function(fn, sig, _firstcall=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.12.11/lib/python3.12/site-packages/numba_cuda/numba/cuda/core/base.py", line 577, in get_function
    raise NotImplementedError(
NotImplementedError: No definition for lowering <class 'cudf.core.udf.api.Masked'>(int64, bool) -> Masked(int64)

I'm investigating.

# (functions, attributes, type casts)
builtin_registry = Registry("builtin_registry")

lower_builtin = builtin_registry.lower
Contributor

As mentioned offline, I believe this may be the source of the test failures. I'll try to confirm my suspicion here and think about what to do beyond eventually updating cuDF.

Contributor Author

I'm wondering if we need to add the registry here:

# context.py
class Context(BaseContext):
    # This list will be extended to include all the registries
    # that are needed for CUDA
    def load_additional_registries(self):

Once I can repro the cudf tests locally I'll try some things out. Thanks for taking a look!

Contributor Author

I reproduced this locally, and adding numba's registries to CUDATargetContext.load_additional_registries() allows these tests to pass. I might be missing other registries, but the cuDF tests that were failing are all passing now on retest.
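
For illustration, a rough sketch of that kind of workaround. It assumes upstream numba's builtin_registry is importable from numba.core.imputils, that the vendored base context lives at numba.cuda.core.base (as the traceback above suggests), and that it exposes install_registry() as in upstream numba; it is not the exact change made in this PR.

# Hypothetical sketch: install upstream numba's builtin registry into the CUDA
# target context so lowerings registered via numba.core.imputils.lower_builtin
# (such as cuDF's Masked constructor) remain visible to the vendored lowering.
from numba.core.imputils import builtin_registry as upstream_builtin_registry

from numba.cuda.core.base import BaseContext  # assumed vendored module path


class CUDATargetContext(BaseContext):
    def load_additional_registries(self):
        # ... existing CUDA registries (cudaimpl, printimpl, libdeviceimpl, ...)
        # would still be installed here ...
        self.install_registry(upstream_builtin_registry)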

@ashermancinelli
Contributor Author

/ok to test e6b6bd2

@ashermancinelli
Contributor Author

/ok to test 6253827

Contributor

@gmarkall gmarkall left a comment

This looks good.

I think installing the upstream registry is probably the best available workaround; I feel a little nervous that it might somehow cause issues, but as I can't predict their nature, I think we'll just have to address them if and when they occur.

@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Oct 13, 2025
@gmarkall gmarkall merged commit e55679c into NVIDIA:main Oct 13, 2025
76 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025
