@ashermancinelli (Contributor) commented Aug 28, 2025

The plan-of-record as discussed offline with @gmarkall is to vendor this module piecemeal; modules in numba-cuda will continue to use numba until we have vendored in additional modules (such as types and typing).


The IR module is near the root of the import graph of numba-cuda, so it has many dependencies. Several other modules must be vendored in before this can be merged, so this will remain a draft for now. The biggest dependency is currently #373. This PR is open before it is ready for review to give the team a sense of how close we are to having this module vendored in.

Depends-on:

@ashermancinelli self-assigned this Aug 28, 2025
@ashermancinelli added the `2 - In Progress` label Aug 28, 2025

copy-pr-bot bot commented Aug 28, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ashermancinelli marked this pull request as draft August 28, 2025 15:12

copy-pr-bot bot commented Aug 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ashermancinelli added the `0 - Blocked` label and removed the `2 - In Progress` label Aug 29, 2025
gmarkall pushed a commit that referenced this pull request Sep 1, 2025
[Refactor][NFC] Vendor-in cpu_options for future CUDA-specific changes (#440)

Numba-CUDA relies on the upstream Numba cpu_options to configure itself; this change vendors in the class so we can later remove ParallelOptions once #439 is merged.
@ashermancinelli
Contributor Author

I'll re-evaluate whether this is blocked on additional modules, or remove the draft status if we can actually merge it.

kkraus14 added a commit to kkraus14/numba-cuda that referenced this pull request Sep 4, 2025
commit 9eb9e15
Author: Asher Mancinelli <[email protected]>
Date:   Thu Sep 4 08:01:00 2025 -0700

    Vendor in the CPU module for CUDA refactoring (NVIDIA#447)

    The CPU module has no standalone tests and is likely well-covered by
    existing tests to the extent it is used. It will be used by the registry
    module once that is vendored in.

    Depends-on:
    - NVIDIA#440

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 24967a0
Author: brandon-b-miller <[email protected]>
Date:   Thu Sep 4 09:37:45 2025 -0500

    Close `rtsys` singleton when `cuda.close()` is called (NVIDIA#455)

    This PR fixes an issue where `cuda.close` invalidated the context
    containing the current memsys module without also invalidating the
    `_Runtime` instance that ostensibly owned that module. This caused
    reference errors later on when trying to use that module.

    Closes NVIDIA#453
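
    A hedged sketch of the shape of the fix (names simplified; not the
    actual numba-cuda code): the runtime singleton must drop its reference
    to the memsys module when the owning context is destroyed, so that
    later use reinitializes instead of dereferencing a dead module.

    ```python
    # Minimal sketch, assuming a hypothetical _Runtime/rtsys pairing.
    class _Runtime:
        def __init__(self):
            self._memsys_module = None  # lives inside a CUDA context

        def initialize(self, module):
            self._memsys_module = module

        def close(self):
            self._memsys_module = None  # forget the module with its context

    rtsys = _Runtime()

    def close():
        # ... destroy the current CUDA context (and the memsys module) ...
        rtsys.close()  # without this, rtsys keeps a dangling reference
    ```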

commit 6db9a70
Author: Atmn Patel <[email protected]>
Date:   Thu Sep 4 09:19:08 2025 -0500

    [Refactor][NFC] Vendor-in numba.core.unsafe for future CUDA-specific changes (NVIDIA#460)

    We primarily use the exception handling mechanisms in numba.core.unsafe
    right now, but we will also use it in cpython.hashing once we've ported
    that over. I've removed the nrt support since we won't end up porting
    over its uses from upstream Numba.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 6828bee
Author: Atmn Patel <[email protected]>
Date:   Thu Sep 4 08:16:52 2025 -0500

    [Cleanups] Fix dangling imports, remove residual parfor handling (NVIDIA#458)

commit f856b37
Author: Asher Mancinelli <[email protected]>
Date:   Thu Sep 4 05:43:33 2025 -0700

    Vendor in the transforms module (NVIDIA#444)

    There are no standalone tests for the transforms module, so none have
    been vendored.

commit 230b02c
Author: Graham Markall <[email protected]>
Date:   Thu Sep 4 12:45:31 2025 +0100

    Compatibility fixes for Numba 0.62 (NVIDIA#427)

    - `newpassmanagers` need to be used with llvmlite >= 0.45 (the legacy
    pass manager is removed with the move to LLVM 20)
    - The `Array._allocate()` overload was made CPU-only with
    numba/numba#10185, so we need one for the CUDA
    target.

commit a2bf0e1
Author: Atmn Patel <[email protected]>
Date:   Wed Sep 3 11:06:10 2025 -0500

    [Refactor][NFC] Remove `skip_if_no_external_compiler()` (NVIDIA#449)

    This removes the only function we have that uses pycc, which itself is not required.

commit 9a09f13
Author: Asher Mancinelli <[email protected]>
Date:   Wed Sep 3 00:55:22 2025 -0700

    Vendor the base module for CUDA-specific refactoring (NVIDIA#445)

    There are no standalone tests for this module. It is imported by the
    cpu and caching modules upstream; we have not pulled in the cpu module,
    and while the caching module has already been vendored, BaseContext was
    not (it was unused by the upstream caching module anyway).

    There are some tests which import this module, but we have versions of
    the same tests and they do not use the base module:

    * numba/numba/tests/test_target_overloadselector.py
    * numba/numba/tests/test_ufuncs.py
    * numba/numba/tests/test_target_extension.py

    ---------

    Co-authored-by: Graham Markall <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 65705ad
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 15:36:07 2025 -0700

    Vendor in the typeinfer module (NVIDIA#442)

    Some of the tests were specific to the CPU, so I did not port them to
    GPU kernels (many tests could still be preserved though).

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 49f3d88
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 03:34:39 2025 -0700

    Vendor in the rewrites module (NVIDIA#438)

    Vendor in the rewrites module for CUDA-specific refactoring.

commit 1bed09b
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 03:21:20 2025 -0700

    Vendor in the pythonapi module (NVIDIA#446)

commit 06e6103
Author: Atmn Patel <[email protected]>
Date:   Tue Sep 2 04:10:01 2025 -0500

    [Refactor][NFC] Vendor in testing support utils for CUDA-specific changes (NVIDIA#432)

    This change vendors in numba.testing.support, specifically the Mixins
    that we use for testing for memory leaks and checking warnings.

    Notes from review: It's a little non-obvious why the change to the simulator's
    `RTSys` was needed. It is because the `BaseUFuncTest` switched to the
    Numba-CUDA `MemoryLeakMixin` which initializes the Numba-CUDA `rtsys`. Prior to
    this PR, it was erroneously using the CPU-based `MemoryLeakMixin`.
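
    A hedged sketch of how a test picks up the vendored mixin (the import
    paths are assumptions based on the layout described above):

    ```python
    import numpy as np
    from numba import cuda
    from numba.cuda.testing import CUDATestCase
    from numba.cuda.tests.support import MemoryLeakMixin  # assumed path

    class TestNoLeaks(MemoryLeakMixin, CUDATestCase):
        def test_kernel(self):
            @cuda.jit
            def k(out):
                out[0] = 1

            out = np.zeros(1, dtype=np.int32)
            k[1, 1](out)
            # The mixin's tearDown asserts that NRT allocations balance,
            # now against the Numba-CUDA rtsys rather than the CPU one.
    ```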

commit 2c2578d
Author: Asher Mancinelli <[email protected]>
Date:   Mon Sep 1 07:40:03 2025 -0700

    Vendor the inline-closure-call module (NVIDIA#443)

    This module has no standalone tests. All imports have been updated.

    Notes from review follow:

    It was expected that making the following change:

    ```diff
    diff --git a/numba_cuda/numba/cuda/core/ir_utils.py b/numba_cuda/numba/cuda/core/ir_utils.py
    index a2ea996..d48e2dd 100644
    --- a/numba_cuda/numba/cuda/core/ir_utils.py
    +++ b/numba_cuda/numba/cuda/core/ir_utils.py
    @@ -1990,7 +1990,7 @@ def get_ir_of_code(glbls, fcode):
         inline_pass = numba.core.inline_closurecall.InlineClosureCallPass(
             ir, numba.cuda.core.options.ParallelOptions(False), swapped
         )
    -    inline_pass.run()
    +    #inline_pass.run()

         # TODO: DO NOT ADD MORE THINGS HERE!
         # If adding more things here is being contemplated, it really is time to
    ```

    would cause an example like this to fail to run:

    ```python
    @cuda.jit()
    def f():
        def k(x):
            return x + 1
        print(k(4))

    f[1, 1]()
    ```

    but it didn't. It seems like the main reason this was added was for parfors, and
    for stencil support. I (Graham) feel 90% sure this isn't doing anything for the
    CUDA target, but would want to check a bit more thoroughly before we remove this
    entirely (sometime in the future).

    There are some imports from `numba.core.stencil` in `inline_closurecall.py` -
    for dealing with these in future we should probably delete / make a no-op the
    code referencing it, rather than trying to bring that in.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit a20796f
Author: Vijay Kandiah <[email protected]>
Date:   Mon Sep 1 08:30:31 2025 -0500

    [Refactor] Vendor in typing.npydecl, typing.BaseContext, typing.Context for CUDA-specific changes (NVIDIA#399)

    This PR vendors in a few more classes (`typing.npydecl`,
    `typing.BaseContext`, `typing.Context`) from `numba.core.typing` for
    CUDA-specific customization.

commit db7aabe
Author: Atmn Patel <[email protected]>
Date:   Mon Sep 1 08:28:09 2025 -0500

    [Refactor][NFC] Vendor-in tracing module for future CUDA-specific changes (NVIDIA#441)

    We use the `event` from this module, and we might want to implement CUDA
    target specific tracing logic.

commit e9eb4df
Author: Atmn Patel <[email protected]>
Date:   Mon Sep 1 08:19:09 2025 -0500

    [Refactor][NFC] Vendor-in cpu_options for future CUDA-specific changes (NVIDIA#440)

    Numba-CUDA relies on the upstream Numba cpu_options to configure
    itself; this change vendors in the class so we can later remove
    ParallelOptions once NVIDIA#439 is merged.
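
    A hedged sketch of the import switch this enables (the vendored path
    below matches the one seen later in this thread's ir_utils diff):

    ```python
    # Before: the options class comes from upstream Numba.
    # from numba.core.cpu_options import ParallelOptions

    # After: it comes from the vendored copy, so upstream changes cannot
    # break us and we can later trim what the CUDA target ignores.
    from numba.cuda.core.options import ParallelOptions

    opts = ParallelOptions(False)  # CUDA never enables parallel semantics
    ```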

commit 15fdfbc
Author: lakshayg <[email protected]>
Date:   Fri Aug 29 12:02:05 2025 -0700

    Regenerate float16 bindings (NVIDIA#436)

    This PR is based on NVIDIA#265 and merely regenerates the bindings
    after all the features requested from Numbast were implemented.

commit 93404d2
Author: Vijay Kandiah <[email protected]>
Date:   Fri Aug 29 03:25:27 2025 -0500

    Vendor in debuginfo, utils, cffi_utils, enumdecl for CUDA-specific changes (NVIDIA#398)

    This PR vendors in `debuginfo, utils` from `numba.core`, and
    `cffi_utils, enumdecl` from `numba.core.typing` for CUDA-specific
    customization. Additionally fixes a dangling reference to
    `numba.tests.support.TestCase` by changing it to
    `numba.cuda.tests.support.TestCase`.

commit 56a19f1
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 28 09:47:23 2025 -0500

    [Refactor] Vendor in cpython math impls for CUDA-specific customization (NVIDIA#378)

    This moves the cpython func impls (cmathimpl, mathimpl, and numbers)
    used in our ufunc_db into this repo. I no longer think that this needs
    additional comprehensive testing, because the vast majority, if not
    all, of the impls are already tested within test_ufuncs.py, similar to
    NVIDIA#374.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit cff1bf2
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 28 04:15:28 2025 -0500

    [Cleanup] Fix dangling imports (NVIDIA#431)

    This fixes a dangling import to cgutils, removes a use of prange (not
    supported on the CUDA target), and also assumes the legacy Numba type
    system to simplify the code/remove more imports (@gmarkall suggested
    this should be default for now).

commit 75fb24c
Author: Marcus D. Hanwell <[email protected]>
Date:   Wed Aug 27 07:12:16 2025 -0400

    Add Windows CI for CUDA 12 (NVIDIA#396)

    This tests one configuration only, for a start: Windows 11, AMD64, Python
    3.12, CUDA 12.8, using wheels. The required changes are:

    - Copy the `install_gpu_driver.ps1` script from CUDA-Python's Windows
      CI.
    - Add a new job that tests on Windows, with the following minimal set of
      steps, the main ones being:
      - Install the driver
      - Install Python
      - Run the tests
    - Port the test script to Powershell.
    - Some Python code changes are also necessary:
      - Locate static libraries (cudadevrt) and includes from wheels. This
        was never needed before as they could be located on Linux test
        systems.
      - Catch a `RuntimeError` if there's an error in cuda-python loading
        the nvJitLink DLL - the exception it throws is a subclass of this
        (see the sketch below).
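
    A hedged sketch of that defensive handling (the module path and
    structure are illustrative assumptions, not the PR's exact code):

    ```python
    # Treat a RuntimeError subclass raised while cuda-python loads the
    # nvJitLink DLL as "nvJitLink unavailable" instead of crashing.
    try:
        from cuda.bindings import nvjitlink  # DLL load may fail on Windows
    except RuntimeError:
        nvjitlink = None  # callers check for None and fall back
    ```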

commit 69677be
Author: Michael Wang <[email protected]>
Date:   Tue Aug 26 15:19:19 2025 -0700

    Bfloat Test Expects Different PTX Based on CC (NVIDIA#428)

    This PR amends the bfloat16 PTX assertions to consider different PTX for
    different compute capabilities.

    Co-authored-by: Michael Wang <[email protected]>

commit 7c0d300
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 17:11:49 2025 -0500

    [Refactor] Vendor in npyfuncs for CUDA-specific customization (NVIDIA#374)

    This moves the func impls used in our ufunc_db into this repo. I no
    longer think that this needs additional comprehensive testing, because
    the vast majority, if not all, of the impls are already tested within
    `test_ufuncs.py`.

commit 4a6f9d5
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 12:32:56 2025 -0500

    [Cleanup] Remove deadcode parfor support (NVIDIA#425)

    This removes the majority of the support for parfor handling brought in
    by previous vendoring PRs. The `auto_parallel` option in `Flags` can't
    be removed yet, there are other places where it is checked in upstream
    Numba.

commit 1b8cdaa
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 07:56:11 2025 -0700

    Vendor in SSA for CUDA-specific refactoring (NVIDIA#417)

    The tests had to undergo some significant refactoring - I tried to
    preserve as much of the original test coverage as I could, but here are
    notes on some tests that I had to remove entirely:
    * The test test_issue5482_objmode_expr_null_lowering was skipped
    because object mode is not supported for the numba-cuda target.
    * Several tests that used custom compilation pipelines were omitted
    because custom pipelines do not work with numba.cuda.jit. We may want to
    re-add these tests if we add this functionality, but I did not include
    custom pipelines in this PR to keep the focus on vendoring in this
    component and not feature work.

    Otherwise, it was a 1-1 replacement of numba's jit and njit with
    numba.cuda.jit, and return values were replaced with assignments to
    output arrays.
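
    A hedged sketch of that mechanical rewrite (the function and test names
    are invented for illustration):

    ```python
    import numpy as np
    from numba import cuda

    # Upstream CPU-style test, paraphrased:
    #     @njit
    #     def add_one(x):
    #         return x + 1
    #     assert add_one(1) == 2

    # Ported form: a numba.cuda.jit kernel whose return value becomes a
    # write to an output array.
    @cuda.jit
    def add_one(out, x):
        out[0] = x + 1

    out = np.zeros(1, dtype=np.int64)
    add_one[1, 1](out, 1)
    assert out[0] == 2
    ```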

commit bc23f74
Author: brandon-b-miller <[email protected]>
Date:   Tue Aug 26 06:40:32 2025 -0500

    Update thirdparty test locations (NVIDIA#426)

    This PR updates the thirdparty tests to reflect the location of the
    tests in the latest cudf nightly.

commit 9408377
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 04:08:39 2025 -0700

    Vendor in postproc module for CUDA-specific refactoring (NVIDIA#418)

    This module has few dependencies and no standalone tests, so it has been
    vendored in as-is (with the copyright modifications).

    A few unrelated imports were importing modules from upstream numba that
    we have already vendored in, so I corrected them to use our copy. If we
    would like this PR to be strictly related to the postproc module, I can
    revert these changes.

commit 8851c18
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 03:32:18 2025 -0700

    Vendor in the consts module for CUDA-specific refactoring (NVIDIA#419)

    consts.py has few dependencies and no standalone tests that I could
    find, so it has simply been vendored in and the existing uses in
    numba-cuda have been updated.

    I noticed untyped_passes was still conditionally importing the
    vendored-in interpreter.py based on the python version, so I updated
    that as well.

commit 2bdefec
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:28:35 2025 -0500

    [Refactor][NFC] Vendor-in np support utilities for CUDA-specific optimizations (NVIDIA#394)

    This is a minimal import, we may need to revisit this file in the future
    for ufuncs and more complicated numpy array checks/operations.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit a62854d
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:27:14 2025 -0500

    [Refactor][NFC] Vendor in CloudPickler (NVIDIA#380)

commit 5b0c95e
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:25:24 2025 -0500

    Vendor-In find_lib for CUDA-specific changes (NVIDIA#407)

    This vendors in the small utility class we use to find shared libraries.
    Also fixes some dangling imports caused by the recent vendoring PRs.

commit f9f2ef4
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 04:41:54 2025 -0500

    [Refactor][NFC] Vendor in ufunc utility functions, classes for CUDA-specific optimizations (NVIDIA#393)

commit 0f6ea69
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 22 09:17:07 2025 -0500

    Depend on `cccl` explicitly for `[cuXY]` wheels (NVIDIA#421)

    Adds a header we're missing that provides `cuda/atomic`.

commit e6db7d1
Author: Michael Wang <[email protected]>
Date:   Fri Aug 22 04:25:51 2025 -0700

    Improve Bfloat16 Support in Numba-CUDA (NVIDIA#376)

    This is an overhaul of the bfloat16 support. The PR improves bfloat16 by
    implementing a unified type object for the constructor handle, defining
    proper cast and unify rules from bfloat16 to other numeric types, adding
    *initial* host-side support for this type (leveraging the ml_dtypes
    package), printing support in kernels, as well as a plethora of
    intrinsics.

    Details:
    1. You may now `print(x)` on device.
    2. You may now use comparison intrinsics, such as heq, hmax, hmax_nan
    etc. on device.
    3. You may also use conversion intrinsics, such as
    `int64_to_bfloat16_rz` etc. on device; bit cast intrinsics, such as
    `int16_as_bfloat16` on device.
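
    A hedged device-side sketch of the features above (the import location
    of the bfloat16 type and the intrinsics is an assumption):

    ```python
    import numpy as np
    from numba import cuda
    from numba.cuda.bf16 import bfloat16, hmax  # assumed import location

    @cuda.jit
    def demo(out):
        a = bfloat16(1.5)
        b = bfloat16(2.5)
        print(a)             # device-side printing of a bfloat16 value
        out[0] = hmax(a, b)  # comparison intrinsic; the store relies on
                             # the new bfloat16-to-float32 cast rules

    out = np.zeros(1, dtype=np.float32)
    demo[1, 1](out)
    ```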

    ---------

    Co-authored-by: Michael Wang <[email protected]>

commit c48e213
Author: Asher Mancinelli <[email protected]>
Date:   Thu Aug 21 14:17:46 2025 -0700

    Backport interpreter.py for Python 3.9 (NVIDIA#392)

    Adds Python 3.9 to the supported versions in interpreter.py.

    Similar to NVIDIA#391, I looked at the diff between the current numba
    development branch and this commit to find the biggest differences:
    ```
    commit 2c9f1caa3df941bf7066f7e977a18e122d363d42
    Commit:     kc611 <[email protected]>
    CommitDate: Wed Sep 18 17:43:06 2024 +0530

        Removed references to Python 3.9 within core
    ```

    There were not many; for the most part, Python 3.9 is handled the same
    as 3.10.

    NOTE: I am not concerned about subclassing issues from vendoring in
    `UnsupportedBytecodeError` based on what I see in numba-cuda because I
    do not see any uses of `self.assertRaises(UnsupportedBytecodeError)` or
    explicit catch blocks, which would need to be updated.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit d2827b7
Author: Keith Kraus <[email protected]>
Date:   Thu Aug 21 12:51:44 2025 -0400

    Add CLA and third party Numba license (NVIDIA#410)

    Adds a `CLA.md` file and hooks up https://cla-assistant.io to handle
    CLA signing for contributors. Additionally adds a `LICENSE.numba` file
    and the relevant section to the `pyproject.toml` file to make sure we
    ship it as part of our package.

commit 702f0ae
Author: Graham Markall <[email protected]>
Date:   Thu Aug 21 17:47:13 2025 +0100

    Remove costly but low-utility assertions (NVIDIA#416)

    I'm not aware of these assertions ever getting triggered by anyone
    making a mistake with registrations, but they do take a lot of time when
    the number of registrations is large - this commit thus removes them.

commit 7d5a762
Author: Copilot <[email protected]>
Date:   Thu Aug 21 16:33:21 2025 +0000

    [FEA] Add SPDX checker pre-commit hook (NVIDIA#415)

    This PR implements an SPDX header checker as a pre-commit hook to ensure
    that all new files include proper SPDX license and copyright
    identifiers, as requested in the feature request.
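
    For example, a compliant Python file starts with both headers the
    checker looks for (the license and copyright text here are
    illustrative):

    ```python
    # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
    # SPDX-License-Identifier: BSD-2-Clause
    ```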

    ## Implementation

    The solution adapts the existing SPDX checker from the cuda-python
    repository:

    ### Files Added
    - **`toolshed/check_spdx.py`** - The main SPDX validation script that
    checks for both `SPDX-License-Identifier` and `SPDX-FileCopyrightText`
    headers in files
    - **`.spdx-ignore`** - Exclusion patterns for files that don't require
    SPDX headers (documentation, configuration files, build artifacts,
    version files, external headers, etc.)
    - **Updated `.pre-commit-config.yaml`** - Added a local pre-commit hook
    that runs the SPDX checker with the pathspec dependency

    ### Key Features
    - Validates that both required SPDX headers are present in source files
    - Uses gitignore-style pattern matching for file exclusions via pathspec
    - Allows empty files (no SPDX headers required)
    - Provides clear error messages showing which headers are missing from
    which files
    - Integrates seamlessly with the existing pre-commit infrastructure

    ### Exclusions
    The `.spdx-ignore` file excludes appropriate file types that shouldn't
    have SPDX headers:
    - Documentation files (`.md`, `.rst`, `.txt`)
    - Binary and generated files (`.so`, `.dll`, `.pyc`, `.ptx`)
    - Build artifacts (`build/*`, `dist/*`, `*.egg-info/*`)
    - Version files (`numba_cuda/VERSION`)
    - External CUDA SDK headers (`numba_cuda/numba/cuda/include/*/cuda_*.h`,
    `cuda_*.hpp`)
    - GitHub configuration files (`.github/CODEOWNERS`)

    ## Testing

    The implementation has been thoroughly tested:
    - ✅ All existing Python and shell files with SPDX headers pass
    validation
    - ✅ Files listed in `.spdx-ignore` are properly excluded
    - ✅ Files missing SPDX headers are correctly detected and reported
    - ✅ Empty files are allowed without headers
    - ✅ The checker script itself validates successfully
    - ✅ All pre-commit style checks now pass

    This ensures that as new files are added to the repository, they will be
    required to include proper SPDX identifiers, maintaining consistency
    with the existing codebase established in NVIDIA#412.

    Fixes NVIDIA#414.


    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: kkraus14 <[email protected]>
    Co-authored-by: Keith Kraus <[email protected]>

commit b3db2b6
Author: brandon-b-miller <[email protected]>
Date:   Thu Aug 21 10:42:08 2025 -0500

    Remove `MVCLinker` and straggling uses of / references to `cubinlinker` (NVIDIA#413)

    Removes some CUDA 11 related machinery.

commit bef2949
Author: Asher Mancinelli <[email protected]>
Date:   Thu Aug 21 07:37:48 2025 -0700

    Vendor in bytecode.py for CUDA-specific refactoring (NVIDIA#391)

    Vendors in the bytecode module for CUDA-specific refactoring. I am
    working towards vendoring in ir.py, but this has many dependencies and
    many modules use isinstance checks on classes they import from
    numba.core.ir directly, meaning these _users_ of numba.core.ir will all
    need to be vendored in before numba.core.ir itself can be.
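
    A hedged, self-contained illustration of that isinstance coupling (toy
    modules stand in for the real ones):

    ```python
    import types

    # Two structurally identical Assign classes in two different modules
    # model numba.core.ir and a vendored numba.cuda.core.ir.
    upstream = types.ModuleType("fake_upstream_ir")
    vendored = types.ModuleType("fake_vendored_ir")
    exec("class Assign: pass", upstream.__dict__)
    exec("class Assign: pass", vendored.__dict__)

    stmt = vendored.Assign()
    # False: the check fails across copies, so users of numba.core.ir must
    # switch imports together with the IR module itself.
    print(isinstance(stmt, upstream.Assign))
    ```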

    NOTE: This module was adapted to work with Python 3.9 again. I looked at
    the difference between the current numba development branch and [numba
    commit
    2c9f1caa3](numba/numba@2c9f1ca).
    For the most part, Python 3.9 took the same branches as 3.10, so that is
    what I updated. The primary difference was in `get_jump_target`, because
    Python 3.10 changed the addressing of jump instructions.

    ```
    commit 2c9f1caa3df941bf7066f7e977a18e122d363d42
    Commit:     kc611 <[email protected]>
    CommitDate: Wed Sep 18 17:43:06 2024 +0530

        Removed references to Python 3.9 within core
    ```
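
    A hedged, simplified sketch of the kind of version split involved (real
    code also distinguishes relative from absolute jumps):

    ```python
    import dis
    import sys

    def get_jump_target(inst: dis.Instruction) -> int:
        # From Python 3.10 onward, jump arguments count instructions
        # (2 bytes each); on 3.9 they are raw byte offsets.
        if sys.version_info[:2] >= (3, 10):
            return inst.arg * 2
        return inst.arg
    ```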

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit dfa4911
Author: Vijay Kandiah <[email protected]>
Date:   Thu Aug 21 08:29:52 2025 -0500

    Remove _helperlib dep., implement attempt_nocopy_reshape in python for CUDA-specific changes (NVIDIA#383)

    We are only using the method `attempt_nocopy_reshape` from numba's
    `_helperlib` C extension. This PR introduces a Python implementation of
    `attempt_nocopy_reshape` for CUDA-specific customizations and to remove
    the dependency on `_helperlib`.
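
    A hedged sketch of the no-copy-reshape contract, demonstrated with
    NumPy's in-place shape assignment (not the vendored implementation):

    ```python
    import numpy as np

    def reshape_view_or_none(arr, new_shape):
        # Assigning to .shape never copies: NumPy raises AttributeError
        # when the new shape cannot be expressed over the existing strides.
        view = arr.view()
        try:
            view.shape = new_shape
        except AttributeError:
            return None  # a copy would be required
        return view

    a = np.arange(6).reshape(2, 3)
    assert reshape_view_or_none(a, (3, 2)) is not None  # contiguous: fine
    assert reshape_view_or_none(a.T, (6,)) is None      # needs a copy
    ```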

    Co-authored-by: Graham Markall <[email protected]>

commit 320e5f0
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 21 06:40:21 2025 -0500

    [Refactor][NFC] Vendor in Environment class (NVIDIA#379)

    This change also fixes some imports that weren't updated to Numba CUDA
    from a previously merged PR.

    Co-authored-by: Graham Markall <[email protected]>

commit 3fa01f0
Author: Keith Kraus <[email protected]>
Date:   Wed Aug 20 20:50:25 2025 -0400

    Add SPDX identifiers to all files (NVIDIA#412)

    Adds SPDX identifiers to all files. Converted existing Copyright
    statements to SPDX identifiers and added them where there was no
    existing Copyright statement.

    A few things we should check before merging here:

    - [x] SPDX identifiers were added to `.rst` files via block comments.
    Sphinx also seemingly uses comments for functionality, so we should make
    sure it doesn't break doc rendering and remove the SPDX identifiers if
    it does.
    - [x] Added SPDX identifiers to the `.pth` redirector file. The Python
    documentation (https://docs.python.org/3/library/site.html) claims that
    "Blank lines and lines beginning with `#` are skipped" so this should be
    okay. I believe our CI suite should exercise this already, but if this
    is not the case we should explicitly check it.
    - [x] There were a couple of places where files had Apache 2.0 licenses,
    presumably from being vendored from RAPIDS. I relicensed these files to
    BSD-2 to match the rest of the repo. There are no intellectual property
    or patentability concerns in these files.

commit fba4a1e
Author: brandon-b-miller <[email protected]>
Date:   Wed Aug 20 18:29:11 2025 -0500

    Fix NRT stats tests when MVC is required and add them to thirdparty (NVIDIA#406)

    This PR fixes an oversight where our memsys module isn't LTO'ed by
    default. The issue shakes out in cuDF's testing under certain MVC
    scenarios, and this PR adds those tests here.

    We probably want to incorporate changes from
    rapidsai/cudf#19730 (review)
    here too, so I'll wait to merge this.

commit 5bd0885
Author: Graham Markall <[email protected]>
Date:   Wed Aug 20 19:44:52 2025 +0100

    Fix NVIDIA#409: Vendor `dead_branch_prune()` (NVIDIA#411)

    This incorporates the fix from numba/numba#9758, which is required for
    reductions to be correct when Numba-CUDA is used with Numba 0.60.

    Fixes NVIDIA#409

commit ac2d459
Author: Bradley Dice <[email protected]>
Date:   Tue Aug 19 16:25:27 2025 -0500

    Test oldest supported dependencies (NVIDIA#404)

    This PR adds testing for the oldest supported dependencies. Currently
    this is limited to `numba==0.60.0`.

    This depends on NVIDIA#403 to pass CI.

    I also cleaned up a few places where CUDA 11 was being referenced in
    test scripts. This is no longer needed.

commit c414204
Author: Michael Droettboom <[email protected]>
Date:   Tue Aug 19 14:41:07 2025 -0400

    Update CUDA versions in CONTRIBUTING.md (NVIDIA#408)

    It looks like we dropped CUDA 11 and added CUDA 13 some time ago, but
    these docs didn't get updated.


commit 3f8e13a
Author: Asher Mancinelli <[email protected]>
Date:   Mon Aug 18 10:08:16 2025 -0700

    Vendor in UnsupportedBytecodeError (NVIDIA#403)

    Vendors in UnsupportedBytecodeError and changes imports to use other
    utilities that have already been vendored in. _lazy_pformat, which had
    already been vendored in, was also needed when building with an older
    numba, so those imports have been changed as well.


commit d42ee2c
Author: brandon-b-miller <[email protected]>
Date:   Mon Aug 18 09:56:00 2025 -0500

    Bump thirdparty test version  (NVIDIA#402)

    Because the issue fixed by
    rapidsai/cudf@dde047f
    exists in cudf/25.08, bumping to the 25.08 stable release incurs the
    errors this commit addresses when using our main branch today. To fix
    this, we'll need to switch to nightlies for a RAPIDS release cycle. This
    situation should stabilize after the next RAPIDS release, assuming we
    don't have a similar situation where changes in numba-cuda expose a
    latent bug that requires build-time fixes in cuDF.

    Normally I'd suggest just waiting until the next stable cuDF release,
    but the 25.08 release specifically added the most significant changes to
    extension usage in a long time and I think it's worth testing a cuDF
    version that includes those updates with the changes we make here sooner
    rather than later.

commit cd74e37
Author: Atmn Patel <[email protected]>
Date:   Mon Aug 18 05:15:45 2025 -0500

    Vendor in PassManager, typed and untyped pass, and related helpers for CUDA-specific customization (NVIDIA#330)

    This change vendors in the Pass Manager to ease future CUDA-specific
    optimizations. This requires also vendoring in all of the passes in the
    default and Numba CUDA pipelines to maintain the pass registry, as well
    as the compiler lock machinery.
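
    For context, a hedged sketch of how a pass plugs into this machinery,
    shown against the upstream numba API (the vendored paths mirror it):

    ```python
    from numba.core.compiler_machinery import FunctionPass, register_pass

    @register_pass(mutates_CFG=False, analysis_only=True)
    class CountBlocksPass(FunctionPass):
        _name = "count_blocks_pass"

        def __init__(self):
            FunctionPass.__init__(self)

        def run_pass(self, state):
            # Analysis-only passes read state.func_ir without mutating it.
            print(len(state.func_ir.blocks), "basic blocks")
            return False  # the IR was not modified
    ```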

    ---------

    Signed-off-by: Atmn Patel <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 0173219
Author: jiel-nv <[email protected]>
Date:   Mon Aug 18 02:09:23 2025 -0700

    Fix correctness issue in debug mode code generation for dynamic types. (NVIDIA#305)

    The polymorphic variables do not need zero-finalization after the usage
    of each variation, since they share the same memory in the same
    artificial debug union. The cleanup work is done in the last basic block
    of the function.

    Also, do not use the cached temporaries as the pointer for the
    variations; they need to be regenerated and type-cast from the debug
    union on each load/store.

    The correctness test added in this patch produces a wrong result with
    the existing code:
    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    With this patch, the result is produced correctly, as expected:
    [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
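
    A hedged sketch of the kind of kernel that exercises this path (a
    variable re-bound with different types under debug info; illustrative
    only, not the test added by the patch):

    ```python
    import numpy as np
    from numba import cuda

    @cuda.jit(debug=True, opt=False)
    def poly(out):
        x = 0        # x bound as an integer...
        x = 1.0      # ...then re-bound as a float: a "polymorphic" variable
        out[0] = x   # both variants share one artificial debug union slot

    out = np.zeros(16, dtype=np.float64)
    poly[1, 1](out)  # expected: [1. 0. 0. ...]
    ```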

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 8eadd36
Author: Asher Mancinelli <[email protected]>
Date:   Fri Aug 15 15:57:30 2025 -0700

    Vendor in additional debug info tests (NVIDIA#364)

    Adapted more tests in numba core to work with numba-cuda's `cuda.jit`.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 69f7918
Author: Graham Markall <[email protected]>
Date:   Fri Aug 15 22:27:10 2025 +0100

    Bump version to 0.19.0 (NVIDIA#400)

    - Support CUDA 13, drop support for CUDA 11 (NVIDIA#370)
    - Fix package clobbering in thirdparty CI tests (NVIDIA#395)
    - Make device name consistent between bindings (NVIDIA#382)
    - [NFC] Document development dependencies and dependency groups (NVIDIA#345)
    - Change repo URLs in pyproject.toml to point to new URL (NVIDIA#389)
    - Vendor-in IR utilities for CUDA-specific refactoring (NVIDIA#377)
    - remove uses of rapids-configure-conda-channels (NVIDIA#372)
    - Relax subclass check in vendored in typing templates (NVIDIA#371)
    - Add locals attribute to _FunctionCompiler (NVIDIA#381)
    - Fix CI for CUDA 13 drivers (NVIDIA#385)
    - Vendor in Bytecode Interpreter (NVIDIA#355)
    - Add code coverage CI job (NVIDIA#367)
    - [Refactor] Vendor in Typing Templates (NVIDIA#360)
    - [Refactor][NFC] Vendor in TestCase and other testing helpers for
    CUDA-specific changes (NVIDIA#363)
    - Split out debuginfo tests for types (NVIDIA#366)
    - [Refactor][NFC] Vendor in Itanium Mangler for future CUDA-specific
    changes (NVIDIA#362)
    - [Refactor][NFC] Vendor in sigutils for CUDA-specific changes (NVIDIA#361)
    - Remove remaining references/dependencies on `pynvjitlink` (NVIDIA#357)

commit 4a0487e
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 15 12:50:09 2025 -0500

    Require `cuda-bindings` `12.9.1` (NVIDIA#387)

    `cuda-bindings` `12.9.1` is built without monitoring, which avoids
    cython/cython#7050. This makes it the only safe version, since the
    NVIDIA binding is now used by default. This PR requires the patched
    packages.

    ---------

    Co-authored-by: Leo Fang <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 5d1bbd4
Author: Graham Markall <[email protected]>
Date:   Fri Aug 15 17:31:02 2025 +0100

    Support CUDA 13, drop support for CUDA 11 (NVIDIA#370)

    Required changes:

    - Test third party libraries with CUDA 12 only
    - Update matrix:
      - Drop CUDA 11
      - Change 12.8.0 to 12.9.1
      - Add CUDA 13
    - Only run apt commands on Ubuntu - this was coincidentally handled by
      CTK 11 vs 12, where 11 ran on Rocky Linux and 12 on Ubuntu
    - Remove `CTK_CURAND_VMAP` - this is no longer used, and the cuRAND
      installation is handled as a dependency of the numba-cuda package.
    - Update documentation to drop references to CTK 11 and add references
      to CTK 13.
      - MVC is handled automatically, so most of the documentation for it is
        dropped.
    - Remove code relating to CTK 11 only.
    - Add handling for CUDA 13 paths when using the ctypes binding
    - Prevent `NVRTC()` objects being created when using the NVIDIA
      cuda-python bindings, and replace uses of it with . The `NVRTC()`
      class directly opens the NVRTC DLL / SO, so it should not be used in
      conjunction with the cuda-python bindings.
    - In test binaries, the lowest CC we should generate code for is 7.5,
      because this is the minimum supported by CTK 13.
    - The CCCL include path has changed in CUDA 13, so we add that to the
      include path for test binary generation.

    ---------

    Co-authored-by: Keith Kraus <[email protected]>
    Co-authored-by: Bradley Dice <[email protected]>

commit c3ca630
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 15 10:09:38 2025 -0500

    Fix package clobbering in thirdparty CI tests (NVIDIA#395)

    In our thirdparty CI tests, installation of the cuDF package is
    clobbering the branch version of numba-cuda we actually want to test. I
    ran into this when cuDF failed to import locally using `main`, while
    trying to track down why the failure didn't show up in CI.

commit 47731d5
Author: brandon-b-miller <[email protected]>
Date:   Wed Aug 13 11:07:30 2025 -0500

    Make device name consistent between bindings (NVIDIA#382)

    Closes NVIDIA#375

    ---------

    Co-authored-by: Leo Fang <[email protected]>

commit be32e90
Author: Asher Mancinelli <[email protected]>
Date:   Mon Aug 11 13:47:25 2025 -0700

    [NFC] Document development dependencies and dependency groups (NVIDIA#345)

    Follow-up to 1a69197: running the following should pull all the
    dependencies needed to develop numba-cuda for a given CUDA version.

commit c210c96
Author: Gabe R. <[email protected]>
Date:   Mon Aug 11 06:29:11 2025 -0700

    Change repo URLs in pyproject.toml to point to new URL (NVIDIA#389)

    The pyproject.toml files still pointed to the old repository URL from
    before it was transferred to NVIDIA. I've fixed that.

commit a932a0c
Author: Asher Mancinelli <[email protected]>
Date:   Fri Aug 8 08:15:23 2025 -0700

    Vendor-in IR utilities for CUDA-specific refactoring (NVIDIA#377)

    Vendor in the numba core IR utilities. There were several unused
    variables in this component; I removed them as I found them, and where
    an unused variable was the result of a function call, the call itself
    was preserved, so the original behavior should be unchanged.

    NOTE: A significant portion of this module is covered by neither our
    numba-cuda tests nor the upstream `test_ir_utils.py` test. We may be
    able to trim this module significantly.
    NOTE: the original copyright in ir_utils.py has been preserved.

    Numba commit 262b49a3e was used.

    ---------

    Co-authored-by: Graham Markall <[email protected]>
@ashermancinelli
Contributor Author

There are three failing tests with this patch, and I believe they can be traced back to passes that are run through the typing system. I hit assertion errors like this in our test suite:

pytest -s numba_cuda/numba/cuda/tests/cudapy/test_compiler.py::TestCompile::test_functioncompiler_locals
...
    def __init__(self, scope, name, loc):
        # NOTE: Use of scope=None should be removed.
        assert scope is None or isinstance(scope, Scope)
        assert isinstance(name, str)
>       assert isinstance(loc, Loc)
               ^^^^^^^^^^^^^^^^^^^^
E       AssertionError

numba_cuda/numba/cuda/core/ir.py:1174: AssertionError

When I print a traceback inside the constructor of numba.core.ir.Loc, by editing the upstream numba.core.ir in my local environment, I see the following:

  File "./numba_cuda/numba/cuda/tests/cudapy/test_compiler.py", line 262, in test_functioncompiler_locals
  File "./numba_cuda/numba/cuda/decorators.py", line 207, in _jit
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/dispatcher.py", line 1296, in compile
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/dispatcher.py", line 145, in __init__
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/compiler.py", line 745, in compile_cuda
  File "./numba_cuda/numba/cuda/compiler.py", line 586, in compile_extra
  File "./numba_cuda/numba/cuda/core/compiler.py", line 157, in compile_extra
  File "./numba_cuda/numba/cuda/core/compiler.py", line 225, in _compile_bytecode
  File "./numba_cuda/numba/cuda/core/compiler.py", line 193, in _compile_core
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 385, in run
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 337, in _runPass
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 291, in check
  File "./numba_cuda/numba/cuda/core/typed_passes.py", line 144, in run_pass
  File "./numba_cuda/numba/cuda/core/typed_passes.py", line 123, in type_inference_stage
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 1146, in propagate
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 191, in propagate
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 619, in __call__
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 642, in resolve
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 1682, in resolve_call
  File "./numba_cuda/numba/cuda/typing/context.py", line 197, in resolve_function_type
  File "./numba_cuda/numba/cuda/typing/context.py", line 249, in _resolve_user_function_type
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/types/functions.py", line 308, in get_call_type
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/typing/templates.py", line 358, in apply
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/typing/templates.py", line 660, in generic
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/inline_closurecall.py", line 541, in run_untyped_passes
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/untyped_passes.py", line 86, in run_pass
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/interpreter.py", line 1406, in interpret
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/interpreter.py", line 1538, in _iter_inst
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/ir.py", line 204, in with_lineno
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/ir.py", line 39, in __init__

Notice the first time the traceback enters the upstream numba package: numba_cuda/numba/cuda/typing/context.py calls into numba.core.types.functions, which eventually calls into untyped_passes.py, which uses the upstream numba.core.ir.

Because the IR module is interdependent with so many other modules, it will be vendored in without being fully adopted inside numba-cuda. Modules inside numba-cuda will not use numba.cuda.core.ir yet. In a subsequent PR, when other components are vendored in, we may update these imports. For now, its only use is in test_ir.py.
@ashermancinelli
Contributor Author

/ok to test ad24c3c

@ashermancinelli marked this pull request as ready for review October 14, 2025 17:51
@ashermancinelli added the `3 - Ready for Review` label and removed the `0 - Blocked` label Oct 14, 2025
@gmarkall added the `5 - Ready to merge` label and removed the `3 - Ready for Review` label Oct 15, 2025
@gmarkall merged commit 64fe5be into NVIDIA:main Oct 15, 2025
70 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025