@ashermancinelli (Contributor) commented Aug 28, 2025

The plan-of-record as discussed offline with @gmarkall is to vendor this module piecemeal; modules in numba-cuda will continue to use numba until we have vendored in additional modules (such as types and typing).


The IR module is near the root of the import graph of numba-cuda, so it has many dependencies. Several other modules must be vendored in before this can be merged, so this will remain a draft for now. The biggest dependency is currently #373. This PR is open before it is ready for review to give the team a sense of how close we are to having this module vendored in.

Depends-on:

@ashermancinelli self-assigned this Aug 28, 2025
@ashermancinelli added the `2 - In Progress` label Aug 28, 2025

copy-pr-bot bot commented Aug 28, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ashermancinelli marked this pull request as draft August 28, 2025 15:12

copy-pr-bot bot commented Aug 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ashermancinelli added the `0 - Blocked` label and removed the `2 - In Progress` label Aug 29, 2025
gmarkall pushed a commit that referenced this pull request Sep 1, 2025
[Refactor][NFC] Vendor-in cpu_options for future CUDA-specific changes (#440)

Numba-CUDA relies on the upstream Numba cpu_options to configure itself; this change vendors in the class so we can later remove ParallelOptions once #439 is merged.
@ashermancinelli
Contributor Author

I'll re-evaluate whether this is blocked on additional modules, or remove the draft status if we can actually merge it.

kkraus14 added a commit to kkraus14/numba-cuda that referenced this pull request Sep 4, 2025
commit 9eb9e15
Author: Asher Mancinelli <[email protected]>
Date:   Thu Sep 4 08:01:00 2025 -0700

    Vendor in the CPU module for CUDA refactoring (NVIDIA#447)

    The CPU module has no standalone tests and is likely well-covered by
    existing tests to the extent it is used. It will be used by the registry
    module once that is vendored in.

    Depends-on:
    - NVIDIA#440

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 24967a0
Author: brandon-b-miller <[email protected]>
Date:   Thu Sep 4 09:37:45 2025 -0500

    Close `rtsys` singleton when `cuda.close()` is called (NVIDIA#455)

    This PR fixes an issue where `cuda.close` invalidated the context
    containing the current memsys module without also invalidating the
    `_Runtime` instance that ostensibly owned that module. This caused
    reference errors later on when trying to use that module.

    Closes NVIDIA#453
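
    A hedged sketch of the shape of the fix (names simplified; not the
    actual numba-cuda code): the runtime singleton must drop its reference
    to the memsys module when the owning context is destroyed, so that
    later use reinitializes instead of dereferencing a dead module.

    ```python
    # Minimal sketch, assuming a hypothetical _Runtime/rtsys pairing.
    class _Runtime:
        def __init__(self):
            self._memsys_module = None  # lives inside a CUDA context

        def initialize(self, module):
            self._memsys_module = module

        def close(self):
            self._memsys_module = None  # forget the module with its context

    rtsys = _Runtime()

    def close():
        # ... destroy the current CUDA context (and the memsys module) ...
        rtsys.close()  # without this, rtsys keeps a dangling reference
    ```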

commit 6db9a70
Author: Atmn Patel <[email protected]>
Date:   Thu Sep 4 09:19:08 2025 -0500

    [Refactor][NFC] Vendor-in numba.core.unsafe for future CUDA-specific changes (NVIDIA#460)

    We primarily use the exception handling mechanisms in numba.core.unsafe
    right now, but we will also use it in cpython.hashing once we've ported
    that over. I've removed the nrt support since we won't end up porting
    over its uses from upstream Numba.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 6828bee
Author: Atmn Patel <[email protected]>
Date:   Thu Sep 4 08:16:52 2025 -0500

    [Cleanups] Fix dangling imports, remove residual parfor handling (NVIDIA#458)

commit f856b37
Author: Asher Mancinelli <[email protected]>
Date:   Thu Sep 4 05:43:33 2025 -0700

    Vendor in the transforms module (NVIDIA#444)

    There are no standalone tests for the transforms module, so none have
    been vendored.

commit 230b02c
Author: Graham Markall <[email protected]>
Date:   Thu Sep 4 12:45:31 2025 +0100

    Compatibility fixes for Numba 0.62 (NVIDIA#427)

    - `newpassmanagers` need to be used with llvmlite >= 0.45 (the legacy
    pass manager is removed with the move to LLVM 20)
    - The `Array._allocate()` overload was made CPU-only with
    numba/numba#10185, so we need one for the CUDA
    target.

commit a2bf0e1
Author: Atmn Patel <[email protected]>
Date:   Wed Sep 3 11:06:10 2025 -0500

    [Refactor][NFC] Remove `skip_if_no_external_compiler()` (NVIDIA#449)

    This removes the only function we have that uses pycc, which itself is not required.

commit 9a09f13
Author: Asher Mancinelli <[email protected]>
Date:   Wed Sep 3 00:55:22 2025 -0700

    Vendor the base module for CUDA-specific refactoring (NVIDIA#445)

    There are no standalone tests for this module. It is imported by the
    cpu and caching modules upstream; we have not pulled in the cpu module,
    and while the caching module has already been vendored, BaseContext was
    not (it was unused by the upstream caching module anyway).

    There are some tests which import this module, but we have versions of
    the same tests and they do not use the base module:

    * numba/numba/tests/test_target_overloadselector.py
    * numba/numba/tests/test_ufuncs.py
    * numba/numba/tests/test_target_extension.py

    ---------

    Co-authored-by: Graham Markall <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 65705ad
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 15:36:07 2025 -0700

    Vendor in the typeinfer module (NVIDIA#442)

    Some of the tests were specific to the CPU, so I did not port them to
    GPU kernels (many tests could still be preserved though).

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 49f3d88
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 03:34:39 2025 -0700

    Vendor in the rewrites module (NVIDIA#438)

    Vendor in the rewrites module for CUDA-specific refactoring.

commit 1bed09b
Author: Asher Mancinelli <[email protected]>
Date:   Tue Sep 2 03:21:20 2025 -0700

    Vendor in the pythonapi module (NVIDIA#446)

commit 06e6103
Author: Atmn Patel <[email protected]>
Date:   Tue Sep 2 04:10:01 2025 -0500

    [Refactor][NFC] Vendor in testing support utils for CUDA-specific changes (NVIDIA#432)

    This change vendors in numba.testing.support, specifically the Mixins
    that we use for testing for memory leaks and checking warnings.

    Notes from review: It's a little non-obvious why the change to the simulator's
    `RTSys` was needed. It is because the `BaseUFuncTest` switched to the
    Numba-CUDA `MemoryLeakMixin` which initializes the Numba-CUDA `rtsys`. Prior to
    this PR, it was erroneously using the CPU-based `MemoryLeakMixin`.
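
    A hedged sketch of how a test picks up the vendored mixin (the import
    paths are assumptions based on the layout described above):

    ```python
    import numpy as np
    from numba import cuda
    from numba.cuda.testing import CUDATestCase
    from numba.cuda.tests.support import MemoryLeakMixin  # assumed path

    class TestNoLeaks(MemoryLeakMixin, CUDATestCase):
        def test_kernel(self):
            @cuda.jit
            def k(out):
                out[0] = 1

            out = np.zeros(1, dtype=np.int32)
            k[1, 1](out)
            # The mixin's tearDown asserts that NRT allocations balance,
            # now against the Numba-CUDA rtsys rather than the CPU one.
    ```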

commit 2c2578d
Author: Asher Mancinelli <[email protected]>
Date:   Mon Sep 1 07:40:03 2025 -0700

    Vendor the inline-closure-call module (NVIDIA#443)

    This module has no standalone tests. All imports have been updated.

    Notes from review follow:

    It was expected that making the following change:

    ```diff
    diff --git a/numba_cuda/numba/cuda/core/ir_utils.py b/numba_cuda/numba/cuda/core/ir_utils.py
    index a2ea996..d48e2dd 100644
    --- a/numba_cuda/numba/cuda/core/ir_utils.py
    +++ b/numba_cuda/numba/cuda/core/ir_utils.py
    @@ -1990,7 +1990,7 @@ def get_ir_of_code(glbls, fcode):
         inline_pass = numba.core.inline_closurecall.InlineClosureCallPass(
             ir, numba.cuda.core.options.ParallelOptions(False), swapped
         )
    -    inline_pass.run()
    +    #inline_pass.run()

         # TODO: DO NOT ADD MORE THINGS HERE!
         # If adding more things here is being contemplated, it really is time to
    ```

    would cause an example like this to fail to run:

    ```python
    @cuda.jit()
    def f():
        def k(x):
            return x + 1
        print(k(4))

    f[1, 1]()
    ```

    but it didn't. It seems like the main reason this was added was for parfors, and
    for stencil support. I (Graham) feel 90% sure this isn't doing anything for the
    CUDA target, but would want to check a bit more thoroughly before we remove this
    entirely (sometime in the future).

    There are some imports from `numba.core.stencil` in `inline_closurecall.py` -
    for dealing with these in future we should probably delete / make a no-op the
    code referencing it, rather than trying to bring that in.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit a20796f
Author: Vijay Kandiah <[email protected]>
Date:   Mon Sep 1 08:30:31 2025 -0500

    [Refactor] Vendor in typing.npydecl, typing.BaseContext, typing.Context for CUDA-specific changes (NVIDIA#399)

    This PR vendors in a few more classes (`typing.npydecl`,
    `typing.BaseContext`, `typing.Context`) from `numba.core.typing` for
    CUDA-specific customization.

commit db7aabe
Author: Atmn Patel <[email protected]>
Date:   Mon Sep 1 08:28:09 2025 -0500

    [Refactor][NFC] Vendor-in tracing module for future CUDA-specific changes (NVIDIA#441)

    We use the `event` from this module, and we might want to implement CUDA
    target specific tracing logic.

commit e9eb4df
Author: Atmn Patel <[email protected]>
Date:   Mon Sep 1 08:19:09 2025 -0500

    [Refactor][NFC] Vendor-in cpu_options for future CUDA-specific changes (NVIDIA#440)

    Numba-CUDA relies on the upstream Numba cpu_options to configure
    itself; this change vendors in the class so we can later remove
    ParallelOptions once NVIDIA#439 is merged.
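
    A hedged sketch of the import switch this enables (the vendored path
    below matches the one seen later in this thread's ir_utils diff):

    ```python
    # Before: the options class comes from upstream Numba.
    # from numba.core.cpu_options import ParallelOptions

    # After: it comes from the vendored copy, so upstream changes cannot
    # break us and we can later trim what the CUDA target ignores.
    from numba.cuda.core.options import ParallelOptions

    opts = ParallelOptions(False)  # CUDA never enables parallel semantics
    ```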

commit 15fdfbc
Author: lakshayg <[email protected]>
Date:   Fri Aug 29 12:02:05 2025 -0700

    Regenerate float16 bindings (NVIDIA#436)

    This PR is based on NVIDIA#265 and merely regenerates the bindings
    after all the features requested from Numbast were implemented.

commit 93404d2
Author: Vijay Kandiah <[email protected]>
Date:   Fri Aug 29 03:25:27 2025 -0500

    Vendor in debuginfo, utils, cffi_utils, enumdecl for CUDA-specific changes (NVIDIA#398)

    This PR vendors in `debuginfo, utils` from `numba.core`, and
    `cffi_utils, enumdecl` from `numba.core.typing` for CUDA-specific
    customization. Additionally fixes a dangling reference to
    `numba.tests.support.TestCase` by changing it to
    `numba.cuda.tests.support.TestCase`.

commit 56a19f1
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 28 09:47:23 2025 -0500

    [Refactor] Vendor in cpython math impls for CUDA-specific customization (NVIDIA#378)

    This moves the cpython func impls (cmathimpl, mathimpl, and numbers)
    used in our ufunc_db into this repo. I no longer think that this needs
    additional comprehensive testing, because the vast majority, if not
    all, of the impls are already tested within test_ufuncs.py, similar to
    NVIDIA#374.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit cff1bf2
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 28 04:15:28 2025 -0500

    [Cleanup] Fix dangling imports (NVIDIA#431)

    This fixes a dangling import to cgutils, removes a use of prange (not
    supported on the CUDA target), and also assumes the legacy Numba type
    system to simplify the code/remove more imports (@gmarkall suggested
    this should be default for now).

commit 75fb24c
Author: Marcus D. Hanwell <[email protected]>
Date:   Wed Aug 27 07:12:16 2025 -0400

    Add Windows CI for CUDA 12 (NVIDIA#396)

    This tests one configuration only, for a start: Windows 11, AMD64, Python
    3.12, CUDA 12.8, using wheels. The required changes are:

    - Copy the `install_gpu_driver.ps1` script from CUDA-Python's Windows
      CI.
    - Add a new job that tests on Windows, with the following minimal set of
      steps, the main ones being:
      - Install the driver
      - Install Python
      - Run the tests
    - Port the test script to Powershell.
    - Some Python code changes are also necessary:
      - Locate static libraries (cudadevrt) and includes from wheels. This
        was never needed before as they could be located on Linux test
        systems.
      - Catch a `RuntimeError` if there's an error in cuda-python loading
        the nvJitLink DLL - the exception it throws is a subclass of this
        (see the sketch below).
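
    A hedged sketch of that defensive handling (the module path and
    structure are illustrative assumptions, not the PR's exact code):

    ```python
    # Treat a RuntimeError subclass raised while cuda-python loads the
    # nvJitLink DLL as "nvJitLink unavailable" instead of crashing.
    try:
        from cuda.bindings import nvjitlink  # DLL load may fail on Windows
    except RuntimeError:
        nvjitlink = None  # callers check for None and fall back
    ```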

commit 69677be
Author: Michael Wang <[email protected]>
Date:   Tue Aug 26 15:19:19 2025 -0700

    Bfloat Test Expects Different PTX Based on CC (NVIDIA#428)

    This PR amends the bfloat16 PTX assertions to consider different PTX for
    different compute capabilities.

    Co-authored-by: Michael Wang <[email protected]>

commit 7c0d300
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 17:11:49 2025 -0500

    [Refactor] Vendor in npyfuncs for CUDA-specific customization (NVIDIA#374)

    This moves the func impls used in our ufunc_db into this repo. I no
    longer think that this needs additional comprehensive testing, because
    the vast majority, if not all, of the impls are already tested within
    `test_ufuncs.py`.

commit 4a6f9d5
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 12:32:56 2025 -0500

    [Cleanup] Remove deadcode parfor support (NVIDIA#425)

    This removes the majority of the support for parfor handling brought in
    by previous vendoring PRs. The `auto_parallel` option in `Flags` can't
    be removed yet, there are other places where it is checked in upstream
    Numba.

commit 1b8cdaa
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 07:56:11 2025 -0700

    Vendor in SSA for CUDA-specific refactoring (NVIDIA#417)

    The tests had to undergo some significant refactoring - I tried to
    preserve as much of the original test coverage as I could, but here are
    notes on some tests that I had to remove entirely:
    * The test test_issue5482_objmode_expr_null_lowering was skipped
    because object mode is not supported for the numba-cuda target.
    * Several tests that used custom compilation pipelines were omitted
    because custom pipelines do not work with numba.cuda.jit. We may want to
    re-add these tests if we add this functionality, but I did not include
    custom pipelines in this PR to keep the focus on vendoring in this
    component and not feature work.

    Otherwise, it was a 1-1 replacement of numba's jit and njit with
    numba.cuda.jit, and return values were replaced with assignments to
    output arrays.
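
    A hedged sketch of that mechanical rewrite (the function and test names
    are invented for illustration):

    ```python
    import numpy as np
    from numba import cuda

    # Upstream CPU-style test, paraphrased:
    #     @njit
    #     def add_one(x):
    #         return x + 1
    #     assert add_one(1) == 2

    # Ported form: a numba.cuda.jit kernel whose return value becomes a
    # write to an output array.
    @cuda.jit
    def add_one(out, x):
        out[0] = x + 1

    out = np.zeros(1, dtype=np.int64)
    add_one[1, 1](out, 1)
    assert out[0] == 2
    ```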

commit bc23f74
Author: brandon-b-miller <[email protected]>
Date:   Tue Aug 26 06:40:32 2025 -0500

    Update thirdparty test locations (NVIDIA#426)

    This PR updates the thirdparty tests to reflect the location of the
    tests in the latest cudf nightly.

commit 9408377
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 04:08:39 2025 -0700

    Vendor in postproc module for CUDA-specific refactoring (NVIDIA#418)

    This module has few dependencies and no standalone tests, so it has been
    vendored in as-is (with the copyright modifications).

    A few unrelated imports were importing modules from upstream numba that
    we have already vendored in, so I corrected them to use our copy. If we
    would like this PR to be strictly related to the postproc module, I can
    revert these changes.

commit 8851c18
Author: Asher Mancinelli <[email protected]>
Date:   Tue Aug 26 03:32:18 2025 -0700

    Vendor in the consts module for CUDA-specific refactoring (NVIDIA#419)

    consts.py has few dependencies and no standalone tests that I could
    find, so it has simply been vendored in and the existing uses in
    numba-cuda have been updated.

    I noticed untyped_passes was still conditionally importing the
    vendored-in interpreter.py based on the python version, so I updated
    that as well.

commit 2bdefec
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:28:35 2025 -0500

    [Refactor][NFC] Vendor-in np support utilities for CUDA-specific optimizations (NVIDIA#394)

    This is a minimal import, we may need to revisit this file in the future
    for ufuncs and more complicated numpy array checks/operations.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit a62854d
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:27:14 2025 -0500

    [Refactor][NFC] Vendor in CloudPickler (NVIDIA#380)

commit 5b0c95e
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 05:25:24 2025 -0500

    Vendor-In find_lib for CUDA-specific changes (NVIDIA#407)

    This vendors in the small utility class we use to find shared libraries.
    Also fixes some dangling imports caused by the recent vendoring PRs.

commit f9f2ef4
Author: Atmn Patel <[email protected]>
Date:   Tue Aug 26 04:41:54 2025 -0500

    [Refactor][NFC] Vendor in ufunc utility functions, classes for CUDA-specific optimizations (NVIDIA#393)

commit 0f6ea69
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 22 09:17:07 2025 -0500

    Depend on `cccl` explicitly for `[cuXY]` wheels (NVIDIA#421)

    Adds a header we're missing that provides `cuda/atomic`.

commit e6db7d1
Author: Michael Wang <[email protected]>
Date:   Fri Aug 22 04:25:51 2025 -0700

    Improve Bfloat16 Support in Numba-CUDA (NVIDIA#376)

    This is an overhaul of the bfloat16 support. The PR improves bfloat16 by
    implementing a unified type object for the constructor handle, defining
    proper cast and unify rules from bfloat16 to other numeric types, adding
    *initial* host-side support for this type (leveraging the ml_dtypes
    package), printing support in kernels, as well as a plethora of
    intrinsics.

    Details:
    1. You may now `print(x)` on device.
    2. You may now use comparison intrinsics, such as heq, hmax, hmax_nan
    etc. on device.
    3. You may also use conversion intrinsics, such as
    `int64_to_bfloat16_rz` etc. on device; bit cast intrinsics, such as
    `int16_as_bfloat16` on device.
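
    A hedged device-side sketch of the features above (the import location
    of the bfloat16 type and the intrinsics is an assumption):

    ```python
    import numpy as np
    from numba import cuda
    from numba.cuda.bf16 import bfloat16, hmax  # assumed import location

    @cuda.jit
    def demo(out):
        a = bfloat16(1.5)
        b = bfloat16(2.5)
        print(a)             # device-side printing of a bfloat16 value
        out[0] = hmax(a, b)  # comparison intrinsic; the store relies on
                             # the new bfloat16-to-float32 cast rules

    out = np.zeros(1, dtype=np.float32)
    demo[1, 1](out)
    ```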

    ---------

    Co-authored-by: Michael Wang <[email protected]>

commit c48e213
Author: Asher Mancinelli <[email protected]>
Date:   Thu Aug 21 14:17:46 2025 -0700

    Backport interpreter.py for Python 3.9 (NVIDIA#392)

    Adds Python 3.9 to the supported versions in interpreter.py.

    Similar to NVIDIA#391, I looked at the diff between the current numba
    development branch and this commit to find the biggest differences:
    ```
    commit 2c9f1caa3df941bf7066f7e977a18e122d363d42
    Commit:     kc611 <[email protected]>
    CommitDate: Wed Sep 18 17:43:06 2024 +0530

        Removed references to Python 3.9 within core
    ```

    There were not many; for the most part, Python 3.9 is handled the same
    as 3.10.

    NOTE: I am not concerned about subclassing issues from vendoring in
    `UnsupportedBytecodeError` based on what I see in numba-cuda because I
    do not see any uses of `self.assertRaises(UnsupportedBytecodeError)` or
    explicit catch blocks, which would need to be updated.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit d2827b7
Author: Keith Kraus <[email protected]>
Date:   Thu Aug 21 12:51:44 2025 -0400

    Add CLA and third party Numba license (NVIDIA#410)

    Adds a `CLA.md` file and hooks up https://cla-assistant.io to handle
    CLA signing for contributors. Additionally adds a `LICENSE.numba` file
    and the relevant section to the `pyproject.toml` file to make sure we
    ship it as part of our package.

commit 702f0ae
Author: Graham Markall <[email protected]>
Date:   Thu Aug 21 17:47:13 2025 +0100

    Remove costly but low-utility assertions (NVIDIA#416)

    I'm not aware of these assertions ever getting triggered by anyone
    making a mistake with registrations, but they do take a lot of time when
    the number of registrations is large - this commit thus removes them.

commit 7d5a762
Author: Copilot <[email protected]>
Date:   Thu Aug 21 16:33:21 2025 +0000

    [FEA] Add SPDX checker pre-commit hook (NVIDIA#415)

    This PR implements an SPDX header checker as a pre-commit hook to ensure
    that all new files include proper SPDX license and copyright
    identifiers, as requested in the feature request.
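
    For example, a compliant Python file starts with both headers the
    checker looks for (the license and copyright text here are
    illustrative):

    ```python
    # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
    # SPDX-License-Identifier: BSD-2-Clause
    ```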

    ## Implementation

    The solution adapts the existing SPDX checker from the cuda-python
    repository:

    ### Files Added
    - **`toolshed/check_spdx.py`** - The main SPDX validation script that
    checks for both `SPDX-License-Identifier` and `SPDX-FileCopyrightText`
    headers in files
    - **`.spdx-ignore`** - Exclusion patterns for files that don't require
    SPDX headers (documentation, configuration files, build artifacts,
    version files, external headers, etc.)
    - **Updated `.pre-commit-config.yaml`** - Added a local pre-commit hook
    that runs the SPDX checker with the pathspec dependency

    ### Key Features
    - Validates that both required SPDX headers are present in source files
    - Uses gitignore-style pattern matching for file exclusions via pathspec
    - Allows empty files (no SPDX headers required)
    - Provides clear error messages showing which headers are missing from
    which files
    - Integrates seamlessly with the existing pre-commit infrastructure

    ### Exclusions
    The `.spdx-ignore` file excludes appropriate file types that shouldn't
    have SPDX headers:
    - Documentation files (`.md`, `.rst`, `.txt`)
    - Binary and generated files (`.so`, `.dll`, `.pyc`, `.ptx`)
    - Build artifacts (`build/*`, `dist/*`, `*.egg-info/*`)
    - Version files (`numba_cuda/VERSION`)
    - External CUDA SDK headers (`numba_cuda/numba/cuda/include/*/cuda_*.h`,
    `cuda_*.hpp`)
    - GitHub configuration files (`.github/CODEOWNERS`)

    ## Testing

    The implementation has been thoroughly tested:
    - ✅ All existing Python and shell files with SPDX headers pass
    validation
    - ✅ Files listed in `.spdx-ignore` are properly excluded
    - ✅ Files missing SPDX headers are correctly detected and reported
    - ✅ Empty files are allowed without headers
    - ✅ The checker script itself validates successfully
    - ✅ All pre-commit style checks now pass

    This ensures that as new files are added to the repository, they will be
    required to include proper SPDX identifiers, maintaining consistency
    with the existing codebase established in NVIDIA#412.

    Fixes NVIDIA#414.


    ---------

    Co-authored-by: copilot-swe-agent[bot] <[email protected]>
    Co-authored-by: kkraus14 <[email protected]>
    Co-authored-by: Keith Kraus <[email protected]>

commit b3db2b6
Author: brandon-b-miller <[email protected]>
Date:   Thu Aug 21 10:42:08 2025 -0500

    Remove `MVCLinker` and straggling uses of / references to `cubinlinker` (NVIDIA#413)

    Removes some CUDA 11 related machinery.

commit bef2949
Author: Asher Mancinelli <[email protected]>
Date:   Thu Aug 21 07:37:48 2025 -0700

    Vendor in bytecode.py for CUDA-specific refactoring (NVIDIA#391)

    Vendors in the bytecode module for CUDA-specific refactoring. I am
    working towards vendoring in ir.py, but this has many dependencies and
    many modules use isinstance checks on classes they import from
    numba.core.ir directly, meaning these _users_ of numba.core.ir will all
    need to be vendored in before numba.core.ir itself can be.
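
    A hedged, self-contained illustration of that isinstance coupling (toy
    modules stand in for the real ones):

    ```python
    import types

    # Two structurally identical Assign classes in two different modules
    # model numba.core.ir and a vendored numba.cuda.core.ir.
    upstream = types.ModuleType("fake_upstream_ir")
    vendored = types.ModuleType("fake_vendored_ir")
    exec("class Assign: pass", upstream.__dict__)
    exec("class Assign: pass", vendored.__dict__)

    stmt = vendored.Assign()
    # False: the check fails across copies, so users of numba.core.ir must
    # switch imports together with the IR module itself.
    print(isinstance(stmt, upstream.Assign))
    ```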

    NOTE: This module was adapted to work with Python 3.9 again. I looked at
    the difference between the current numba development branch and [numba
    commit
    2c9f1caa3](numba/numba@2c9f1ca).
    For the most part, Python 3.9 took the same branches as 3.10, so that is
    what I updated. The primary difference was in `get_jump_target`, because
    Python 3.10 changed the addressing of jump instructions.

    ```
    commit 2c9f1caa3df941bf7066f7e977a18e122d363d42
    Commit:     kc611 <[email protected]>
    CommitDate: Wed Sep 18 17:43:06 2024 +0530

        Removed references to Python 3.9 within core
    ```
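
    A hedged, simplified sketch of the kind of version split involved (real
    code also distinguishes relative from absolute jumps):

    ```python
    import dis
    import sys

    def get_jump_target(inst: dis.Instruction) -> int:
        # From Python 3.10 onward, jump arguments count instructions
        # (2 bytes each); on 3.9 they are raw byte offsets.
        if sys.version_info[:2] >= (3, 10):
            return inst.arg * 2
        return inst.arg
    ```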

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit dfa4911
Author: Vijay Kandiah <[email protected]>
Date:   Thu Aug 21 08:29:52 2025 -0500

    Remove _helperlib dep., implement attempt_nocopy_reshape in python for CUDA-specific changes (NVIDIA#383)

    We are only using the method `attempt_nocopy_reshape` from numba's
    `_helperlib` C extension. This PR introduces a Python implementation of
    `attempt_nocopy_reshape` for CUDA-specific customizations and to remove
    the dependency on `_helperlib`.
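
    A hedged sketch of the no-copy-reshape contract, demonstrated with
    NumPy's in-place shape assignment (not the vendored implementation):

    ```python
    import numpy as np

    def reshape_view_or_none(arr, new_shape):
        # Assigning to .shape never copies: NumPy raises AttributeError
        # when the new shape cannot be expressed over the existing strides.
        view = arr.view()
        try:
            view.shape = new_shape
        except AttributeError:
            return None  # a copy would be required
        return view

    a = np.arange(6).reshape(2, 3)
    assert reshape_view_or_none(a, (3, 2)) is not None  # contiguous: fine
    assert reshape_view_or_none(a.T, (6,)) is None      # needs a copy
    ```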

    Co-authored-by: Graham Markall <[email protected]>

commit 320e5f0
Author: Atmn Patel <[email protected]>
Date:   Thu Aug 21 06:40:21 2025 -0500

    [Refactor][NFC] Vendor in Environment class (NVIDIA#379)

    This change also fixes some imports that weren't updated to Numba CUDA
    from a previously merged PR.

    Co-authored-by: Graham Markall <[email protected]>

commit 3fa01f0
Author: Keith Kraus <[email protected]>
Date:   Wed Aug 20 20:50:25 2025 -0400

    Add SPDX identifiers to all files (NVIDIA#412)

    Adds SPDX identifiers to all files. Converted existing Copyright
    statements to SPDX identifiers and added them where there was no
    existing Copyright statement.

    A few things we should check before merging here:

    - [x] SPDX identifiers were added to `.rst` files via block comments.
    Sphinx also seemingly uses comments for functionality, so we should make
    sure it doesn't break doc rendering and remove the SPDX identifiers if
    it does.
    - [x] Added SPDX identifiers to the `.pth` redirector file. The Python
    documentation (https://docs.python.org/3/library/site.html) claims that
    "Blank lines and lines beginning with `#` are skipped" so this should be
    okay. I believe our CI suite should exercise this already, but if this
    is not the case we should explicitly check it.
    - [x] There were a couple of places where files had Apache 2.0 licenses,
    presumably from being vendored from RAPIDS. I relicensed these files to
    BSD-2 to match the rest of the repo. There are no intellectual property
    or patentability concerns in these files.

commit fba4a1e
Author: brandon-b-miller <[email protected]>
Date:   Wed Aug 20 18:29:11 2025 -0500

    Fix NRT stats tests when MVC is required and add them to thirdparty (NVIDIA#406)

    This PR fixes an oversight where our memsys module isn't LTO'ed by
    default. The issue shakes out in cuDF's testing under certain MVC
    scenarios, and this PR adds those tests here.

    We probably want to incorporate changes from
    rapidsai/cudf#19730 (review)
    here too, so I'll wait to merge this.

commit 5bd0885
Author: Graham Markall <[email protected]>
Date:   Wed Aug 20 19:44:52 2025 +0100

    Fix NVIDIA#409: Vendor `dead_branch_prune()` (NVIDIA#411)

    This incorporates the fix from numba/numba#9758, which is required for
    reductions to be correct when Numba-CUDA is used with Numba 0.60.

    Fixes NVIDIA#409

commit ac2d459
Author: Bradley Dice <[email protected]>
Date:   Tue Aug 19 16:25:27 2025 -0500

    Test oldest supported dependencies (NVIDIA#404)

    This PR adds testing for the oldest supported dependencies. Currently
    this is limited to `numba==0.60.0`.

    This depends on NVIDIA#403 to pass CI.

    I also cleaned up a few places where CUDA 11 was being referenced in
    test scripts. This is no longer needed.

commit c414204
Author: Michael Droettboom <[email protected]>
Date:   Tue Aug 19 14:41:07 2025 -0400

    Update CUDA versions in CONTRIBUTING.md (NVIDIA#408)

    It looks like we dropped CUDA 11 and added CUDA 13 some time ago, but
    these docs didn't get updated.


commit 3f8e13a
Author: Asher Mancinelli <[email protected]>
Date:   Mon Aug 18 10:08:16 2025 -0700

    Vendor in UnsupportedBytecodeError (NVIDIA#403)

    Vendors in UnsupportedBytecodeError and changes imports to use other
    utilities that have already been vendored in. _lazy_pformat, which had
    already been vendored in, was also needed when building with an older
    numba, so those imports have been changed as well.


commit d42ee2c
Author: brandon-b-miller <[email protected]>
Date:   Mon Aug 18 09:56:00 2025 -0500

    Bump thirdparty test version  (NVIDIA#402)

    Because the issue fixed by
    rapidsai/cudf@dde047f
    exists in cudf/25.08, bumping to the 25.08 stable release incurs the
    errors this commit addresses when using our main branch today. To fix
    this, we'll need to switch to nightlies for a RAPIDS release cycle. This
    situation should stabilize after the next RAPIDS release, assuming we
    don't have a similar situation where changes in numba-cuda expose a
    latent bug that requires build-time fixes in cuDF.

    Normally I'd suggest just waiting until the next stable cuDF release,
    but the 25.08 release specifically added the most significant changes to
    extension usage in a long time and I think it's worth testing a cuDF
    version that includes those updates with the changes we make here sooner
    rather than later.

commit cd74e37
Author: Atmn Patel <[email protected]>
Date:   Mon Aug 18 05:15:45 2025 -0500

    Vendor in PassManager, typed and untyped pass, and related helpers for CUDA-specific customization (NVIDIA#330)

    This change vendors in the Pass Manager to ease future CUDA-specific
    optimizations. This requires also vendoring in all of the passes in the
    default and Numba CUDA pipelines to maintain the pass registry, as well
    as the compiler lock machinery.
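
    For context, a hedged sketch of how a pass plugs into this machinery,
    shown against the upstream numba API (the vendored paths mirror it):

    ```python
    from numba.core.compiler_machinery import FunctionPass, register_pass

    @register_pass(mutates_CFG=False, analysis_only=True)
    class CountBlocksPass(FunctionPass):
        _name = "count_blocks_pass"

        def __init__(self):
            FunctionPass.__init__(self)

        def run_pass(self, state):
            # Analysis-only passes read state.func_ir without mutating it.
            print(len(state.func_ir.blocks), "basic blocks")
            return False  # the IR was not modified
    ```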

    ---------

    Signed-off-by: Atmn Patel <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 0173219
Author: jiel-nv <[email protected]>
Date:   Mon Aug 18 02:09:23 2025 -0700

    Fix correctness issue in debug mode code generation for dynamic types. (NVIDIA#305)

    The polymorphic variables do not need zero-finalization after the usage
    of each variation, since they share the same memory in the same
    artificial debug union. The cleanup work is done in the last basic block
    of the function.

    Also, do not use the cached temporaries as the pointer for the
    variations; they need to be regenerated and type-cast from the debug
    union on each load/store.

    The correctness test added in this patch produces a wrong result with
    the existing code:
    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    With this patch, the result is produced correctly, as expected:
    [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
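
    A hedged sketch of the kind of kernel that exercises this path (a
    variable re-bound with different types under debug info; illustrative
    only, not the test added by the patch):

    ```python
    import numpy as np
    from numba import cuda

    @cuda.jit(debug=True, opt=False)
    def poly(out):
        x = 0        # x bound as an integer...
        x = 1.0      # ...then re-bound as a float: a "polymorphic" variable
        out[0] = x   # both variants share one artificial debug union slot

    out = np.zeros(16, dtype=np.float64)
    poly[1, 1](out)  # expected: [1. 0. 0. ...]
    ```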

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 8eadd36
Author: Asher Mancinelli <[email protected]>
Date:   Fri Aug 15 15:57:30 2025 -0700

    Vendor in additional debug info tests (NVIDIA#364)

    Adapted more tests in numba core to work with numba-cuda's `cuda.jit`.

    ---------

    Co-authored-by: Graham Markall <[email protected]>

commit 69f7918
Author: Graham Markall <[email protected]>
Date:   Fri Aug 15 22:27:10 2025 +0100

    Bump version to 0.19.0 (NVIDIA#400)

    - Support CUDA 13, drop support for CUDA 11 (NVIDIA#370)
    - Fix package clobbering in thirdparty CI tests (NVIDIA#395)
    - Make device name consistent between bindings (NVIDIA#382)
    - [NFC] Document development dependencies and dependency groups (NVIDIA#345)
    - Change repo URLs in pyproject.toml to point to new URL (NVIDIA#389)
    - Vendor-in IR utilities for CUDA-specific refactoring (NVIDIA#377)
    - remove uses of rapids-configure-conda-channels (NVIDIA#372)
    - Relax subclass check in vendored in typing templates (NVIDIA#371)
    - Add locals attribute to _FunctionCompiler (NVIDIA#381)
    - Fix CI for CUDA 13 drivers (NVIDIA#385)
    - Vendor in Bytecode Interpreter (NVIDIA#355)
    - Add code coverage CI job (NVIDIA#367)
    - [Refactor] Vendor in Typing Templates (NVIDIA#360)
    - [Refactor][NFC] Vendor in TestCase and other testing helpers for
    CUDA-specific changes (NVIDIA#363)
    - Split out debuginfo tests for types (NVIDIA#366)
    - [Refactor][NFC] Vendor in Itanium Mangler for future CUDA-specific
    changes (NVIDIA#362)
    - [Refactor][NFC] Vendor in sigutils for CUDA-specific changes (NVIDIA#361)
    - Remove remaining references/dependencies on `pynvjitlink` (NVIDIA#357)

commit 4a0487e
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 15 12:50:09 2025 -0500

    Require `cuda-bindings` `12.9.1` (NVIDIA#387)

    `cuda-bindings` `12.9.1` is built without monitoring, which avoids
    cython/cython#7050. This makes it the only safe version, since the
    NVIDIA binding is now used by default. This PR requires the patched
    packages.

    ---------

    Co-authored-by: Leo Fang <[email protected]>
    Co-authored-by: Graham Markall <[email protected]>

commit 5d1bbd4
Author: Graham Markall <[email protected]>
Date:   Fri Aug 15 17:31:02 2025 +0100

    Support CUDA 13, drop support for CUDA 11 (NVIDIA#370)

    Required changes:

    - Test third party libraries with CUDA 12 only
    - Update matrix:
      - Drop CUDA 11
      - Change 12.8.0 to 12.9.1
      - Add CUDA 13
    - Only run apt commands on Ubuntu - this was coincidentally handled by
      CTK 11 vs 12, where 11 ran on Rocky Linux and 12 on Ubuntu
    - Remove `CTK_CURAND_VMAP` - this is no longer used, and the cuRAND
      installation is handled as a dependency of the numba-cuda package.
    - Update documentation to drop references to CTK 11 and add references
      to CTK 13.
      - MVC is handled automatically, so most of the documentation for it is
        dropped.
    - Remove code relating to CTK 11 only.
    - Add handling for CUDA 13 paths when using the ctypes binding
    - Prevent `NVRTC()` objects being created when using the NVIDIA
      cuda-python bindings, and replace uses of it with . The `NVRTC()`
      class directly opens the NVRTC DLL / SO, so it should not be used in
      conjunction with the cuda-python bindings.
    - In test binaries, the lowest CC we should generate code for is 7.5,
      because this is the minimum supported by CTK 13.
    - The CCCL include path has changed in CUDA 13, so we add that to the
      include path for test binary generation.

    ---------

    Co-authored-by: Keith Kraus <[email protected]>
    Co-authored-by: Bradley Dice <[email protected]>

commit c3ca630
Author: brandon-b-miller <[email protected]>
Date:   Fri Aug 15 10:09:38 2025 -0500

    Fix package clobbering in thirdparty CI tests (NVIDIA#395)

    In our thirdparty CI tests, installation of the cuDF package is
    clobbering the branch version of numba-cuda we actually want to test. I
    ran into this when cuDF failed to import locally using `main`, while
    trying to track down why the failure didn't show up in CI.

commit 47731d5
Author: brandon-b-miller <[email protected]>
Date:   Wed Aug 13 11:07:30 2025 -0500

    Make device name consistent between bindings (NVIDIA#382)

    Closes NVIDIA#375

    ---------

    Co-authored-by: Leo Fang <[email protected]>

commit be32e90
Author: Asher Mancinelli <[email protected]>
Date:   Mon Aug 11 13:47:25 2025 -0700

    [NFC] Document development dependencies and dependency groups (NVIDIA#345)

    Follow-up to 1a69197: running the following should pull all the
    dependencies needed to develop numba-cuda for a given CUDA version.

commit c210c96
Author: Gabe R. <[email protected]>
Date:   Mon Aug 11 06:29:11 2025 -0700

    Change repo URLs in pyproject.toml to point to new URL (NVIDIA#389)

    The pyproject.toml files still pointed to the old repository URL from
    before it was transferred to NVIDIA. I've fixed that.

commit a932a0c
Author: Asher Mancinelli <[email protected]>
Date:   Fri Aug 8 08:15:23 2025 -0700

    Vendor-in IR utilities for CUDA-specific refactoring (NVIDIA#377)

    Vendor in the numba core IR utilities. There were several unused
    variables in this component; I removed them as I found them, and where
    an unused variable was the result of a function call, the call itself
    was preserved, so the original behavior should be unchanged.

    NOTE: A significant portion of this module is covered by neither our
    numba-cuda tests nor the upstream `test_ir_utils.py` test. We may be
    able to trim this module significantly.
    NOTE: the original copyright in ir_utils.py has been preserved.

    Numba commit 262b49a3e was used.

    ---------

    Co-authored-by: Graham Markall <[email protected]>
@ashermancinelli
Contributor Author

There are three failing tests with this patch, and I believe they can be traced back to passes that are run through the typing system. I hit assertion errors like this in our test suite:

pytest -s numba_cuda/numba/cuda/tests/cudapy/test_compiler.py::TestCompile::test_functioncompiler_locals
...
    def __init__(self, scope, name, loc):
        # NOTE: Use of scope=None should be removed.
        assert scope is None or isinstance(scope, Scope)
        assert isinstance(name, str)
>       assert isinstance(loc, Loc)
               ^^^^^^^^^^^^^^^^^^^^
E       AssertionError

numba_cuda/numba/cuda/core/ir.py:1174: AssertionError

When I print a traceback inside the constructor of numba.core.ir.Loc, by editing the upstream numba.core.ir in my local environment, I see the following:

  File "./numba_cuda/numba/cuda/tests/cudapy/test_compiler.py", line 262, in test_functioncompiler_locals
  File "./numba_cuda/numba/cuda/decorators.py", line 207, in _jit
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/dispatcher.py", line 1296, in compile
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/dispatcher.py", line 145, in __init__
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/compiler.py", line 745, in compile_cuda
  File "./numba_cuda/numba/cuda/compiler.py", line 586, in compile_extra
  File "./numba_cuda/numba/cuda/core/compiler.py", line 157, in compile_extra
  File "./numba_cuda/numba/cuda/core/compiler.py", line 225, in _compile_bytecode
  File "./numba_cuda/numba/cuda/core/compiler.py", line 193, in _compile_core
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 385, in run
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 337, in _runPass
  File "./numba_cuda/numba/cuda/core/compiler_machinery.py", line 291, in check
  File "./numba_cuda/numba/cuda/core/typed_passes.py", line 144, in run_pass
  File "./numba_cuda/numba/cuda/core/typed_passes.py", line 123, in type_inference_stage
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 1146, in propagate
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 191, in propagate
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 619, in __call__
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 642, in resolve
  File "./numba_cuda/numba/cuda/core/typeinfer.py", line 1682, in resolve_call
  File "./numba_cuda/numba/cuda/typing/context.py", line 197, in resolve_function_type
  File "./numba_cuda/numba/cuda/typing/context.py", line 249, in _resolve_user_function_type
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/types/functions.py", line 308, in get_call_type
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/typing/templates.py", line 358, in apply
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/typing/templates.py", line 660, in generic
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/inline_closurecall.py", line 541, in run_untyped_passes
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/untyped_passes.py", line 86, in run_pass
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/interpreter.py", line 1406, in interpret
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/interpreter.py", line 1538, in _iter_inst
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/ir.py", line 204, in with_lineno
  File "./.venv-3.12/lib/python3.12/site-packages/numba/core/ir.py", line 39, in __init__

Notice the first time the traceback enters the upstream numba package: numba_cuda/numba/cuda/typing/context.py calls into numba.core.types.functions, which eventually calls into untyped_passes.py, which uses the upstream numba.core.ir.

Because the IR module is interdependent with so many other modules, it will be vendored in without being fully adopted inside numba-cuda. Modules inside numba-cuda will not use numba.cuda.core.ir yet. In a subsequent PR, when other components are vendored in, we may update these imports. For now, its only use is in test_ir.py.
@ashermancinelli
Contributor Author

/ok to test ad24c3c

@ashermancinelli marked this pull request as ready for review October 14, 2025 17:51
@ashermancinelli added the `3 - Ready for Review` label and removed the `0 - Blocked` label Oct 14, 2025
@gmarkall added the `5 - Ready to merge` label and removed the `3 - Ready for Review` label Oct 15, 2025
@gmarkall merged commit 64fe5be into NVIDIA:main Oct 15, 2025
70 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 20, 2025
- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@gmarkall mentioned this pull request Nov 20, 2025
gmarkall added a commit that referenced this pull request Nov 20, 2025