Conversation

@gmarkall gmarkall commented Dec 9, 2025

This PR attempts to fix issue #624. The root cause is that Numba's inline pass runs when `numba.extending.overload(inline="always")` is used, and it creates IR nodes from `numba.core.ir` rather than `numba.cuda.core.ir`; those nodes are not recognised by instance checks in Numba-CUDA.

Rather than redirecting the IR module to Numba, this PR modifies all instance checks to accept both Numba and Numba-CUDA IR nodes. This preserves the flexibility to edit the `numba.cuda.core.ir` module while co-existing with uses of the `numba.core.ir` module.

The majority of these changes were performed with the following script:

```
import re
import pathlib
import sys

path = pathlib.Path(sys.argv[1])
text = path.read_text()

# Match isinstance(<thing>, ir.<Classname>) where <Classname> is an
# IR node class (capitalised name).
pattern = re.compile(r"(isinstance\([^,]+,\s*)ir\.([A-Z][A-Za-z0-9_]+)")


def repl(m):
    # Rewrite the class reference to the corresponding lowercase
    # <classname>_types tuple, e.g. ir.Var -> ir.var_types.
    head = m.group(1)
    name = m.group(2).lower()
    return f"{head}ir.{name}_types"


new = pattern.sub(repl, text)
path.write_text(new)
```

This replaces uses like `isinstance(<thing>, ir.<Classname>)` with `isinstance(<thing>, ir.<classname>_types)`. The `<classname>_types` tuples are initialized based on whether Numba is present: if it is, then e.g. `arg_types` would be `(Arg, numba.core.ir.Arg)`, and similarly for all the other node types.
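The type-tuple mechanism described above can be sketched as follows. The class stand-ins and the optional-import guard are illustrative; the PR's actual `ir.py` defines the real IR node classes in place of the stubs:

```python
# Sketch of the module-level type tuples described above. Var and Arg
# here are stand-ins for the Numba-CUDA IR node classes.
try:
    import numba.core.ir as _numba_ir  # Numba may not be installed
except ImportError:
    _numba_ir = None


class Var:  # stand-in for the Numba-CUDA ir.Var node class
    pass


class Arg:  # stand-in for the Numba-CUDA ir.Arg node class
    pass


# Each <classname>_types tuple accepts the Numba-CUDA node, plus the
# corresponding Numba node when Numba is present.
if _numba_ir is not None:
    var_types = (Var, _numba_ir.Var)
    arg_types = (Arg, _numba_ir.Arg)
else:
    var_types = (Var,)
    arg_types = (Arg,)
```

Since `isinstance` accepts a tuple of classes as its second argument, `isinstance(x, var_types)` then succeeds for either origin of node with no further switching at each call site.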

The script above did not change everything; some manual edits were made afterwards, guided by the following:

```
# Generate a script to grep for all the uses of classes in ir.py
grep "^class [A-Z]" numba_cuda/numba/cuda/core/ir.py | awk '{print $2}' | awk 'BEGIN { FS="(" } { print "grep --color \"ir\\." $1 "\" numba_cuda -R" }' > audit.sh
# Run the script, ignoring instances where we create a new IR node and assign it, e.g. "v = ir.Var(...)"
bash audit.sh | grep -v "= ir\.[A-Z]" | sort
```

The bash script above can also be used to audit the remaining uses of the nodes, to ensure that no check needing a change was missed.

A test based on the reproducer in #624 is also added. The test did not work initially because it adds an implementation of `get_42` to Numba's builtin registry, which was ignored by the CUDA target. The CUDA target normally installs registrations from the Numba builtin registry, but skips those that appear to be internal, as determined by the module name beginning with "numba.". Implementations defined in test code should not be considered internal (they model the use of Numba-CUDA from outside), so the `is_external()` method is updated to consider implementations in the `numba.cuda.tests` module to be external.
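The registry-filtering rule just described can be illustrated with a hedged sketch; the real `is_external()` is a method on a typing-context class and its exact code differs:

```python
def is_external(module_name: str) -> bool:
    """Illustrative sketch of the filtering rule described above.

    Modules under numba.cuda.tests model use of Numba-CUDA from the
    outside, so they count as external even though their names begin
    with "numba.".
    """
    if module_name.startswith("numba.cuda.tests"):
        return True
    # Anything else under the "numba." namespace is internal.
    return not module_name.startswith("numba.")
```

Under this rule, a registration from a test module is installed by the CUDA target just as one from user code would be.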

There were some specific isinstance checks in `ir.py` that had been modified to accept Numba IR nodes in cases where transformations in Numba (or elsewhere) could produce them. Those specific locations proved insufficient, leading to NVIDIA#624.

In order to ensure that Numba IR nodes are accepted in general, all
isinstance checks are updated to accept either the Numba-CUDA type or
the Numba type.

Rather than implementing a lot of switching in the implementations, the
sets of acceptable IR nodes are defined in the module global scope based
on the presence of Numba, and these globals are referred to in the
instance checks.

Note that:

- Checks on function types are not modified, as Numba and Numba-CUDA
  share those types when co-installed.
- Some checks also accepted `None` - these have been slightly modified
  so that the assertion is only used when the variable is not `None`.
- Other checks, e.g. for `int` and `str` types, require no modification.
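The `None`-tolerant assertion change in the second bullet follows a pattern like this sketch (the function and parameter names are hypothetical, for illustration only):

```python
def check_target(target, var_types):
    # Instead of an assertion that had to enumerate None alongside the
    # accepted node classes, the assertion now runs only when a value
    # is actually present.
    if target is not None:
        assert isinstance(target, var_types)
    return target
```

This keeps the type tuples free of `type(None)` while preserving the original behaviour for absent values.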

The auditing script was generated with:

```
grep "^class [A-Z]" numba_cuda/numba/cuda/core/ir.py | awk '{print $2}' | awk 'BEGIN { FS="(" } { print "grep --color \"ir\\." $1 "\" numba_cuda -R" }' > audit.sh
```

and run with:

```
bash audit.sh | grep -v "= ir\.[A-Z]" | sort
```

copy-pr-bot bot commented Dec 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



gmarkall commented Dec 9, 2025

/ok to test


gmarkall commented Dec 9, 2025

/ok to test


greptile-apps bot commented Dec 9, 2025

Greptile Summary

This PR fixes issue #624 by enabling Numba-CUDA to accept IR nodes from both numba.core.ir and numba.cuda.core.ir modules. When inline="always" is used with @overload, Numba's inline pass creates IR nodes from its own module rather than Numba-CUDA's, causing isinstance checks to fail.

Key Changes

  • IR Type Tuples: Adds 38 type tuples at the end of ir.py (e.g., var_types = (Var, numba.core.ir.Var)) that include both Numba and Numba-CUDA IR node types when Numba is available
  • Systematic isinstance Replacements: Updates all isinstance checks across 19 files from isinstance(x, ir.NodeType) to isinstance(x, ir.nodetype_types) using both automated scripts and manual edits
  • Test Module Fix: Updates is_external() in typing/context.py to treat numba.cuda.tests.* modules as external, allowing test-defined overloads to be registered correctly
  • Test Coverage: Adds comprehensive test based on the original bug reproducer

Implementation Quality

The changes are highly systematic and comprehensive. The script-based approach with manual verification ensures consistency. The normalization logic in transforms.py:_legalize_with_head() properly handles mixed IR node types. All assertion and isinstance checks correctly use the new type tuples.

Confidence Score: 5/5

  • This PR is safe to merge - changes are systematic, well-tested, and address a critical regression
  • The refactoring follows a clear, consistent pattern across all files. The approach of using type tuples is elegant and maintainable. The is_external() fix is logically correct. The test reproduces the exact issue from [BUG] [REGRESSION] inline="always" does not work on overload #624. No logic errors or edge cases detected.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| `numba_cuda/numba/cuda/core/ir.py` | Core change: adds type tuples (e.g. `var_types`, `loc_types`) at the end of the file to include both Numba-CUDA and Numba IR node types, then updates all isinstance checks throughout to use these tuples. Changes are comprehensive and systematic. |
| `numba_cuda/numba/cuda/typing/context.py` | Updates `is_external()` to treat `numba.cuda.tests.*` modules as external, allowing test-defined overloads to be registered correctly. Logic is clear and correct. |
| `numba_cuda/numba/cuda/tests/cudapy/test_numba_interop.py` | New test file that reproduces issue #624, testing that `inline="always"` works with overloads. Test is well-structured and properly skipped on the simulator. |
| `numba_cuda/numba/cuda/core/analysis.py` | Updates isinstance checks to use the new type tuples (e.g. `ir.assign_types`, `ir.arg_types`), consistently accepting both Numba and Numba-CUDA IR nodes. |
| `numba_cuda/numba/cuda/core/typeinfer.py` | Replaces IR class isinstance checks with type-tuple checks throughout, covering all statement and expression type checks. |
| `numba_cuda/numba/cuda/core/transforms.py` | Updates isinstance checks and adds normalization logic in `_legalize_with_head()` to handle both Numba and Numba-CUDA IR node types correctly. |

@greptile-apps greptile-apps bot left a comment

Additional Comments (4)

  1. numba_cuda/numba/cuda/core/ir.py, line 571 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  2. numba_cuda/numba/cuda/core/ir.py, line 795 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  3. numba_cuda/numba/cuda/core/ir.py, line 877 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  4. numba_cuda/numba/cuda/core/analysis.py, line 46-51 (link)

    style: The multiple isinstance checks could be simplified using the same tuple concatenation pattern used elsewhere in the file


19 files reviewed, 4 comments


@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Dec 9, 2025
Also use the Numba/Numba-CUDA types within ir.py
@gmarkall

/ok to test


@gmarkall

/ok to test


@gmarkall

/ok to test

@gmarkall gmarkall changed the title [WIP] Fix Issue #624 Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected Dec 11, 2025
@gmarkall gmarkall added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 11, 2025
@gmarkall gmarkall marked this pull request as ready for review December 11, 2025 15:51

copy-pr-bot bot commented Dec 11, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. numba_cuda/numba/cuda/serialize.py, line 73 (link)

    logic: This file imports ir from llvmlite (line 16), not from numba.cuda.core. The llvmlite.ir module has a Value class but does not have a value_types tuple. This will cause AttributeError: module 'llvmlite.ir' has no attribute 'value_types' at runtime.

    This change should be reverted since it's operating on a different ir module than the other files in this PR.

  2. numba_cuda/numba/cuda/core/postproc.py, line 10 (link)

    logic: block_types is not defined in ir.py. The file only defines popblock_types for the PopBlock class. This will cause AttributeError: module 'numba.cuda.core.ir' has no attribute 'block_types'.

    You need to either:

    1. Add block_types = (Block, numba.core.ir.Block) to ir.py, or
    2. Keep the original ir.Block check here

21 files reviewed, 2 comments



Was this intended to be checked in?

@gmarkall

It's not intended to be in the code that gets merged. I did have it checked in whilst I was working on the branch but it needs removing.

@kkraus14

@gmarkall changes LGTM other than checking in the script to automate the replacement

@gmarkall

Many thanks @kkraus14!

@ZzEeKkAa I believe this addresses the issue as in the reproducer. Would you be able to give it a test with your real code to ensure that it resolves the real issues for you please?

- In `serialize`, the `ir` module is from llvmlite, so it should not
  have been altered.
- I never added `block_types` because I mistakenly thought `ir.Block`
  was never instance-checked in the code base. There is in fact one
  such usage. It may be dead code in Numba-CUDA, but `block_types` is
  added just in case it is not.
@gmarkall

/ok to test

@greptile-apps greptile-apps bot left a comment

20 files reviewed, 3 comments


@rparolin rparolin left a comment

lgtm, as long as the script that @kkraus14 highlighted is removed before submitting.

@gmarkall gmarkall merged commit 309b030 into NVIDIA:main Dec 17, 2025
72 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 17, 2025
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
@gmarkall gmarkall mentioned this pull request Dec 17, 2025
gmarkall added a commit that referenced this pull request Dec 17, 2025
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
ZzEeKkAa added a commit to ZzEeKkAa/numba-cuda that referenced this pull request Jan 8, 2026
v0.23.0


Labels

3 - Ready for Review Ready for review by team
