Conversation

@gmarkall gmarkall commented Dec 9, 2025

This PR attempts to fix issue #624. The root cause is that Numba's inline pass runs when `numba.extending.overload(inline="always")` is used, and it creates IR nodes from `numba.core.ir` rather than `numba.cuda.core.ir`; those nodes are not recognised by instance checks in Numba-CUDA.

Rather than redirecting the IR module to Numba, this PR modifies all instance checks to accept both Numba and Numba-CUDA IR nodes. This preserves the flexibility to edit the `numba.cuda.core.ir` module while co-existing with uses of the `numba.core.ir` module.

The majority of these changes were performed with the following script:

```
import re
import pathlib
import sys

path = pathlib.Path(sys.argv[1])
text = path.read_text()

# Match isinstance(<thing>, ir.<Classname>) where <Classname> is an
# IR node class (capitalised name).
pattern = re.compile(r"(isinstance\([^,]+,\s*)ir\.([A-Z][A-Za-z0-9_]+)")


def repl(m):
    # Rewrite the class reference to the corresponding lowercase
    # <classname>_types tuple, e.g. ir.Var -> ir.var_types.
    head = m.group(1)
    name = m.group(2).lower()
    return f"{head}ir.{name}_types"


new = pattern.sub(repl, text)
path.write_text(new)
```

This replaces uses like `isinstance(<thing>, ir.<Classname>)` with `isinstance(<thing>, ir.<classname>_types)`. The `<classname>_types` tuples are initialized based on whether Numba is present: if it is, then e.g. `arg_types` would be `(Arg, numba.core.ir.Arg)`, and similarly for all the other node types.
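The type-tuple mechanism described above can be sketched as follows. The class stand-ins and the optional-import guard are illustrative; the PR's actual `ir.py` defines the real IR node classes in place of the stubs:

```python
# Sketch of the module-level type tuples described above. Var and Arg
# here are stand-ins for the Numba-CUDA IR node classes.
try:
    import numba.core.ir as _numba_ir  # Numba may not be installed
except ImportError:
    _numba_ir = None


class Var:  # stand-in for the Numba-CUDA ir.Var node class
    pass


class Arg:  # stand-in for the Numba-CUDA ir.Arg node class
    pass


# Each <classname>_types tuple accepts the Numba-CUDA node, plus the
# corresponding Numba node when Numba is present.
if _numba_ir is not None:
    var_types = (Var, _numba_ir.Var)
    arg_types = (Arg, _numba_ir.Arg)
else:
    var_types = (Var,)
    arg_types = (Arg,)
```

Since `isinstance` accepts a tuple of classes as its second argument, `isinstance(x, var_types)` then succeeds for either origin of node with no further switching at each call site.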

The script above did not change everything; some manual edits were made afterwards, guided by the following:

```
# Generate a script to grep for all the uses of classes in ir.py
grep "^class [A-Z]" numba_cuda/numba/cuda/core/ir.py | awk '{print $2}' | awk 'BEGIN { FS="(" } { print "grep --color \"ir\\." $1 "\" numba_cuda -R" }' > audit.sh
# Run the script, ignoring instances where we create a new IR node and assign it, e.g. "v = ir.Var(...)"
bash audit.sh | grep -v "= ir\.[A-Z]" | sort
```

The bash script above can also be used to audit the remaining uses of the nodes, to ensure that no check needing a change was missed.

A test based on the reproducer in #624 is also added. The test did not work initially because it adds an implementation of `get_42` to Numba's builtin registry, which was ignored by the CUDA target. The CUDA target normally installs registrations from the Numba builtin registry, but skips those that appear to be internal, as determined by the module name beginning with "numba.". Implementations defined in test code should not be considered internal (they model the use of Numba-CUDA from outside), so the `is_external()` method is updated to consider implementations in the `numba.cuda.tests` module to be external.
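The registry-filtering rule just described can be illustrated with a hedged sketch; the real `is_external()` is a method on a typing-context class and its exact code differs:

```python
def is_external(module_name: str) -> bool:
    """Illustrative sketch of the filtering rule described above.

    Modules under numba.cuda.tests model use of Numba-CUDA from the
    outside, so they count as external even though their names begin
    with "numba.".
    """
    if module_name.startswith("numba.cuda.tests"):
        return True
    # Anything else under the "numba." namespace is internal.
    return not module_name.startswith("numba.")
```

Under this rule, a registration from a test module is installed by the CUDA target just as one from user code would be.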

There were some specific isinstance checks in `ir.py` that had been modified to accept Numba IR nodes in cases where transformations in Numba (or elsewhere) could produce them. Those specific locations proved insufficient, leading to NVIDIA#624.

In order to ensure that Numba IR nodes are accepted in general, all
isinstance checks are updated to accept either the Numba-CUDA type or
the Numba type.

Rather than implementing a lot of switching in the implementations, the
sets of acceptable IR nodes are defined in the module global scope based
on the presence of Numba, and these globals are referred to in the
instance checks.

Note that:

- Checks on function types are not modified, as Numba and Numba-CUDA
  share those types when co-installed.
- Some checks also accepted `None` - these have been slightly modified
  so that the assertion is only used when the variable is not `None`.
- Other checks, e.g. for `int` and `str` types, require no modification.
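The `None`-tolerant assertion change in the second bullet follows a pattern like this sketch (the function and parameter names are hypothetical, for illustration only):

```python
def check_target(target, var_types):
    # Instead of an assertion that had to enumerate None alongside the
    # accepted node classes, the assertion now runs only when a value
    # is actually present.
    if target is not None:
        assert isinstance(target, var_types)
    return target
```

This keeps the type tuples free of `type(None)` while preserving the original behaviour for absent values.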

The auditing script was generated with:

```
grep "^class [A-Z]" numba_cuda/numba/cuda/core/ir.py | awk '{print $2}' | awk 'BEGIN { FS="(" } { print "grep --color \"ir\\." $1 "\" numba_cuda -R" }' > audit.sh
```

and run with:

```
bash audit.sh | grep -v "= ir\.[A-Z]" | sort
```

copy-pr-bot bot commented Dec 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



gmarkall commented Dec 9, 2025

/ok to test


gmarkall commented Dec 9, 2025

/ok to test


greptile-apps bot commented Dec 9, 2025

Greptile Summary

This PR fixes issue #624 by enabling Numba-CUDA to accept IR nodes from both numba.core.ir and numba.cuda.core.ir modules. When inline="always" is used with @overload, Numba's inline pass creates IR nodes from its own module rather than Numba-CUDA's, causing isinstance checks to fail.

Key Changes

  • IR Type Tuples: Adds 38 type tuples at the end of ir.py (e.g., var_types = (Var, numba.core.ir.Var)) that include both Numba and Numba-CUDA IR node types when Numba is available
  • Systematic isinstance Replacements: Updates all isinstance checks across 19 files from isinstance(x, ir.NodeType) to isinstance(x, ir.nodetype_types) using both automated scripts and manual edits
  • Test Module Fix: Updates is_external() in typing/context.py to treat numba.cuda.tests.* modules as external, allowing test-defined overloads to be registered correctly
  • Test Coverage: Adds comprehensive test based on the original bug reproducer

Implementation Quality

The changes are highly systematic and comprehensive. The script-based approach with manual verification ensures consistency. The normalization logic in transforms.py:_legalize_with_head() properly handles mixed IR node types. All assertion and isinstance checks correctly use the new type tuples.

Confidence Score: 5/5

  • This PR is safe to merge - changes are systematic, well-tested, and address a critical regression
  • The refactoring follows a clear, consistent pattern across all files. The approach of using type tuples is elegant and maintainable. The is_external() fix is logically correct. The test reproduces the exact issue from [BUG] [REGRESSION] inline="always" does not work on overload #624. No logic errors or edge cases detected.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| `numba_cuda/numba/cuda/core/ir.py` | Core change: adds type tuples (e.g. `var_types`, `loc_types`) at the end of the file to include both Numba-CUDA and Numba IR node types, then updates all isinstance checks throughout to use these tuples. Changes are comprehensive and systematic. |
| `numba_cuda/numba/cuda/typing/context.py` | Updates `is_external()` to treat `numba.cuda.tests.*` modules as external, allowing test-defined overloads to be registered correctly. Logic is clear and correct. |
| `numba_cuda/numba/cuda/tests/cudapy/test_numba_interop.py` | New test file that reproduces issue #624, testing that `inline="always"` works with overloads. Test is well-structured and properly skipped on the simulator. |
| `numba_cuda/numba/cuda/core/analysis.py` | Updates isinstance checks to use the new type tuples (e.g. `ir.assign_types`, `ir.arg_types`), consistently accepting both Numba and Numba-CUDA IR nodes. |
| `numba_cuda/numba/cuda/core/typeinfer.py` | Replaces IR class isinstance checks with type-tuple checks throughout, covering all statement and expression type checks. |
| `numba_cuda/numba/cuda/core/transforms.py` | Updates isinstance checks and adds normalization logic in `_legalize_with_head()` to handle both Numba and Numba-CUDA IR node types correctly. |

@greptile-apps greptile-apps bot left a comment

Additional Comments (4)

  1. numba_cuda/numba/cuda/core/ir.py, line 571 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  2. numba_cuda/numba/cuda/core/ir.py, line 795 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  3. numba_cuda/numba/cuda/core/ir.py, line 877 (link)

    logic: inconsistent type checking - uses Var directly instead of var_types

  4. numba_cuda/numba/cuda/core/analysis.py, line 46-51 (link)

    style: The multiple isinstance checks could be simplified using the same tuple concatenation pattern used elsewhere in the file


19 files reviewed, 4 comments


@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Dec 9, 2025
Also use the Numba/Numba-CUDA types within ir.py
@gmarkall

/ok to test


@gmarkall

/ok to test


@gmarkall

/ok to test

@gmarkall gmarkall changed the title [WIP] Fix Issue #624 Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected Dec 11, 2025
@gmarkall gmarkall added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 11, 2025
@gmarkall gmarkall marked this pull request as ready for review December 11, 2025 15:51

copy-pr-bot bot commented Dec 11, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.


@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. numba_cuda/numba/cuda/serialize.py, line 73 (link)

    logic: This file imports ir from llvmlite (line 16), not from numba.cuda.core. The llvmlite.ir module has a Value class but does not have a value_types tuple. This will cause AttributeError: module 'llvmlite.ir' has no attribute 'value_types' at runtime.

    This change should be reverted since it's operating on a different ir module than the other files in this PR.

  2. numba_cuda/numba/cuda/core/postproc.py, line 10 (link)

    logic: block_types is not defined in ir.py. The file only defines popblock_types for the PopBlock class. This will cause AttributeError: module 'numba.cuda.core.ir' has no attribute 'block_types'.

    You need to either:

    1. Add block_types = (Block, numba.core.ir.Block) to ir.py, or
    2. Keep the original ir.Block check here

21 files reviewed, 2 comments



Was this intended to be checked in?

@gmarkall

It's not intended to be in the code that gets merged. I did have it checked in whilst I was working on the branch but it needs removing.

@kkraus14

@gmarkall changes LGTM other than checking in the script to automate the replacement

@gmarkall

Many thanks @kkraus14!

@ZzEeKkAa I believe this addresses the issue as in the reproducer. Would you be able to give it a test with your real code to ensure that it resolves the real issues for you please?

- In `serialize`, the `ir` module is from llvmlite, so it should not
  have been altered.
- I never added `block_types` because I mistakenly thought `ir.Block`
  was never instance-checked in the code base. There is in fact one
  such usage. It may be dead code in Numba-CUDA, but `block_types` is
  added just in case it is not.
@gmarkall

/ok to test

@greptile-apps greptile-apps bot left a comment

20 files reviewed, 3 comments


@rparolin rparolin left a comment

lgtm, as long as the script that @kkraus14 highlighted is removed before submitting.

@gmarkall gmarkall merged commit 309b030 into NVIDIA:main Dec 17, 2025
72 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 17, 2025
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
@gmarkall gmarkall mentioned this pull request Dec 17, 2025
gmarkall added a commit that referenced this pull request Dec 17, 2025
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
ZzEeKkAa added a commit to ZzEeKkAa/numba-cuda that referenced this pull request Jan 8, 2026
v0.23.0


Labels

3 - Ready for Review Ready for review by team
