
Conversation

@brandon-b-miller
Contributor

This PR fixes an NRT caching bug encountered while running groups of tests for a separate task.

NRT currently must be toggleable because, without the incref/decref pruning pass from upstream Numba, the test suite wall time is intractable. When NRT is enabled, Numba may inject incref/decref calls into generated code. Later, Numba inspects the code to be linked for these calls to determine whether the other half of the NRT library needs to be linked. However, the current logic answers "no" whenever NRT is not currently enabled.
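The flawed decision can be sketched as follows. This is a hedged illustration: `needs_nrt_link_buggy` and the plain string scan are simplifications, not the actual numba-cuda code.

```python
# Minimal sketch (hypothetical name) of the flawed link decision described
# above: the code to be linked is scanned for NRT_incref/NRT_decref
# references, but the check short-circuits to "no" whenever NRT is currently
# disabled, even when the (cached) code was generated while NRT was on.

def needs_nrt_link_buggy(ptx_source: str, nrt_enabled: bool) -> bool:
    if not nrt_enabled:
        # The short circuit: only the *current* setting is consulted.
        return False
    return "NRT_incref" in ptx_source or "NRT_decref" in ptx_source


# Cached PTX generated while NRT was enabled still references NRT symbols,
# yet with the setting now off the check answers "no", NRT is not linked,
# and ptxas later fails with an unresolved extern.
cached_ptx = ".extern .func NRT_incref ..."
```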

This can cause linker errors down the line if we get a cache hit after the setting has been toggled off. The failure can be observed via:

```python
import numpy as np
import cupy as cp
from numba import cuda, config


def array_reshape1d(arr, newshape, got):
    y = arr.reshape(newshape)
    for i in range(y.shape[0]):
        got[i] = y[i]


def array_reshape(arr, newshape):
    return arr.reshape(newshape)


# Compile a kernel that uses NRT while it is enabled...
config.CUDA_ENABLE_NRT = 1

@cuda.jit
def kernel(out):
    out[0] = np.min(np.array([1.0, 2.0, 0.0, -0.0, 1.0, -1.5]))

out = cp.zeros(1, dtype=cp.float64)
kernel[1, 1](out)

# ...then disable NRT and compile a second kernel that hits the cache.
config.CUDA_ENABLE_NRT = 0

arr = np.arange(24)
kernel2 = cuda.jit(array_reshape1d)
expected = array_reshape(arr, (24,))
got = np.zeros(expected.shape, dtype=arr.dtype)
kernel2[1, 1](arr, (24,), got)
```

which fails with:

```
cuda.bindings.nvjitlink.nvJitLinkError: ERROR_PTX_COMPILE (4)
Linker error log: ptxas fatal   : Unresolved extern function 'NRT_incref'
ERROR NVJITLINK_ERROR_PTX_COMPILE: JIT the PTX (ltoPtx)
```

This PR fixes the issue by remembering whether NRT was ever turned on during the Python session and avoiding the short circuit that declines to link NRT when it is not currently enabled.
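The approach can be sketched as follows, using illustrative names rather than the actual numba-cuda internals (`_nrt_ever_enabled`, `enable_nrt`, and `needs_nrt_link_fixed` are hypothetical):

```python
# Session-level memory: was NRT ever enabled in this Python session?
# (Name and module-level placement are illustrative only.)
_nrt_ever_enabled = False

def enable_nrt():
    global _nrt_ever_enabled
    _nrt_ever_enabled = True

def needs_nrt_link_fixed(ptx_source: str, nrt_enabled: bool) -> bool:
    # Skip the NRT-usage scan only if NRT is off now *and* was never on
    # before, so cached NRT-using code still gets the runtime linked.
    if not (nrt_enabled or _nrt_ever_enabled):
        return False
    return "NRT_incref" in ptx_source or "NRT_decref" in ptx_source
```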

@copy-pr-bot

copy-pr-bot bot commented Dec 10, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@brandon-b-miller
Contributor Author

/ok to test

@greptile-apps
Contributor

greptile-apps bot commented Dec 10, 2025

Greptile Overview

Greptile Summary

This PR fixes an NRT (Numba Runtime) caching bug that caused linker errors when NRT was toggled between kernel compilations.

  • Root cause: The compile_subroutine method's cache key did not include NRT state, allowing cached NRT-enabled code to be reused when NRT was later disabled
  • Fix: Adds self.enable_nrt to the cache key in base.py, ensuring subroutines compiled with different NRT states are cached separately
  • Test: Adds regression test that compiles a kernel with NRT enabled, then compiles another kernel with NRT disabled that uses the same reshape primitive, verifying no linker errors occur
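The cache-key change summarized above can be illustrated with a minimal sketch. `SubroutineCache` and `compile_subroutine`'s shape here are hypothetical stand-ins, not the actual base.py code:

```python
# Illustrative sketch: including the NRT flag in the subroutine cache key
# means code compiled under different NRT settings never collides.

class SubroutineCache:
    def __init__(self):
        self._cache = {}
        self.enable_nrt = False  # mirrors the context's NRT setting

    def compile_subroutine(self, name, sig, compile_fn):
        # NRT state is now part of the key, so toggling NRT forces a
        # fresh compilation instead of reusing a stale cached entry.
        key = (name, sig, self.enable_nrt)
        if key not in self._cache:
            self._cache[key] = compile_fn()
        return self._cache[key]
```

With this key, a kernel compiled while NRT is enabled and one compiled after it is disabled each get their own cache entry.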

Confidence Score: 5/5

  • This PR is safe to merge - it's a targeted bug fix with a regression test that adds NRT state to the cache key
  • The fix is minimal, well-targeted, and addresses the root cause correctly. Adding NRT state to the cache key is the appropriate solution to prevent false cache hits. The regression test validates the fix by reproducing the exact scenario described in the PR.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| numba_cuda/numba/cuda/core/base.py | 5/5 | Adds `self.enable_nrt` to the subroutine cache key to prevent cache hits when NRT state differs, fixing a bug where NRT-enabled code could be reused when NRT is disabled. |
| numba_cuda/numba/cuda/tests/nrt/test_nrt.py | 5/5 | Adds a regression test for the NRT toggle caching bug, verifying that kernels work correctly when NRT is enabled for one kernel and then disabled for another that reuses cached primitives. |

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. numba_cuda/numba/cuda/dispatcher.py, line 231-235 (link)

    style: Docstring is now outdated. It says "NRT must be enabled for the CUDATargetContext" but the code now also proceeds when _LINKED_NRT is True (i.e., when NRT was previously linked in this session).

1 file reviewed, 1 comment


@gmarkall
Contributor

Can the reproducer in the description be turned into a test case?

@gmarkall gmarkall added the 4 - Waiting on author Waiting for author to respond to review label Dec 11, 2025
@brandon-b-miller
Contributor Author

/ok to test

@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on author Waiting for author to respond to review labels Dec 11, 2025
Contributor

@greptile-apps greptile-apps bot left a comment


2 files reviewed, no comments


@brandon-b-miller
Contributor Author

/ok to test

Contributor

@greptile-apps greptile-apps bot left a comment


Additional Comments (2)

  1. numba_cuda/numba/cuda/tests/nrt/test_nrt.py, line 386 (link)

    logic: _Kernel._LINKED_NRT does not exist in the codebase. This line was added when _LINKED_NRT existed in a previous commit, but that commit was reverted. This will raise AttributeError: type object '_Kernel' has no attribute '_LINKED_NRT' when the test runs. Either remove this line or re-add the _LINKED_NRT attribute to the _Kernel class.

  2. numba_cuda/numba/cuda/tests/nrt/test_nrt.py, line 396-406 (link)

    style: Config changes not restored after test. Consider using override_config context manager (used elsewhere in this file) or add cleanup to restore config.CUDA_ENABLE_NRT to its original value to prevent test pollution.
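The reviewer's `override_config` suggestion can be sketched as a context manager that restores the setting even if the test fails. The `_Config` class here is a stand-in for the real config module, and this is a simplified mirror of Numba's helper, not its actual implementation:

```python
from contextlib import contextmanager

class _Config:
    # Stand-in for the config module; the real attribute lives on
    # numba's config object.
    CUDA_ENABLE_NRT = 0

config = _Config()

@contextmanager
def override_config(name, value):
    # Set the attribute for the duration of the block, then restore the
    # original value even if the body raises, preventing test pollution.
    old = getattr(config, name)
    setattr(config, name, value)
    try:
        yield
    finally:
        setattr(config, name, old)
```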

2 files reviewed, 2 comments


@brandon-b-miller
Contributor Author

/ok to test

Contributor

@greptile-apps greptile-apps bot left a comment


2 files reviewed, no comments


@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Dec 11, 2025
@brandon-b-miller brandon-b-miller merged commit 30b024f into NVIDIA:main Dec 11, 2025
71 checks passed
@brandon-b-miller brandon-b-miller deleted the fix-nrt-caching branch December 11, 2025 15:10
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Dec 17, 2025
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
@gmarkall gmarkall mentioned this pull request Dec 17, 2025
gmarkall added a commit that referenced this pull request Dec 17, 2025
- Capture global device arrays in kernels and device functions (#666)
- Fix #624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (#643)
- Fix Issue #588: separate compilation of NVVM IR modules when generating debuginfo (#591)
- feat: allow printing nested tuples (#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (#652)
- Test RAPIDS 25.12 (#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (#662)
- feat: add print support for int64 tuples (#663)
- Only run dependabot monthly and open fewer PRs (#658)
- test: fix bogus `self` argument to `Context` (#656)
- Fix false negative NRT link decision when NRT was previously toggled on (#650)
- Add support for dependabot (#647)
- refactor: cull dead linker objects (#649)
- Migrate numba-cuda driver to use cuda.core.launch API (#609)
- feat: add set_shared_memory_carveout (#629)
- chore: bump version in pixi.toml (#641)
- refactor: remove devicearray code to reduce complexity (#600)
ZzEeKkAa added a commit to ZzEeKkAa/numba-cuda that referenced this pull request Jan 8, 2026
v0.23.0

- Capture global device arrays in kernels and device functions (NVIDIA#666)
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
