Fix false negative NRT link decision when NRT was previously toggled on #650
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.

/ok to test
Greptile Summary

This PR fixes an NRT (Numba Runtime) caching bug that caused linker errors when NRT was toggled between kernel compilations.
Confidence Score: 5/5
Additional Comments (1)
- numba_cuda/numba/cuda/dispatcher.py, lines 231-235 (link) style: Docstring is now outdated. It says "NRT must be enabled for the CUDATargetContext" but the code now also proceeds when `_LINKED_NRT` is `True` (i.e., when NRT was previously linked in this session).
1 file reviewed, 1 comment
Can the reproducer in the description be turned into a test case? |
/ok to test |
2 files reviewed, no comments
/ok to test |
Additional Comments (2)
- numba_cuda/numba/cuda/tests/nrt/test_nrt.py, line 386 (link) logic: `_Kernel._LINKED_NRT` does not exist in the codebase. This line was added when `_LINKED_NRT` existed in a previous commit, but that commit was reverted. This will raise `AttributeError: type object '_Kernel' has no attribute '_LINKED_NRT'` when the test runs. Either remove this line or re-add the `_LINKED_NRT` attribute to the `_Kernel` class.
- numba_cuda/numba/cuda/tests/nrt/test_nrt.py, lines 396-406 (link) style: Config changes not restored after test. Consider using the `override_config` context manager (used elsewhere in this file) or add cleanup to restore `config.CUDA_ENABLE_NRT` to its original value to prevent test pollution.
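For reference, the `override_config` pattern mentioned above is a context manager that temporarily sets a config attribute and restores it on exit, even on failure. A minimal self-contained sketch of the pattern (the `config` object here is a stand-in, not Numba's actual config module, and the signature may differ from the project's helper):

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_config(cfg, name, value):
    # Temporarily set cfg.<name>, restoring the original value even if
    # the body raises -- this is what prevents test pollution.
    old = getattr(cfg, name)
    setattr(cfg, name, value)
    try:
        yield
    finally:
        setattr(cfg, name, old)


# Stand-in for the config module (illustrative only).
config = SimpleNamespace(CUDA_ENABLE_NRT=False)

with override_config(config, "CUDA_ENABLE_NRT", True):
    assert config.CUDA_ENABLE_NRT is True
assert config.CUDA_ENABLE_NRT is False  # restored after the block
```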
2 files reviewed, 2 comments
/ok to test |
2 files reviewed, no comments
v0.23.0

- Capture global device arrays in kernels and device functions (NVIDIA#666)
- Fix NVIDIA#624: Accept Numba IR nodes in all places Numba-CUDA IR nodes are expected (NVIDIA#643)
- Fix Issue NVIDIA#588: separate compilation of NVVM IR modules when generating debuginfo (NVIDIA#591)
- feat: allow printing nested tuples (NVIDIA#667)
- build(deps): bump actions/setup-python from 5.6.0 to 6.1.0 (NVIDIA#655)
- build(deps): bump actions/upload-artifact from 4 to 5 (NVIDIA#652)
- Test RAPIDS 25.12 (NVIDIA#661)
- Do not manually set DUMP_ASSEMBLY in `nvjitlink` tests (NVIDIA#662)
- feat: add print support for int64 tuples (NVIDIA#663)
- Only run dependabot monthly and open fewer PRs (NVIDIA#658)
- test: fix bogus `self` argument to `Context` (NVIDIA#656)
- Fix false negative NRT link decision when NRT was previously toggled on (NVIDIA#650)
- Add support for dependabot (NVIDIA#647)
- refactor: cull dead linker objects (NVIDIA#649)
- Migrate numba-cuda driver to use cuda.core.launch API (NVIDIA#609)
- feat: add set_shared_memory_carveout (NVIDIA#629)
- chore: bump version in pixi.toml (NVIDIA#641)
- refactor: remove devicearray code to reduce complexity (NVIDIA#600)
This PR fixes an NRT caching bug encountered while running groups of tests for a separate task.
NRT currently must be toggleable, because otherwise the lack of the proper incref/decref pruning pass from upstream Numba makes the test suite walltime intractable. When NRT is enabled, Numba may inject incref/decref calls into generated code. Later, Numba inspects the code to be linked for these calls to determine whether the other half of the NRT library needs to be linked. However, the current logic answers "no" whenever NRT is not currently enabled.
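The decision described above can be modeled with a toy function (names hypothetical, not the actual numba-cuda code); the short circuit on the current setting is what produces the false negative for cached code:

```python
def should_link_nrt(nrt_enabled: bool, code_uses_nrt: bool) -> bool:
    # Buggy pre-fix logic: if NRT is not *currently* enabled, answer
    # "no" without ever inspecting the code for incref/decref calls.
    if not nrt_enabled:
        return False
    return code_uses_nrt


# Cached code compiled while NRT was on still contains incref/decref
# calls, but with the setting now off the decision is a false negative:
should_link_nrt(False, True)  # -> False, so NRT is never linked
```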
This can cause linker errors down the line if we get a cache hit after the setting has been toggled off.
This PR fixes the issue by remembering whether NRT was ever turned on for this Python session and avoiding the short circuit that decides not to link NRT when it is not currently enabled.