Fix: Pass correct flags to linker when debugging in the presence of LTOIR code #698
…TOIR code The linker code was passing in -lto to linker invocations that did not involve LTOIR code. When enabling debugging of a Numba CUDA kernel which calls into LTOIR code, an exception was being raised by nvjitlink. This change corrects that behavior, only passing in -lto for cases where at least one LTOIR code object is in the link list. The lto= parameter to the Linker initialization is still used to control compilation of .cu code with LTO enabled (which will result in the self._has_ltoir flag being set). A testcase for validating this change and catching regressions is included. Closes NVIDIA#696
Greptile Summary
Corrects linker flag handling to only pass
Key Changes:
Issues Found:
Confidence Score: 2/5
Important Files Changed
|
|
/ok to test 94e6745 |
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=["testing/test_device_functions.ltoir"], |
This looks wrong (hardcoding the path will not work in all cases) - instead it should be like in other test cases, e.g.:
numba-cuda/numba_cuda/numba/cuda/tests/cudapy/test_compiler.py
Lines 41 to 43 in 6a1db64
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) |
| self.arch = arch | ||
| if lto is False: | ||
| # WAR for apparent nvjitlink issue | ||
| lto = None |
I wonder if this will be a problem with the 12.9 CUDA Python bindings - there was a version that would only accept None and not False, but I don't recall what the version was.
As far as I could tell from the CI so far, the removal of the None value did not cause an issue - perhaps it is no longer needed.
Looking at the tests that are now failing, it seems that this workaround is probably still needed.
|
/ok to test ec52710 |
|
Since the code changes in the non-test code looked good, I've just pushed the relevant fix for the test code to get this over the line. |
3 files reviewed, 3 comments
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=[test_device_functions_ltoir], | ||
| ) |
This code is defined outside the if TEST_BIN_DIR: block (lines 114-117) but references test_device_functions_ltoir which is only defined inside that block. When TEST_BIN_DIR is not set, importing this test file will raise NameError: name 'test_device_functions_ltoir' is not defined.
This code should be indented to be inside the if TEST_BIN_DIR: block, or the declarations should be conditional:
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| if TEST_BIN_DIR: | |
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) | |
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) |
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
This test uses debuggable_kernel which references add_from_numba, but that function is only defined when TEST_BIN_DIR is set. The test will fail with NameError when TEST_BIN_DIR is not set.
The test should either:
- Be skipped when TEST_BIN_DIR is not set, or
- Be moved inside a conditional block
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipIf(not os.getenv("NUMBA_CUDA_TEST_BIN_DIR"), "TEST_BIN_DIR not set") | |
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
Additional Comments (1)
When such files are added, the linker will incorrectly use Consider tracking LTOIR presence for these file types as well. This could be done by:
Example approach:
def add_fatbin(self, fatbin, name="<cudapy-fatbin>", has_ltoir=False):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True

def add_object(self, obj, name="<cudapy-object>", has_ltoir=False):
    obj = ObjectCode.from_object(obj, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True
2 files reviewed, 2 comments
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
The test test_debug_kernel_with_lto references debuggable_kernel and add_from_numba (lines 127-129 and 120-124), which are only defined when TEST_BIN_DIR is set (line 114). When TEST_BIN_DIR is not set, this test will fail with a NameError.
This test needs a skip decorator like other tests in the codebase that depend on TEST_BIN_DIR:
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.") | |
| def test_debug_kernel_with_lto(self): |
See test_nrt.py line 182 and test_linkable_code.py line 40 for examples of the correct pattern.
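The pattern the review comments keep pointing at can be sketched without any CUDA dependency. This is only an illustration: the environment variable name and the skip message come from the thread, while `load_ltoir` and `TestNeedsBinaries` are hypothetical stand-ins for the real `cuda.declare_device(...)` call and test class.

```python
import os
import unittest

# Environment variable name taken from the thread; the rest is illustrative.
TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")

if TEST_BIN_DIR:
    # Everything that needs the generated binaries lives inside the guard,
    # so importing this module never touches an undefined name.
    test_device_functions_ltoir = os.path.join(
        TEST_BIN_DIR, "test_device_functions.ltoir"
    )

    def load_ltoir():
        # Hypothetical helper standing in for cuda.declare_device(...).
        with open(test_device_functions_ltoir, "rb") as f:
            return f.read()


class TestNeedsBinaries(unittest.TestCase):
    # The decorator keeps the test body from running (and hitting a
    # NameError on load_ltoir) when the binaries were not generated.
    @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
    def test_uses_ltoir(self):
        self.assertTrue(load_ltoir())
```

With the variable unset, the module still imports and the test is reported as skipped rather than as an error.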
Additional Comments (1)
While Note that the code in Consider updating these methods to detect and set
def add_fatbin(self, fatbin, name="<cudapy-fatbin>"):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    # Check if fatbin contains LTOIR and set flag if needed
This would ensure consistent behavior regardless of which file format contains the LTOIR code.
|
/ok to test |
@gmarkall, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 78e0634 |
2 files reviewed, 2 comments
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=[test_device_functions_ltoir], | ||
| ) |
This code will cause a NameError when TEST_BIN_DIR is not set, because test_device_functions_ltoir is only defined inside the if TEST_BIN_DIR: block (lines 114-117). When the module loads, the cuda.declare_device() call will fail trying to access an undefined variable.
This will prevent the entire test module from importing when NUMBA_CUDA_TEST_BIN_DIR is not set in the environment.
The module-level declarations of add_from_numba and debuggable_kernel should be moved inside the if TEST_BIN_DIR: block, or the test method test_debug_kernel_with_lto should include a skip decorator similar to other tests in test_nvjitlink.py:
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| if TEST_BIN_DIR: | |
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) | |
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) |
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
The test test_debug_kernel_with_lto will fail when TEST_BIN_DIR environment variable is not set because debuggable_kernel and add_from_numba are only defined when TEST_BIN_DIR is set (as suggested in the comment on lines 120-124).
This test should include a skip decorator similar to tests in test_nvjitlink.py:
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipIf( | |
| not TEST_BIN_DIR, "TEST_BIN_DIR not set" | |
| ) | |
| def test_debug_kernel_with_lto(self): |
This reverts commit 78e0634.
2 files reviewed, 2 comments
| TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR") | ||
| if TEST_BIN_DIR: | ||
| test_device_functions_ltoir = os.path.join( | ||
| TEST_BIN_DIR, "test_device_functions.ltoir" | ||
| ) | ||
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=[test_device_functions_ltoir], | ||
| ) |
The variable test_device_functions_ltoir is only defined when TEST_BIN_DIR is set (line 114-117), but it's used unconditionally on line 123. This will cause a NameError when the test module is imported if TEST_BIN_DIR is not set in the environment, preventing the entire test module from loading.
This should either:
- Move the add_from_numba and debuggable_kernel definitions inside the conditional block, or
- Initialize test_device_functions_ltoir to a default value (like None) and add appropriate guards
| TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR") | |
| if TEST_BIN_DIR: | |
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) | |
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR") | |
| if TEST_BIN_DIR: | |
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) | |
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) |
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
This test uses debuggable_kernel which references test_device_functions_ltoir that only exists when TEST_BIN_DIR is set. The test should have a skip decorator to avoid failures when the required LTOIR file is not available.
Consider adding a skip decorator similar to other tests in this file:
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipUnless(TEST_BIN_DIR, "TEST_BIN_DIR not set") | |
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
gmarkall
left a comment
It looks like the old WAR is not related to the fails on CUDA <= 12.2 - this seems to be some other issue that needs debugging - accordingly, I've left the change in its original form, and the only change of mine that I kept is the fix so that the test can run.
I would ignore greptile, it is talking without enough idea about the larger context.
2 files reviewed, 2 comments
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=[test_device_functions_ltoir], | ||
| ) | ||
| def debuggable_kernel(result): | ||
| i = cuda.grid(1) | ||
| result[i] = add_from_numba(i, i) |
logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set, since test_device_functions_ltoir is only defined inside the conditional block. Move these declarations inside the test method or make them conditional.
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) | |
| # Move these inside test_debug_kernel_with_lto method | |
| # or wrap in: if TEST_BIN_DIR: |
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
logic: Missing skip decorator for when TEST_BIN_DIR is not set. Test will fail with NameError accessing debuggable_kernel.
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.") | |
| def test_debug_kernel_with_lto(self): |
… WAR) WAR for cuda-core < 0.4.0 where passing False to link_time_optimization and ptx options incorrectly appends flags due to "is not None" check. Fixed in cuda-python PR #989, released in cuda-core v0.4.0.
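A minimal sketch of the failure mode this commit message describes, assuming the option check looks roughly like an `is not None` test. The function names here are hypothetical and this is not the cuda-core source; it only illustrates why `False` ends up adding a flag and why mapping `False` to `None` avoids it.

```python
def build_flags_buggy(link_time_optimization=None, ptx=None):
    # Mimics an "is not None" check: an explicit False still adds the flag.
    flags = []
    if link_time_optimization is not None:
        flags.append("-lto")
    if ptx is not None:
        flags.append("-ptx")
    return flags


def normalize(value):
    # Workaround sketch: turn False into None so the check above skips it.
    return None if value is False else value


# False is "not None", so the flag is appended even though LTO was declined.
print(build_flags_buggy(link_time_optimization=False))
# After normalization no flag is emitted.
print(build_flags_buggy(link_time_optimization=normalize(False)))
```

A truthiness check (`if link_time_optimization:`) would treat `False` and `None` alike, which is why the workaround is only needed for the affected versions.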
|
/ok to test 0aa4138 |
|
@mmason-nvidia, I have a fix committed to my local repo, but I am not able to push to your fork. Could you please add me as a collaborator on your fork?
|
/ok to test fa5ad37 |
2 files reviewed, 2 comments
| add_from_numba = cuda.declare_device( | ||
| "add_from_numba", | ||
| "int32(int32, int32)", | ||
| link=[test_device_functions_ltoir], | ||
| ) | ||
| def debuggable_kernel(result): | ||
| i = cuda.grid(1) | ||
| result[i] = add_from_numba(i, i) |
logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set. test_device_functions_ltoir is only defined inside the conditional block (lines 114-117), but is referenced unconditionally here.
Move these declarations inside the if TEST_BIN_DIR: block:
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) | |
| if TEST_BIN_DIR: | |
| test_device_functions_ltoir = os.path.join( | |
| TEST_BIN_DIR, "test_device_functions.ltoir" | |
| ) | |
| add_from_numba = cuda.declare_device( | |
| "add_from_numba", | |
| "int32(int32, int32)", | |
| link=[test_device_functions_ltoir], | |
| ) | |
| def debuggable_kernel(result): | |
| i = cuda.grid(1) | |
| result[i] = add_from_numba(i, i) |
| def test_debug_kernel_with_lto(self): | ||
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
logic: Missing skip decorator. Test will fail with NameError when TEST_BIN_DIR is not set.
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) | |
| @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.") | |
| def test_debug_kernel_with_lto(self): | |
| cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel) |
|
I think I see the fix, so I just pushed it so it can get tested whilst CI is (hopefully) not too busy.
Thanks a lot, @gmarkall, for helping to push the fix in. Just wondering how you did it, because I tried to push and was not successful.
|
I did where that remote is |
- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)
The linker code was passing in -lto to linker invocations that did not involve LTOIR code, and not passing it in some cases where LTOIR code was being linked. When enabling debugging of a Numba CUDA kernel which calls into LTOIR code, an exception was being raised by nvjitlink.
This change corrects that behavior, only passing in -lto for cases where at least one LTOIR code object is in the link list. The lto= parameter to the Linker initialization is still used to control compilation of .cu code with LTO enabled (which will result in the self._has_ltoir flag being set).
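The behavior described here can be sketched roughly as follows. `SketchLinker` is a hypothetical stand-in, not the numba-cuda `Linker` class. Only the `lto=` parameter, the `_has_ltoir` flag, and the `-lto` option are taken from the description; the method names and the `-g` debug flag are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLinker:
    """Hypothetical stand-in for the real Linker; illustration only."""

    lto: bool = False    # mirrors lto=: compile .cu inputs with LTO enabled
    debug: bool = False  # assumed to map to a "-g" link option
    _has_ltoir: bool = False
    _inputs: list = field(default_factory=list)

    def add_ltoir(self, name):
        self._inputs.append(("ltoir", name))
        self._has_ltoir = True

    def add_ptx(self, name):
        # PTX inputs never set the LTOIR flag.
        self._inputs.append(("ptx", name))

    def add_cu(self, name):
        # With lto=True the .cu source is assumed to be compiled to LTOIR.
        self._inputs.append(("ltoir" if self.lto else "ptx", name))
        if self.lto:
            self._has_ltoir = True

    def link_flags(self):
        flags = []
        if self.debug:
            flags.append("-g")
        # Key point: -lto depends on what is in the link list, not on lto=.
        if self._has_ltoir:
            flags.append("-lto")
        return flags


linker = SketchLinker(lto=True, debug=True)
linker.add_ptx("kernel.ptx")
print(linker.link_flags())  # no LTOIR input yet, so -lto is not passed
linker.add_ltoir("test_device_functions.ltoir")
print(linker.link_flags())  # an LTOIR input is present, so -lto is passed
```

Deriving the flag from the inputs rather than from the constructor argument is what lets a debug kernel with no LTOIR dependencies link without `-lto`, while one that calls into LTOIR code still gets it.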
A testcase for validating this change and catching regressions is included.
Closes #696