Fix: Pass correct flags to linker when debugging in the presence of LTOIR code #698

mmason-nvidia · 2026-01-07T00:03:56Z

The linker code was passing in -lto to linker invocations that did not involve LTOIR code, and not passing it in some cases where LTOIR code was being linked. When enabling debugging of a Numba CUDA kernel which calls into LTOIR code, an exception was being raised by nvjitlink.

This change corrects that behavior, only passing in -lto for cases where at least one LTOIR code object is in the link list. The lto= parameter to the Linker initialization is still used to control compilation of .cu code with LTO enabled (which will result in the self._has_ltoir flag being set).

A testcase for validating this change and catching regressions is included.

Closes #696
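
A minimal sketch of the behavior described above, for readers who want to see the flag logic in isolation. `SketchLinker` is a hypothetical toy class, not the numba-cuda `Linker`; the `kind` labels and the `-g` debug flag are placeholders chosen for illustration. Only the idea that `-lto` follows `_has_ltoir` rather than the `lto=` constructor argument comes from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLinker:
    """Toy stand-in for a linker wrapper (hypothetical, for illustration)."""

    lto: bool = False          # mirrors the lto= constructor parameter
    _has_ltoir: bool = False   # becomes True once any LTOIR object is queued
    _object_codes: list = field(default_factory=list)

    def add_ptx(self, name):
        # PTX input never implies LTO.
        self._object_codes.append(("ptx", name))

    def add_ltoir(self, name):
        # An explicit LTOIR input always requires -lto at link time.
        self._object_codes.append(("ltoir", name))
        self._has_ltoir = True

    def add_cu(self, name):
        # lto= only controls how .cu sources are compiled; compiling with
        # LTO enabled produces LTOIR, which in turn sets the flag.
        kind = "ltoir" if self.lto else "non-lto"
        self._object_codes.append((kind, name))
        if self.lto:
            self._has_ltoir = True

    def link_flags(self, debug=False):
        # Flags are decided at link time from what was actually added.
        flags = []
        if self._has_ltoir:
            flags.append("-lto")
        if debug:
            flags.append("-g")
        return flags
```

With this shape, a linker constructed with `lto=True` that only receives PTX does not pass `-lto`, while one that receives a single LTOIR object does, regardless of the constructor argument.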


copy-pr-bot · 2026-01-07T00:04:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-01-07T00:07:47Z

Greptile Summary

Corrects linker flag handling to only pass -lto when LTOIR code is actually present, fixing ERROR_LTO_NOT_ENABLED when debugging kernels with LTO dependencies.

Key Changes:

- Introduced _has_ltoir flag to track presence of LTOIR object code
- Refactored linker options to be set at link time based on actual code types rather than constructor parameter
- Flag is set when .cu files are compiled with lto=True or when .ltoir files are added
- Removed workaround for passing False as None (now handled by passing None when not needed)

Issues Found:

- Test module will fail to import when NUMBA_CUDA_TEST_BIN_DIR environment variable is not set due to module-level variable reference before conditional definition

Confidence Score: 2/5

- This PR has critical import errors that will break test infrastructure
- The driver.py logic changes are sound and fix the reported issue correctly, but the test file has a critical bug that causes NameError when the module is imported without TEST_BIN_DIR set, breaking the test suite
- Pay close attention to numba_cuda/numba/cuda/tests/cudadrv/test_linker.py - fix the module-level declarations before merging

Important Files Changed

Filename	Overview
numba_cuda/numba/cuda/cudadrv/driver.py	Refactored linker to pass `-lto` flag only when LTOIR code is present, fixing debug mode errors
numba_cuda/numba/cuda/tests/cudadrv/test_linker.py	Added test for debug kernel with LTO, but has critical import error when `TEST_BIN_DIR` is not set

gmarkall · 2026-01-12T15:12:03Z

/ok to test 94e6745

gmarkall · 2026-01-12T15:14:33Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=["testing/test_device_functions.ltoir"],


This looks wrong (hardcoding the path will not work in all cases) - instead it should be like in other test cases, e.g.:

numba-cuda/numba_cuda/numba/cuda/tests/cudapy/test_compiler.py

Lines 41 to 43 in 6a1db64

test_device_functions_ltoir = os.path.join(
    TEST_BIN_DIR, "test_device_functions.ltoir"
)

gmarkall · 2026-01-12T15:18:31Z

numba_cuda/numba/cuda/cudadrv/driver.py

        self.arch = arch
-        if lto is False:
-            # WAR for apparent nvjitlink issue
-            lto = None


I wonder if this will be a problem with the 12.9 CUDA Python bindings - there was a version that would only accept None and not False, but I don't recall what the version was.

As far as I could tell from the CI so far, the removal of the None value did not cause an issue - perhaps it is no longer needed.

Looking at the tests that are now failing, it seems that this workaround is probably still needed.

gmarkall · 2026-01-12T15:57:28Z

/ok to test ec52710

gmarkall · 2026-01-12T15:59:23Z

Since the code changes in the non-test code looked good, I've just pushed the relevant fix for the test code to get this over the line.

greptile-apps

3 files reviewed, 3 comments

greptile-apps · 2026-01-12T16:03:57Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=[test_device_functions_ltoir],
+)


This code is defined outside the if TEST_BIN_DIR: block (lines 114-117) but references test_device_functions_ltoir which is only defined inside that block. When TEST_BIN_DIR is not set, importing this test file will raise NameError: name 'test_device_functions_ltoir' is not defined.

This code should be indented to be inside the if TEST_BIN_DIR: block, or the declarations should be conditional:

Suggested change

-add_from_numba = cuda.declare_device(
-    "add_from_numba",
-    "int32(int32, int32)",
-    link=[test_device_functions_ltoir],
-)
+if TEST_BIN_DIR:
+    test_device_functions_ltoir = os.path.join(
+        TEST_BIN_DIR, "test_device_functions.ltoir"
+    )
+
+    add_from_numba = cuda.declare_device(
+        "add_from_numba",
+        "int32(int32, int32)",
+        link=[test_device_functions_ltoir],
+    )
+
+    def debuggable_kernel(result):
+        i = cuda.grid(1)
+        result[i] = add_from_numba(i, i)
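
The import-time failure described in this comment can be reproduced without CUDA. The following is a sketch under stated assumptions: `env` is a hypothetical stand-in for `os.environ`, `import_outcome` is an illustrative helper, and the two source strings are simplified copies of the module-level pattern, not the actual test file.

```python
import os

# Simplified copy of the pattern under review: the path is defined inside
# the guard but used outside it.
UNGUARDED = '''
TEST_BIN_DIR = env.get("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
    test_device_functions_ltoir = os.path.join(
        TEST_BIN_DIR, "test_device_functions.ltoir"
    )
link = [test_device_functions_ltoir]
'''

# Same pattern with the use moved inside the guard.
GUARDED = '''
TEST_BIN_DIR = env.get("NUMBA_CUDA_TEST_BIN_DIR")
if TEST_BIN_DIR:
    test_device_functions_ltoir = os.path.join(
        TEST_BIN_DIR, "test_device_functions.ltoir"
    )
    link = [test_device_functions_ltoir]
'''


def import_outcome(source, env):
    """Execute module-level source and report whether it 'imports' cleanly."""
    try:
        exec(source, {"os": os, "env": env})
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"
```

Calling `import_outcome(UNGUARDED, {})` reports a `NameError`, while `import_outcome(GUARDED, {})` returns `"ok"`; both variants succeed when the variable is present in `env`.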

greptile-apps · 2026-01-12T16:03:58Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


This test uses debuggable_kernel which references add_from_numba, but that function is only defined when TEST_BIN_DIR is set. The test will fail with NameError when TEST_BIN_DIR is not set.

The test should either:

Be skipped when TEST_BIN_DIR is not set, or

Be moved inside a conditional block

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipIf(not os.getenv("NUMBA_CUDA_TEST_BIN_DIR"), "TEST_BIN_DIR not set")
+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
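
To show how such a skip decorator behaves, here is a small sketch using only the standard `unittest` module. `make_case` and `run_case` are hypothetical helpers, and the test body is a placeholder that only builds a path; the real test compiles a CUDA kernel instead.

```python
import os
import unittest


def make_case(test_bin_dir):
    """Build a test case as if TEST_BIN_DIR had the given value."""

    class DebugLtoCase(unittest.TestCase):
        @unittest.skipIf(not test_bin_dir, "necessary binaries not generated.")
        def test_debug_kernel_with_lto(self):
            # Placeholder body: the real test would jit-compile a kernel
            # that links against the LTOIR file.
            path = os.path.join(test_bin_dir, "test_device_functions.ltoir")
            self.assertTrue(path.endswith(".ltoir"))

    return DebugLtoCase


def run_case(case):
    """Run every test in the case and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

When the directory value is empty or `None`, the test is reported as skipped rather than failing; when it is set, the body runs normally.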

greptile-apps · 2026-01-12T16:04:04Z

Additional Comments (1)

numba_cuda/numba/cuda/cudadrv/driver.py
The add_fatbin() and add_object() methods don't set _has_ltoir = True, but object files and fatbin files can contain embedded LTOIR/NVVM code (as indicated by the inspect_obj_content checks in add_file_guess_ext at lines 2354-2356 and 2385-2386).

When such files are added, the linker will incorrectly use link_time_optimization=False and ptx=False in the linker options, potentially causing the same linking errors this PR aims to fix.

Consider tracking LTOIR presence for these file types as well. This could be done by:

Accepting an optional parameter to indicate LTOIR content, or
Using inspect_obj_content to check for NVVM entries when these methods are called

Example approach:

def add_fatbin(self, fatbin, name="<cudapy-fatbin>", has_ltoir=False):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True

def add_object(self, obj, name="<cudapy-object>", has_ltoir=False):
    obj = ObjectCode.from_object(obj, name=name)
    self._object_codes.append(obj)
    if has_ltoir:
        self._has_ltoir = True

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-01-12T16:07:09Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


The test test_debug_kernel_with_lto references debuggable_kernel and add_from_numba (lines 127-129 and 120-124), which are only defined when TEST_BIN_DIR is set (line 114). When TEST_BIN_DIR is not set, this test will fail with a NameError.

This test needs a skip decorator like other tests in the codebase that depend on TEST_BIN_DIR:

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
+    def test_debug_kernel_with_lto(self):

See test_nrt.py line 182 and test_linkable_code.py line 40 for examples of the correct pattern.

greptile-apps · 2026-01-12T16:07:15Z

Additional Comments (1)

numba_cuda/numba/cuda/cudadrv/driver.py
The _has_ltoir flag is not being set when adding object files (.o), fatbins (.fatbin), or libraries (.a/.lib) that may contain LTOIR/NVVM code. This could lead to the same bug this PR is fixing - the linker won't pass -lto even when LTOIR code is present in these file types.

While add_ltoir() correctly sets the flag (line 2446), and add_cu() sets it when self.lto is true (line 2436-2437), the methods add_fatbin(), add_object(), and add_library() do not set this flag.

Note that the code in add_file_guess_ext() (lines 2354-2356) uses inspect_obj_content() to detect LTOIR in objects/fatbins, but only for the ignore_nonlto logic - it doesn't set _has_ltoir.

Consider updating these methods to detect and set _has_ltoir when appropriate, similar to how add_file_guess_ext() checks for "nvvm" entries. For example:

def add_fatbin(self, fatbin, name="<cudapy-fatbin>"):
    obj = ObjectCode.from_fatbin(fatbin, name=name)
    self._object_codes.append(obj)
    # Check if fatbin contains LTOIR and set flag if needed

This would ensure consistent behavior regardless of which file format contains the LTOIR code.
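
A sketch of what the content-inspection variant of this suggestion might look like. Everything here is an assumption made for illustration: `SketchLinker` is a toy class, and `inspect` is a hypothetical callable that takes the raw bytes and returns the entry kinds found in them (for example `{"nvvm", "sass"}`). It does not reproduce the signature of numba-cuda's `inspect_obj_content`, and the thread does not show this suggestion being adopted.

```python
class SketchLinker:
    """Toy linker that derives _has_ltoir from the content it is given."""

    def __init__(self, inspect):
        # Hypothetical callable: bytes -> iterable of entry-kind strings.
        self._inspect = inspect
        self._object_codes = []
        self._has_ltoir = False

    def _track(self, kind, data, name):
        self._object_codes.append((kind, name, data))
        # An "nvvm" entry is treated as evidence of embedded LTOIR.
        if "nvvm" in self._inspect(data):
            self._has_ltoir = True

    def add_fatbin(self, fatbin, name="<cudapy-fatbin>"):
        self._track("fatbin", fatbin, name)

    def add_object(self, obj, name="<cudapy-object>"):
        self._track("object", obj, name)
```

The design choice compared with the `has_ltoir=False` parameter shown earlier is that callers cannot forget to pass the hint, at the cost of inspecting every object that is added.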

gmarkall · 2026-01-12T17:51:40Z

/ok to test

copy-pr-bot · 2026-01-12T17:51:43Z

/ok to test

@gmarkall, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

gmarkall · 2026-01-12T17:51:47Z

/ok to test 78e0634

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-01-12T17:55:52Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=[test_device_functions_ltoir],
+)


This code will cause a NameError when TEST_BIN_DIR is not set, because test_device_functions_ltoir is only defined inside the if TEST_BIN_DIR: block (lines 114-117). When the module loads, the cuda.declare_device() call will fail trying to access an undefined variable.

This will prevent the entire test module from importing when NUMBA_CUDA_TEST_BIN_DIR is not set in the environment.

The module-level declarations of add_from_numba and debuggable_kernel should be moved inside the if TEST_BIN_DIR: block, or the test method test_debug_kernel_with_lto should include a skip decorator similar to other tests in test_nvjitlink.py:

Suggested change

-add_from_numba = cuda.declare_device(
-    "add_from_numba",
-    "int32(int32, int32)",
-    link=[test_device_functions_ltoir],
-)
+if TEST_BIN_DIR:
+    test_device_functions_ltoir = os.path.join(
+        TEST_BIN_DIR, "test_device_functions.ltoir"
+    )
+
+    add_from_numba = cuda.declare_device(
+        "add_from_numba",
+        "int32(int32, int32)",
+        link=[test_device_functions_ltoir],
+    )
+
+    def debuggable_kernel(result):
+        i = cuda.grid(1)
+        result[i] = add_from_numba(i, i)

greptile-apps · 2026-01-12T17:55:53Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


The test test_debug_kernel_with_lto will fail when TEST_BIN_DIR environment variable is not set because debuggable_kernel and add_from_numba are only defined when TEST_BIN_DIR is set (as suggested in the comment on lines 120-124).

This test should include a skip decorator similar to tests in test_nvjitlink.py:

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipIf(
+        not TEST_BIN_DIR, "TEST_BIN_DIR not set"
+    )
+    def test_debug_kernel_with_lto(self):


greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-01-12T18:04:15Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
+if TEST_BIN_DIR:
+    test_device_functions_ltoir = os.path.join(
+        TEST_BIN_DIR, "test_device_functions.ltoir"
+    )
+
+
+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=[test_device_functions_ltoir],
+)


The variable test_device_functions_ltoir is only defined when TEST_BIN_DIR is set (line 114-117), but it's used unconditionally on line 123. This will cause a NameError when the test module is imported if TEST_BIN_DIR is not set in the environment, preventing the entire test module from loading.

This should either:

Move the add_from_numba and debuggable_kernel definitions inside the conditional block, or

Initialize test_device_functions_ltoir to a default value (like None) and add appropriate guards

Suggested change

-TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
-if TEST_BIN_DIR:
-    test_device_functions_ltoir = os.path.join(
-        TEST_BIN_DIR, "test_device_functions.ltoir"
-    )
-
-
-add_from_numba = cuda.declare_device(
-    "add_from_numba",
-    "int32(int32, int32)",
-    link=[test_device_functions_ltoir],
-)
+TEST_BIN_DIR = os.getenv("NUMBA_CUDA_TEST_BIN_DIR")
+if TEST_BIN_DIR:
+    test_device_functions_ltoir = os.path.join(
+        TEST_BIN_DIR, "test_device_functions.ltoir"
+    )
+
+    add_from_numba = cuda.declare_device(
+        "add_from_numba",
+        "int32(int32, int32)",
+        link=[test_device_functions_ltoir],
+    )
+
+    def debuggable_kernel(result):
+        i = cuda.grid(1)
+        result[i] = add_from_numba(i, i)

greptile-apps · 2026-01-12T18:04:16Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


This test uses debuggable_kernel which references test_device_functions_ltoir that only exists when TEST_BIN_DIR is set. The test should have a skip decorator to avoid failures when the required LTOIR file is not available.

Consider adding a skip decorator similar to other tests in this file:

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipUnless(TEST_BIN_DIR, "TEST_BIN_DIR not set")
+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

gmarkall

It looks like the old WAR is not related to the fails on CUDA <= 12.2 - this seems to be some other issue that needs debugging - accordingly, I've left the change in its original form, and the only change of mine that I kept is the fix so that the test can run.

I would ignore greptile, it is talking without enough idea about the larger context.

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-01-21T22:33:13Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=[test_device_functions_ltoir],
+)
+
+
+def debuggable_kernel(result):
+    i = cuda.grid(1)
+    result[i] = add_from_numba(i, i)


logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set, since test_device_functions_ltoir is only defined inside the conditional block. Move these declarations inside the test method or make them conditional.

Suggested change

-add_from_numba = cuda.declare_device(
-    "add_from_numba",
-    "int32(int32, int32)",
-    link=[test_device_functions_ltoir],
-)
-
-
-def debuggable_kernel(result):
-    i = cuda.grid(1)
-    result[i] = add_from_numba(i, i)
+# Move these inside test_debug_kernel_with_lto method
+# or wrap in: if TEST_BIN_DIR:

greptile-apps · 2026-01-21T22:33:13Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


logic: Missing skip decorator for when TEST_BIN_DIR is not set. Test will fail with NameError accessing debuggable_kernel.

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
+    def test_debug_kernel_with_lto(self):


jiel-nv · 2026-01-21T22:46:36Z

/ok to test 0aa4138

jiel-nv · 2026-01-22T00:48:03Z

@mmason-nvidia, I have a fix committed to my local repo, but I am not able to push to your fork. Could you please add me as a collaborator on your fork?

gmarkall · 2026-01-22T10:32:41Z

/ok to test fa5ad37

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-01-22T10:34:29Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+add_from_numba = cuda.declare_device(
+    "add_from_numba",
+    "int32(int32, int32)",
+    link=[test_device_functions_ltoir],
+)
+
+
+def debuggable_kernel(result):
+    i = cuda.grid(1)
+    result[i] = add_from_numba(i, i)


logic: Module will fail to import when NUMBA_CUDA_TEST_BIN_DIR is not set. test_device_functions_ltoir is only defined inside the conditional block (lines 114-117), but is referenced unconditionally here.

Move these declarations inside the if TEST_BIN_DIR: block:

Suggested change

-add_from_numba = cuda.declare_device(
-    "add_from_numba",
-    "int32(int32, int32)",
-    link=[test_device_functions_ltoir],
-)
-
-
-def debuggable_kernel(result):
-    i = cuda.grid(1)
-    result[i] = add_from_numba(i, i)
+if TEST_BIN_DIR:
+    test_device_functions_ltoir = os.path.join(
+        TEST_BIN_DIR, "test_device_functions.ltoir"
+    )
+
+    add_from_numba = cuda.declare_device(
+        "add_from_numba",
+        "int32(int32, int32)",
+        link=[test_device_functions_ltoir],
+    )
+
+    def debuggable_kernel(result):
+        i = cuda.grid(1)
+        result[i] = add_from_numba(i, i)

greptile-apps · 2026-01-22T10:34:30Z

numba_cuda/numba/cuda/tests/cudadrv/test_linker.py

+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)


logic: Missing skip decorator. Test will fail with NameError when TEST_BIN_DIR is not set.

Suggested change

-    def test_debug_kernel_with_lto(self):
-        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)
+    @unittest.skipIf(not TEST_BIN_DIR, "necessary binaries not generated.")
+    def test_debug_kernel_with_lto(self):
+        cuda.jit("void(int32[::1])", debug=True, opt=False)(debuggable_kernel)

gmarkall · 2026-01-22T10:34:54Z

I think I see the fix so I just pushed it so it can get tested whilst CI is (hopefully) not too busy.

jiel-nv · 2026-01-22T16:28:41Z

I think I see the fix so I just pushed it so it can get tested whilst CI is (hopefully) not too busy.

Thanks a lot, @gmarkall, for helping to push the fix in.

Just wondering how you did it, because I tried to push and was not successful.

gmarkall · 2026-01-23T10:59:31Z

I did

git push mmason-nvidia mmason-nvidia/bugfix/lto-linker-flags

where that remote is

$ git remote -v
mmason-nvidia	git@github.com:mmason-nvidia/numba-cuda.git (push)

- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)

gmarkall reviewed Jan 12, 2026

gmarkall and others added 2 commits January 12, 2026 15:56

Correct test LTO-IR path (a583167)

Merge branch 'main' into mmason-nvidia/bugfix/lto-linker-flags (ec52710)

greptile-apps bot reviewed Jan 12, 2026

Reinstante WAR for early 12.x CUDA toolkits (78e0634)

greptile-apps bot reviewed Jan 12, 2026

Revert "Reinstante WAR for early 12.x CUDA toolkits" (5ce8d5e)

This reverts commit 78e0634.

greptile-apps bot reviewed Jan 12, 2026

gmarkall requested changes Jan 12, 2026

gmarkall added the "4 - Waiting on author" (Waiting for author to respond to review) label Jan 12, 2026

Merge branch 'main' into mmason-nvidia/bugfix/lto-linker-flags (0aa4138)

jiel-nv added "2 - In Progress" (Currently a work in progress) and removed "4 - Waiting on author" (Waiting for author to respond to review) labels Jan 21, 2026

greptile-apps bot reviewed Jan 21, 2026

Fix: Pass None instead of False for linker options (cuda-core < 0.4.0 WAR) (fa5ad37)

WAR for cuda-core < 0.4.0 where passing False to link_time_optimization and ptx options incorrectly appends flags due to "is not None" check. Fixed in cuda-python PR #989, released in cuda-core v0.4.0.
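
The workaround in this commit message can be illustrated with a small sketch. `flags_with_is_not_none` is a hypothetical function that mimics the described "is not None" check; it is not the cuda-core source, and `none_if_false` is an illustrative helper name rather than the one used in the PR.

```python
def flags_with_is_not_none(link_time_optimization=None, ptx=None):
    """Mimic an option builder that tests 'is not None' instead of truthiness."""
    flags = []
    # False is not None, so an explicit False still appends the flag.
    if link_time_optimization is not None:
        flags.append("-lto")
    if ptx is not None:
        flags.append("-ptx")
    return flags


def none_if_false(value):
    """Workaround: collapse False (or any falsy value) to None before passing it on."""
    return value if value else None
```

Passing `False` directly yields both flags from the mimic, whereas routing each option through `none_if_false` first yields none, which matches the intent of "Pass None instead of False".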

greptile-apps bot reviewed Jan 22, 2026

gmarkall added "4 - Waiting on CI" (Waiting for a CI run to finish successfully) and removed "2 - In Progress" (Currently a work in progress) labels Jan 22, 2026

gmarkall approved these changes Jan 23, 2026

gmarkall added "5 - Ready to merge" (Testing and reviews complete, ready to merge) and removed "4 - Waiting on CI" (Waiting for a CI run to finish successfully) labels Jan 23, 2026

gmarkall merged commit 08ab491 into NVIDIA:main Jan 23, 2026
105 checks passed

gmarkall mentioned this pull request Jan 27, 2026: Bump version to 0.25.0 #752 (Merged)
