Mmason nvidia/bugfix/lto linking by mmason-nvidia · Pull Request #474 · NVIDIA/numba-cuda

mmason-nvidia · 2025-09-16T20:09:55Z

Update linker options to only set link_time_optimization=True if LTO-IR code is being linked.

Move linker option creation to a separate function to reduce duplicate code,
and remove self.options in favor of computing the options on the fly.

copy-pr-bot · 2025-09-16T20:09:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mmason-nvidia · 2025-09-16T20:12:09Z

/ok to test 6128da7

mmason-nvidia · 2025-09-16T22:31:46Z

/ok to test 6128da7

brandon-b-miller · 2025-09-17T12:55:10Z

numba_cuda/numba/cuda/cudadrv/driver.py

+        # Enable link time optimization if there is an LTO-IR object in the
+        # _object_codes list. This has to be deferred until now as it requires
+        # the full set of objects to be available.
+        has_ltoir = any(obj._code_type == "ltoir" for obj in self._object_codes)


Hm, it would be nice if ObjectCode._code_type was exposed as a public property. @leofang ?

Yeah why isn't it already exposed, I wonder? 🤔

Addressing this in NVIDIA/cuda-python#890.

This is now a publicly exposed property under .code_type

If we use the public attribute now, we'll need to bump the lower bound on cuda.core to 0.4.0.

If we don't use the public attribute, this will likely break at some point, when we cythonize ObjectCode (NVIDIA/cuda-python#1081).

Just wanna call this out to make an informed decision. Whatever works for me 😛
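
For illustration, the trade-off above could be bridged with a small accessor that prefers the public `code_type` property and falls back to the private `_code_type`. This is only a sketch: `get_code_type`, `has_ltoir`, and the two stand-in classes are hypothetical and are not part of numba-cuda or cuda.core.

```python
class OldObjectCode:
    """Stand-in for an ObjectCode that only has the private attribute."""

    def __init__(self, code_type):
        self._code_type = code_type


class NewObjectCode:
    """Stand-in for an ObjectCode that also exposes a public property."""

    def __init__(self, code_type):
        self._code_type = code_type

    @property
    def code_type(self):
        return self._code_type


def get_code_type(obj):
    """Prefer the public property; fall back to the private attribute."""
    try:
        return obj.code_type
    except AttributeError:
        return obj._code_type


def has_ltoir(object_codes):
    """Return True if at least one object to be linked is LTO-IR."""
    return any(get_code_type(obj) == "ltoir" for obj in object_codes)
```

A fallback like this would avoid bumping the lower bound immediately, at the cost of keeping a private-attribute access around until the bound is raised.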

brandon-b-miller · 2025-09-17T13:15:33Z

numba_cuda/numba/cuda/cudadrv/driver.py

            arch=self.arch,
-            link_time_optimization=True,
-            ptx=True,
+            link_time_optimization=has_ltoir,


Would we still be able to force LTO even if just a bunch of PTX files were passed?

No, it's not possible. LTO happens at the NVVM level before the PTX is generated.

brandon-b-miller

Couple Q's otherwise LGTM

brandon-b-miller · 2025-09-17T13:16:11Z

/ok to test

copy-pr-bot · 2025-09-17T13:16:14Z

/ok to test

@brandon-b-miller, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

brandon-b-miller · 2025-09-17T16:45:29Z

/ok to test 6128da7

rparolin

LGTM.

leofang · 2025-09-17T18:29:26Z

numba_cuda/numba/cuda/cudadrv/driver.py

-            link_time_optimization=True,
-            ptx=True,
+            link_time_optimization=has_ltoir,
+            ptx=ptx and has_ltoir,


Q: Why don't we emit PTX if has_ltoir is False?

According to the linker manual (https://docs.nvidia.com/cuda//pdf/nvjitlink.pdf page 14 section 2.3) the -ptx command line option is only valid when used with -lto:

▶ -lto Do link time optimization.
▶ -ptx Emit ptx after linking instead of cubin; only supported with -lto
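
The constraint quoted above can be expressed as a small pure function. This is a minimal sketch: `linker_flags` is a hypothetical helper that returns a plain dict. The keyword names mirror the ones in the diff, but nothing here calls nvJitLink or cuda.core.

```python
def linker_flags(code_types, want_ptx):
    """Compute the LTO-related flags from the code types being linked.

    code_types: iterable of strings such as "ptx", "cubin", "ltoir".
    want_ptx:   whether the caller asked for PTX output.
    """
    has_ltoir = any(ct == "ltoir" for ct in code_types)
    return {
        "link_time_optimization": has_ltoir,
        # -ptx is only supported together with -lto, so it is dropped
        # when no LTO-IR input is present.
        "ptx": want_ptx and has_ltoir,
    }
```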

brandon-b-miller · 2025-09-22T13:11:15Z

Is this

numba-cuda/numba_cuda/numba/cuda/cudadrv/driver.py

Lines 2966 to 2968 in 97ce4b1

    
        if lto is False:
            # WAR for apparent nvjitlink issue
            lto = None

removable with this PR?

gmarkall · 2025-10-08T14:25:55Z

What bug does this fix?

mmason-nvidia · 2025-10-09T15:44:20Z

What bug does this fix?

There was an issue with the flags we were passing to nvJitLink when linking a JITed kernel with LTO-IR code where the kernel was compiled with debugging enabled. In particular, link_time_optimization should only be set if there is at least one LTO-IR object to be linked, and ptx=True is only supported when link_time_optimization=True. Not all unsupported linker option combinations are checked for, so without this change undefined behavior will occur (usually in the form of an internal error from the linker).

mmason-nvidia · 2025-10-09T16:04:49Z

Is this

numba-cuda/numba_cuda/numba/cuda/cudadrv/driver.py

Lines 2966 to 2968 in 97ce4b1

        if lto is False:
            # WAR for apparent nvjitlink issue
            lto = None

removable with this PR?

I think so. I'll give it a try.

gmarkall · 2025-10-10T15:29:12Z

I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?
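
One way to make this concern concrete is to contrast the two policies for an unsupported request: silently dropping the `ptx` flag versus rejecting it. The sketch below is illustrative only; `resolve_ptx_flag` and its `strict` parameter are hypothetical and do not appear in the PR.

```python
def resolve_ptx_flag(has_ltoir, want_ptx, strict=False):
    """Decide the value of the ptx flag for a link.

    strict=False mirrors the `ptx and has_ltoir` expression in the diff:
    an unsupported request is silently turned off.
    strict=True rejects the unsupported combination instead.
    """
    if want_ptx and not has_ltoir:
        if strict:
            raise ValueError("PTX output requires at least one LTO-IR input")
        return False
    return want_ptx
```

A test case for the PR could then pin down which of the two behaviors is intended when PTX output is requested without any LTO-IR input.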

kkraus14 · 2025-10-13T14:19:18Z

numba_cuda/numba/cuda/cudadrv/driver.py

+        # Due to a bug in cuda.core linker flag creation, we need to pass in None
+        # instead of False for boolean values. Once cuda_core is fixed, we can
+        # remove this workaround.


This was fixed in NVIDIA/cuda-python#989 and is included in the recently released cuda.core 0.4.0
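
As a sketch of the workaround described in the code comment above, literal False values can be mapped to None before the options are built. `none_if_false` is a hypothetical name, and whether such a step is still needed depends on the cuda.core version in use.

```python
def none_if_false(options):
    """Return a copy of options with every literal False replaced by None."""
    return {key: (None if value is False else value) for key, value in options.items()}
```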

mmason-nvidia · 2025-10-14T15:09:10Z

I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?

Yes, I will update the PR in response to this and Keith's comment about NVIDIA/cuda-python#989 being merged and released.

greptile-apps · 2025-12-01T23:34:35Z

Greptile Overview

Greptile Summary

This PR refactors the linker options logic in the _Linker class to only enable LTO when LTOIR objects are actually present. The changes include:

- Removed the pre-computed self.options from __init__ and replaced it with a new _get_linker_options() method that computes options on-the-fly
- Added dynamic detection of LTOIR objects via has_ltoir = any(obj._code_type == "ltoir" for obj in self._object_codes)
- Only sets link_time_optimization=True when LTOIR objects are present in the object codes list
- Includes a workaround for a cuda.core bug where boolean flags must be None instead of False

The refactoring reduces code duplication by centralizing option creation logic. However, there's a potential issue with the ptx flag logic that may change behavior when get_linked_ptx() is called without LTOIR objects present.

Confidence Score: 2/5

- This PR has a potential logic bug that could break existing functionality when PTX is requested without LTOIR objects
- The refactoring correctly implements the stated goal of only enabling LTO when LTOIR objects are present. However, the ptx flag logic in _get_linker_options at line 2785 may introduce a regression: previously get_linked_ptx() always set ptx=True, but now it only does so when LTOIR objects are present. This behavioral change could break code that calls get_linked_ptx() on a linker without LTOIR objects. The code is well-structured and the refactoring reduces duplication, but the logic issue needs to be verified or fixed before merging.
- Pay close attention to numba_cuda/numba/cuda/cudadrv/driver.py, specifically line 2785 where the ptx flag logic may cause unexpected behavior

Important Files Changed

File Analysis

Filename	Score	Overview
numba_cuda/numba/cuda/cudadrv/driver.py	2/5	Refactors linker options creation; potential logic issue with `ptx` flag when no LTOIR objects present

greptile-apps

1 file reviewed, no comments

gmarkall

To avoid confusion from @greptileai's very positive review, I just want to re-state my request / concern here:

I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?

greptile-apps

1 file reviewed, no comments

greptile-apps

Additional Comments (1)

numba_cuda/numba/cuda/cudadrv/driver.py, line 2785 (link)

logic: The ptx flag logic may cause issues when get_linked_ptx() is called without LTOIR objects. Previously, get_linked_ptx() always set ptx=True, but now it only sets ptx=True when has_ltoir is also true. This changes the behavior: if someone calls get_linked_ptx() on a linker with only PTX or cubin objects (no LTOIR), the ptx option will be None instead of True. Consider whether the logic should be:

This would preserve the original behavior where get_linked_ptx() always requests PTX output, while complete() doesn't request PTX output.

1 file reviewed, 1 comment

copy-pr-bot · 2025-12-19T20:15:19Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

greptile-apps · 2025-12-19T20:15:19Z

Skipped: No reviewable files found.

brandon-b-miller reviewed Sep 17, 2025

rparolin previously approved these changes Sep 17, 2025

leofang reviewed Sep 17, 2025

mmason-nvidia dismissed rparolin’s stale review via a98639b September 24, 2025 14:15

gmarkall added the "4 - Waiting on author" (Waiting for author to respond to review) label Oct 10, 2025

kkraus14 reviewed Oct 13, 2025

greptile-apps bot reviewed Dec 1, 2025

gmarkall requested changes Dec 2, 2025

greptile-apps bot reviewed Dec 2, 2025

greptile-apps bot reviewed Dec 10, 2025

mmason-nvidia closed this Dec 19, 2025

mmason-nvidia force-pushed the mmason-nvidia/bugfix/lto-linking branch from 2b5b4e5 to aff41e9 Compare December 19, 2025 20:15
