
Mmason nvidia/bugfix/lto linking #474

Closed
mmason-nvidia wants to merge 0 commits into NVIDIA:main from mmason-nvidia:mmason-nvidia/bugfix/lto-linking


@mmason-nvidia
Contributor

Update linker options to only set link_time_optimization=True if LTO-IR code is being linked.

Move linker option creation to a separate function to reduce duplicate code,
and remove self.options in favor of computing the options on the fly.
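A minimal sketch of the rule described above: the options are computed from the full object list at link time instead of being cached in `__init__`. It assumes a hypothetical `FakeObjectCode` stand-in for cuda.core's `ObjectCode` and returns a plain dict rather than cuda.core's linker options object; the name `get_linker_options` is illustrative only. The `link_time_optimization` and `ptx` expressions follow the diff excerpts quoted later in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeObjectCode:
    """Hypothetical stand-in for cuda.core's ObjectCode; only the code type matters here."""
    code_type: str  # e.g. "ptx", "cubin", "ltoir"


def get_linker_options(arch: str, object_codes: List[FakeObjectCode],
                       ptx: bool = False, debug: bool = False) -> dict:
    """Build linker keyword arguments from the full set of objects.

    Computed on demand (not stored on the linker) because whether LTO is
    needed can only be decided once every object to be linked is known.
    """
    # LTO is only meaningful if at least one input is LTO-IR.
    has_ltoir = any(obj.code_type == "ltoir" for obj in object_codes)
    return {
        "arch": arch,
        "link_time_optimization": has_ltoir,
        # nvJitLink only supports emitting PTX together with LTO.
        "ptx": ptx and has_ltoir,
        "debug": debug,
    }
```

Note that with no LTO-IR input a `ptx=True` request is silently dropped here; that is exactly the behavior questioned further down in the review.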

@copy-pr-bot

copy-pr-bot bot commented Sep 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mmason-nvidia
Contributor Author

/ok to test 6128da7


# Enable link time optimization if there is an LTO-IR object in the
# _object_codes list. This has to be deferred until now as it requires
# the full set of objects to be available.
has_ltoir = any(obj._code_type == "ltoir" for obj in self._object_codes)
Contributor

Hm, it would be nice if ObjectCode._code_type were exposed as a public property. @leofang?

Member

Yeah why isn't it already exposed, I wonder? 🤔

Member

Addressing this in NVIDIA/cuda-python#890.

Contributor

@kkraus14 Oct 13, 2025

This is now exposed as a public property, .code_type.

Member

If we use the public attribute now, we'll need to bump the lower bound on cuda.core to 0.4.0.

If we don't use the public attribute, this will likely break at some point, once we cythonize ObjectCode (NVIDIA/cuda-python#1081).

Just want to call this out so we can make an informed decision. Either way works for me 😛
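One way to sidestep this trade-off is to prefer the public property and fall back to the private attribute. A minimal sketch, assuming only what is stated above: cuda.core 0.4.0 and later expose `code_type`, while older releases have only `_code_type`. `object_code_type` is a hypothetical helper name.

```python
def object_code_type(obj) -> str:
    """Return the code type of an ObjectCode-like object.

    Prefers the public ``code_type`` property (cuda.core >= 0.4.0 per this
    thread) and falls back to the private ``_code_type`` attribute of older
    releases. Raises AttributeError if neither attribute exists.
    """
    code_type = getattr(obj, "code_type", None)
    if code_type is None:
        # Older cuda.core: only the private attribute is available.
        code_type = getattr(obj, "_code_type")
    return code_type
```

This keeps the current lower bound while using the public property wherever it exists. The cost is a fallback branch to delete once the minimum cuda.core version reaches 0.4.0.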

  arch=self.arch,
- link_time_optimization=True,
- ptx=True,
+ link_time_optimization=has_ltoir,
Contributor

Would we still be able to force LTO even if just a bunch of PTX files were passed?

Contributor Author

No, it's not possible. LTO happens at the NVVM level before the PTX is generated.

Contributor

@brandon-b-miller left a comment

Couple of Qs, otherwise LGTM

@brandon-b-miller
Contributor

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Sep 17, 2025

/ok to test

@brandon-b-miller, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@brandon-b-miller
Contributor

/ok to test 6128da7

rparolin previously approved these changes Sep 17, 2025
Contributor

@rparolin left a comment

LGTM.

- link_time_optimization=True,
- ptx=True,
+ link_time_optimization=has_ltoir,
+ ptx=ptx and has_ltoir,
Member

Q: Why don't we emit PTX if has_ltoir is False?

Contributor Author

According to the linker manual (https://docs.nvidia.com/cuda//pdf/nvjitlink.pdf, page 14, section 2.3), the -ptx command-line option is only valid when used with -lto:

▶ -lto Do link time optimization.
▶ -ptx Emit ptx after linking instead of cubin; only supported with -lto
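The manual's constraint could also be enforced up front, so that an unsupported combination fails loudly in Python instead of reaching nvJitLink. A minimal sketch; `validate_link_flags` is a hypothetical name, and its parameters mirror the `link_time_optimization` and `ptx` options used in this PR.

```python
def validate_link_flags(link_time_optimization: bool, ptx: bool) -> None:
    """Reject flag combinations the nvJitLink manual documents as unsupported.

    Per the manual text quoted above, -ptx is only supported with -lto.
    Raising here surfaces the mistake early instead of leaving it to the
    linker, which may fail with an internal error.
    """
    if ptx and not link_time_optimization:
        raise ValueError(
            "ptx=True requires link_time_optimization=True "
            "(nvJitLink: -ptx is only supported with -lto)"
        )
```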

@brandon-b-miller
Contributor

Is this

    if lto is False:
        # WAR for apparent nvjitlink issue
        lto = None

removable with this PR?

@gmarkall
Contributor

gmarkall commented Oct 8, 2025

What bug does this fix?

@mmason-nvidia
Contributor Author

> What bug does this fix?

There was an issue with the flags we were passing to nvJitLink when linking a JITed kernel with LTO-IR code, where the kernel was compiled with debugging enabled. In particular, the LTO flag (link_time_optimization) should only be set if there is at least one LTO-IR object to be linked, and ptx=True is only supported when LTO is enabled. Not every unsupported combination of linker options is checked, so without this change the behavior is undefined (it usually surfaces as an internal error from the linker).

@mmason-nvidia
Contributor Author

> Is this
>
>     if lto is False:
>         # WAR for apparent nvjitlink issue
>         lto = None
>
> removable with this PR?

I think so. I'll give it a try.

@gmarkall
Contributor

I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?
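A test along the lines requested might look like the sketch below. It is hypothetical throughout: `build_link_options` stands in for the linker's option-building method, the debug-compiled kernel is assumed to arrive as PTX, and the last case encodes a stricter behavior (rejecting a PTX-output request when there is no LTO-IR input) that reflects the review concern rather than what the PR currently does.

```python
import unittest


def build_link_options(code_types, want_ptx=False, debug=False):
    """Hypothetical option builder under test.

    LTO only when an LTO-IR input is present; PTX output only with LTO.
    """
    has_ltoir = "ltoir" in code_types
    if want_ptx and not has_ltoir:
        # Reject instead of silently dropping the PTX request.
        raise ValueError("PTX output requires at least one LTO-IR input")
    return {
        "link_time_optimization": has_ltoir,
        # Safe: the check above guarantees LTO is on whenever PTX is requested.
        "ptx": want_ptx,
        "debug": debug,
    }


class TestLinkOptions(unittest.TestCase):
    def test_debug_kernel_with_ltoir_enables_lto(self):
        # Reported scenario: debug-compiled kernel (assumed PTX) plus LTO-IR.
        opts = build_link_options(["ptx", "ltoir"], debug=True)
        self.assertTrue(opts["link_time_optimization"])
        self.assertFalse(opts["ptx"])

    def test_no_ltoir_means_no_lto(self):
        opts = build_link_options(["ptx", "cubin"], debug=True)
        self.assertFalse(opts["link_time_optimization"])

    def test_ptx_output_without_ltoir_is_rejected(self):
        with self.assertRaises(ValueError):
            build_link_options(["ptx"], want_ptx=True)
```

Run it with `python -m unittest`, or with pytest, which also collects `unittest.TestCase` classes.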

@gmarkall added the 4 - Waiting on author label (Waiting for author to respond to review) on Oct 10, 2025
Comment on lines 3038 to 3040
# Due to a bug in cuda.core linker flag creation, we need to pass in None
# instead of False for boolean values. Once cuda_core is fixed, we can
# remove this workaround.
Contributor

This was fixed in NVIDIA/cuda-python#989 and is included in the recently released cuda.core 0.4.0
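For anyone still on an older cuda.core, the workaround in the quoted comment amounts to mapping every boolean `False` to `None` before building the options. A minimal sketch, assuming the options are first collected as a plain dict of keyword arguments; `normalize_bool_options` is a hypothetical helper, and per the comment above it should be unnecessary from cuda.core 0.4.0 on.

```python
def normalize_bool_options(options: dict) -> dict:
    """Replace False with None in linker option keyword arguments.

    Older cuda.core releases mishandled boolean linker flags set to False,
    so callers passed None ("unset") instead.
    """
    # `is False` matches only the bool False; 0 and "" are left unchanged.
    return {key: (None if value is False else value)
            for key, value in options.items()}
```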

@mmason-nvidia
Contributor Author

> I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?

Yes, I will update the PR in response to this and Keith's comment about NVIDIA/cuda-python#989 being merged and released.

@greptile-apps
Contributor

greptile-apps bot commented Dec 1, 2025

Greptile Overview

Greptile Summary

This PR refactors the linker options logic in the _Linker class to only enable LTO when LTOIR objects are actually present. The changes include:

  • Removed the pre-computed self.options from __init__ and replaced it with a new _get_linker_options() method that computes options on-the-fly
  • Added dynamic detection of LTOIR objects via has_ltoir = any(obj._code_type == "ltoir" for obj in self._object_codes)
  • Only sets link_time_optimization=True when LTOIR objects are present in the object codes list
  • Includes a workaround for a cuda.core bug where boolean flags must be None instead of False

The refactoring reduces code duplication by centralizing option creation logic. However, there's a potential issue with the ptx flag logic that may change behavior when get_linked_ptx() is called without LTOIR objects present.

Confidence Score: 2/5

  • This PR has a potential logic bug that could break existing functionality when PTX is requested without LTOIR objects
  • The refactoring correctly implements the stated goal of only enabling LTO when LTOIR objects are present. However, the ptx flag logic in _get_linker_options at line 2785 may introduce a regression: previously get_linked_ptx() always set ptx=True, but now it only does so when LTOIR objects are present. This behavioral change could break code that calls get_linked_ptx() on a linker without LTOIR objects. The code is well-structured and the refactoring reduces duplication, but the logic issue needs to be verified or fixed before merging.
  • Pay close attention to numba_cuda/numba/cuda/cudadrv/driver.py, specifically line 2785 where the ptx flag logic may cause unexpected behavior

Important Files Changed

File Analysis

Filename | Score | Overview
--- | --- | ---
numba_cuda/numba/cuda/cudadrv/driver.py | 2/5 | Refactors linker options creation; potential logic issue with ptx flag when no LTOIR objects present

Contributor

@greptile-apps bot left a comment

1 file reviewed, no comments


Contributor

@gmarkall left a comment

To avoid confusion from @greptileai's very positive review, I just want to restate my request/concern here:

> I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?


Contributor

@greptile-apps bot left a comment

Additional Comments (1)

  1. numba_cuda/numba/cuda/cudadrv/driver.py, line 2785

    logic: The ptx flag logic may cause issues when get_linked_ptx() is called without LTOIR objects. Previously, get_linked_ptx() always set ptx=True, but now it only sets ptx=True when has_ltoir is also true. This changes the behavior: if someone calls get_linked_ptx() on a linker with only PTX or cubin objects (no LTOIR), the ptx option will be None instead of True. Consider whether the logic should be:

    This would preserve the original behavior where get_linked_ptx() always requests PTX output, while complete() doesn't request PTX output.

1 file reviewed, 1 comment


@mmason-nvidia force-pushed the mmason-nvidia/bugfix/lto-linking branch from 2b5b4e5 to aff41e9 on December 19, 2025 20:15
@copy-pr-bot

copy-pr-bot bot commented Dec 19, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Dec 19, 2025

Skipped: No reviewable files found.


Labels

4 - Waiting on author Waiting for author to respond to review


6 participants