Skip to content

Enable LTO by default when pynvjitlink is available#310

Merged
gmarkall merged 2 commits intoNVIDIA:mainfrom
gmarkall:lto-default
Jun 26, 2025
Merged

Enable LTO by default when pynvjitlink is available#310
gmarkall merged 2 commits intoNVIDIA:mainfrom
gmarkall:lto-default

Conversation

@gmarkall
Copy link
Contributor

Enabling LTO by default when pynvjitlink is available should:

  • Provide a general improvement in performance for various use cases, particularly those linking external code. This ought to be benchmarked, but I'm making an assumption that it helps for now based on prior anecdotal / informal experience.
  • Make the case where users link LTO-IR to kernels or as part of device function declarations "just work" as long as pynvjitlink is installed.

A further improvement would still be to error out when a users tries to link LTO-IR when pynvjitlink is not installed - that is left to be done in a future PR.

Enabling LTO by default when pynvjitlink is available should:

- Provide a general improvement in performance for various use cases,
  particularly those linking external code. This ought to be
  benchmarked, but I'm making an assumption that it helps for now based
  on prior anecdotal / informal experience.
- Make the case where users link LTO-IR to kernels or as part of device
  function declarations "just work" as long as pynvjitlink is installed.

A further improvement would still be to error out when a users tries to
link LTO-IR when pynvjitlink is not installed - that is left to be done
in a future PR.
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jun 25, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@gmarkall
Copy link
Contributor Author

/ok to test

@tpn
Copy link
Contributor

tpn commented Jun 25, 2025

This looks like it'll fix the recent cuda.coop issues I ran into, so, +1 from me.

Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@brandon-b-miller
Copy link
Contributor

Looking into the failures here

@brandon-b-miller
Copy link
Contributor

To fix the simulator test we need to add lto=None to numba_cuda/numba/cuda/simulator/api.py but locally for me there's more needed to fix the failing test, still tracking down exactly what.

@brandon-b-miller
Copy link
Contributor

To fix the simulator test we need to add lto=None to numba_cuda/numba/cuda/simulator/api.py but locally for me there's more needed to fix the failing test, still tracking down exactly what.

Actually I think it's just https://github.com/NVIDIA/numba-cuda/pull/310/files#r2167468144

Also skip an irrelevant test on cudasim.
@gmarkall
Copy link
Contributor Author

/ok to test

@gmarkall gmarkall merged commit ae94550 into NVIDIA:main Jun 26, 2025
39 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Jul 2, 2025
- Updates for recent API changes (NVIDIA#313)
- Fix lineinfo generation when compile_internal used (NVIDIA#271) (NVIDIA#287)
- Build docs with NVIDIA Sphinx theme (NVIDIA#312)
- Don't skip debug tests when LTO enabled by default (NVIDIA#311)
- Use `cuda.bindings` and `cuda.core` for `Linker` (NVIDIA#133)
- Enable LTO by default when pynvjitlink is available (NVIDIA#310)
@gmarkall gmarkall mentioned this pull request Jul 2, 2025
gmarkall added a commit that referenced this pull request Jul 2, 2025
- Updates for recent API changes (#313)
- Fix lineinfo generation when compile_internal used (#271) (#287)
- Build docs with NVIDIA Sphinx theme (#312)
- Don't skip debug tests when LTO enabled by default (#311)
- Use `cuda.bindings` and `cuda.core` for `Linker` (#133)
- Enable LTO by default when pynvjitlink is available (#310)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants