Conversation
|
/ok to test 6128da7 |
1 similar comment
|
/ok to test 6128da7 |
| # Enable link time optimization if there is an LTO-IR object in the | ||
| # _object_codes list. This has to be deferred until now as it requires | ||
| # the full set of objects to be available. | ||
| has_ltoir = any(obj._code_type == "ltoir" for obj in self._object_codes) |
There was a problem hiding this comment.
Hm, it would be nice if ObjectCode._code_type was exposed as a public property. @leofang ?
There was a problem hiding this comment.
Yeah why isn't it already exposed, I wonder? 🤔
There was a problem hiding this comment.
This is now a publicly exposed property under .code_type
There was a problem hiding this comment.
If we use the public attribute now, we'll need to bump the lower bound on cuda.core to 0.4.0.
If we don't use the public attribute, at one point when we cythonize ObjectCode (NVIDIA/cuda-python#1081) this will likely break.
Just wanna call this out to make an informed decision. Whatever works for me 😛
| arch=self.arch, | ||
| link_time_optimization=True, | ||
| ptx=True, | ||
| link_time_optimization=has_ltoir, |
There was a problem hiding this comment.
would we still be able to force LTO even if just a bunch of PTX files were passed?
There was a problem hiding this comment.
No, it's not possible. LTO happens at the NVVM level before the PTX is generated.
brandon-b-miller
left a comment
There was a problem hiding this comment.
Couple Q's otherwise LGTM
|
/ok to test |
@brandon-b-miller, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 6128da7 |
| link_time_optimization=True, | ||
| ptx=True, | ||
| link_time_optimization=has_ltoir, | ||
| ptx=ptx and has_ltoir, |
There was a problem hiding this comment.
Q: Why don't we emit PTX if has_ltoir is False?
There was a problem hiding this comment.
According to the linker manual (https://docs.nvidia.com/cuda//pdf/nvjitlink.pdf page 14 section 2.3) the -ptx command line option is only valid when used with -lto:
▶ -lto Do link time optimization.
▶ -ptx Emit ptx after linking instead of cubin; only supported with -lto
|
Is this numba-cuda/numba_cuda/numba/cuda/cudadrv/driver.py Lines 2966 to 2968 in 97ce4b1 |
|
What bug does this fix? |
There was an issue with which flags we were passing to nvJitLink in the case of linking a JITed kernel with LTO-IR code where the kernel was compiled with debugging enabled. In particular: the ltoir flag should only be used if there is at least one LTO-IR object to be linked, and ptx=True is only supported if ltoir=True. Not all unsupported linker option combinations are checked for, so without this change undefined behavior will occur (usually in the form of an internal error from the linker). |
I think so. I'll give it a try. |
|
I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please? |
| # Due to a bug in cuda.core linker flag creation, we need to pass in None | ||
| # instead of False for boolean values. Once cuda_core is fixed, we can | ||
| # remove this workaround. |
There was a problem hiding this comment.
This was fixed in NVIDIA/cuda-python#989 and is included in the recently released cuda.core 0.4.0
Yes, I will update the PR in response to this and Keith's comment about NVIDIA/cuda-python#989 being merged and released. |
Greptile OverviewGreptile SummaryThis PR refactors the linker options logic in the
The refactoring reduces code duplication by centralizing option creation logic. However, there's a potential issue with the Confidence Score: 2/5
Important Files ChangedFile Analysis
|
gmarkall
left a comment
There was a problem hiding this comment.
To avoid confusion from @greptileai's very positive review, I just want to re-state my request / concern here:
I get the impression that this PR is now causing us to silently accept an incorrect set of flags, instead of rejecting it, but it's hard to tell without a test case - can you add a test case representing the bug that is fixed here please?
There was a problem hiding this comment.
Additional Comments (1)
-
numba_cuda/numba/cuda/cudadrv/driver.py, line 2785 (link)logic: The
ptxflag logic may cause issues whenget_linked_ptx()is called without LTOIR objects. Previously,get_linked_ptx()always setptx=True, but now it only setsptx=Truewhenhas_ltoiris also true. This changes the behavior: if someone callsget_linked_ptx()on a linker with only PTX or cubin objects (no LTOIR), theptxoption will beNoneinstead ofTrue. Consider whether the logic should be:This would preserve the original behavior where
get_linked_ptx()always requests PTX output, whilecomplete()doesn't request PTX output.
1 file reviewed, 1 comment
2b5b4e5 to
aff41e9
Compare
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Skipped: No reviewable files found. |
Update linker options to only set link_time_optimization=True if LTO-IR code is being linked.
Move linker option creation to a separate function to reduce duplicate code,
and remove self.options in favor of computing the options on the fly.