Fix inlining behaviour at the NVVM IR level (Patch on v0.10.0) #247
PR #181 aimed to align the behaviour of the `inline` kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of `inline` is `"never"`, a non-empty string, the `compile_cuda()` function interpreted it as a truthy value, and every device function got marked with the `alwaysinline` function attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, but also a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes a hang in `nvvmCompileProgram()`.

To rectify these issues, we add the `forceinline` kwarg to the `@cuda.jit` decorator and the `cuda.compile[_*]()` functions. Now, `compile_cuda()` will only enable inlining at the NVVM IR level for `forceinline` and not for `inline`. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document the `inline` and `forceinline` kwargs to clarify the intent and behaviour for users.

For clarity, the behaviour is now:

- The `inline` kwarg enables inlining only at the Numba IR level.
- The `forceinline` kwarg enables inlining only at the NVVM IR level.
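
To illustrate the truthiness problem described above, here is a minimal pure-Python sketch. The helper names `attrs_before_fix` and `attrs_after_fix` are hypothetical stand-ins and are not taken from the Numba-CUDA source. They only mimic the kind of check that decides whether a function gets the `alwaysinline` attribute.

```python
# Hypothetical stand-ins for illustration; not the actual numba-cuda code.

def attrs_before_fix(inline="never"):
    """Mimic the old check: any truthy `inline` value enables NVVM-level inlining."""
    attrs = set()
    if inline:  # "never" is a non-empty string, so this branch is always taken
        attrs.add("alwaysinline")
    return attrs


def attrs_after_fix(inline="never", forceinline=False):
    """Mimic the new check: only `forceinline` controls the NVVM-level attribute."""
    attrs = set()
    if forceinline:  # `inline` is deliberately ignored at this level
        attrs.add("alwaysinline")
    return attrs


# With the default arguments, the old check marks the function for inlining,
# while the new check leaves it unmarked.
print(attrs_before_fix())
print(attrs_after_fix())
print(attrs_after_fix(forceinline=True))
```

In this sketch, passing `inline="always"` to `attrs_after_fix` has no effect on the attribute set, mirroring the intent that `inline` acts only at the Numba IR level.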