-
Notifications
You must be signed in to change notification settings - Fork 54
Fix inlining behaviour at the NVVM IR level #246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR NVIDIA#181 aimed to align the behaviour of the `inline` kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level. Because the default value of `inline` is `"never"`, this was interpreted by the `compile_cuda()` function as a `True`ish value and every device function got marked with the `alwaysinline` function attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, but also a major problem in that it triggers an NVVM bug that was only resolved in CUDA 12.3 that causes a hang in `nvvmCompileProgram()`. To rectify these issues, we add the `forceinline` kwarg to the `@cuda.jit` decorator and the `cuda.compile[_*]()` functions. Now, `compile_cuda()` will only enable inlining at the NVVM IR level for `forceinline` and not `inline`. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document the `inline` and `forceinline` kwargs to clarify the intent and behaviour for users. For clarity: the behaviour is now: - The `inline` kwarg enables inlining only at the Numba IR level. - The `forceinline` kwarg enables inlining only at the NVVM IR level.
- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)
| self.argtypes, | ||
| debug=self.debug, | ||
| lineinfo=lineinfo, | ||
| inline=inline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking, since it is happening on LLVM IR level there is no need to pass inline, because it affects only Numba IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly - inline affects online Numba IR, forceinline affects only LLVM IR (which is how it should be when all is working correctly)
| ``"always"``. See `Notes on Inlining | ||
| <https://numba.readthedocs.io/en/stable/developer/inlining.html>`_. | ||
| :type inline: str | ||
| :param forceinline: Enables inlining at the NVVM IR level when set to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want it to be bool, or we want to expose llvm IR attributes directly:
https://llvm.org/docs/LangRef.html#function-attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want it to be bool so it's consistent with Numba. There is a plan for Numba to expose LLVM attributes directly, so I'll aim to align with that in future.
ZzEeKkAa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, few comments!
Thank you for fixing it. Did not mean to break inlining with the original MR. There was luck of docs that explains that behavior...
Not only that, but I didn't even know |
|
@ZzEeKkAa Many thanks for the review - I've responded to the comments - let me know if I should follow up any further! |
|
I'll merge this a little later today, as it seems to have been the resolution to RAPIDS hanging (in rapidsai/cudf#18688) in the v0.10.1 release - assuming there are no objections? |
PR #181 aimed to align the behaviour of the
inlinekwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level.Because the default value of
inlineis"never", this was interpreted by thecompile_cuda()function as aTrueish value and every device function got marked with thealwaysinlinefunction attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, but also a major problem in that it triggers an NVVM bug that was only resolved in CUDA 12.3 that causes a hang innvvmCompileProgram().To rectify these issues, we add the
forceinlinekwarg to the@cuda.jitdecorator and thecuda.compile[_*]()functions. Now,compile_cuda()will only enable inlining at the NVVM IR level forforceinlineand notinline. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document theinlineandforceinlinekwargs to clarify the intent and behaviour for users.For clarity: the behaviour is now:
inlinekwarg enables inlining only at the Numba IR level.forceinlinekwarg enables inlining only at the NVVM IR level.