
Conversation

@gmarkall
Contributor

@gmarkall gmarkall commented May 9, 2025

PR #181 aimed to align the behaviour of the inline kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of inline is "never", the compile_cuda() function interpreted it as a truthy value, and every device function got marked with the alwaysinline function attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, but a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes nvvmCompileProgram() to hang.
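The truthiness trap described above can be sketched in plain Python. This is a minimal illustration of the logic the PR describes, not the actual compile_cuda() source; the helper names are hypothetical:

```python
# Hypothetical helpers sketching the bug described above, not the real
# compile_cuda() source.

def mark_alwaysinline_old(inline="never"):
    # Old behaviour: `inline` was treated as a boolean flag. Its default
    # is the *string* "never", which is truthy, so every device function
    # was marked `alwaysinline` at the NVVM IR level.
    return bool(inline)

def mark_alwaysinline_new(inline="never", forceinline=False):
    # Fixed behaviour: NVVM-level inlining keys off `forceinline` only;
    # `inline` now affects only Numba-IR-level inlining.
    return forceinline

print(mark_alwaysinline_old())                  # True  (the bug)
print(mark_alwaysinline_new())                  # False
print(mark_alwaysinline_new(forceinline=True))  # True
```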

To rectify these issues, we add the forceinline kwarg to the @cuda.jit decorator and the cuda.compile[_*]() functions. Now, compile_cuda() will only enable inlining at the NVVM IR level for forceinline and not inline. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document the inline and forceinline kwargs to clarify the intent and behaviour for users.

For clarity, the behaviour is now:

  • The inline kwarg enables inlining only at the Numba IR level.
  • The forceinline kwarg enables inlining only at the NVVM IR level.

@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label May 9, 2025
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)
@gmarkall gmarkall mentioned this pull request May 9, 2025
gmarkall added a commit that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (#246 / #247)
self.argtypes,
debug=self.debug,
lineinfo=lineinfo,
inline=inline,
Contributor

Just double checking: since this is happening at the LLVM IR level, there is no need to pass inline, because it affects only the Numba IR?

Contributor Author

Yeah, exactly - inline affects only Numba IR, forceinline affects only LLVM IR (which is how it should be when all is working correctly).

``"always"``. See `Notes on Inlining
<https://numba.readthedocs.io/en/stable/developer/inlining.html>`_.
:type inline: str
:param forceinline: Enables inlining at the NVVM IR level when set to
Contributor

Do we want it to be a bool, or do we want to expose LLVM IR attributes directly:
https://llvm.org/docs/LangRef.html#function-attributes

Contributor Author

I want it to be bool so it's consistent with Numba. There is a plan for Numba to expose LLVM attributes directly, so I'll aim to align with that in future.

Contributor

@ZzEeKkAa ZzEeKkAa left a comment

LGTM, a few comments!

Thank you for fixing it. I did not mean to break inlining with the original MR. There was a lack of docs explaining that behaviour...

@gmarkall
Contributor Author

gmarkall commented May 9, 2025

There was a lack of docs explaining that behaviour...

Not only that, but I didn't even know inline was working in any sense at all... I thought it was one of those things that was perpetually (or for a really long time) broken / a no-op.

@gmarkall
Contributor Author

gmarkall commented May 9, 2025

@ZzEeKkAa Many thanks for the review - I've responded to the comments - let me know if I should follow up any further!

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels May 9, 2025
@gmarkall
Contributor Author

gmarkall commented May 9, 2025

I'll merge this a little later today, as it seems to have been the resolution to RAPIDS hanging (in rapidsai/cudf#18688) in the v0.10.1 release - assuming there are no objections?

@gmarkall gmarkall merged commit 3a3ff78 into NVIDIA:main May 9, 2025
37 checks passed
@gmarkall gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 9, 2025