@gmarkall gmarkall commented May 9, 2025

PR #181 aimed to align the behaviour of the `inline` kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. However, in Numba-CUDA this kwarg already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of `inline` is the string `"never"`, the `compile_cuda()` function interpreted it as a truthy value, and every device function got marked with the `alwaysinline` function attribute. This is a minor problem in that it probably forces a lot of unwanted inlining, but also a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes a hang in `nvvmCompileProgram()`.
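The pitfall above can be illustrated with a minimal sketch (not the actual numba-cuda source; the function names here are hypothetical): any non-empty string, including the default `"never"`, is truthy in Python, so a plain boolean check on the kwarg marks everything `alwaysinline`.

```python
def marks_alwaysinline(inline="never"):
    # Buggy check: any non-empty string, including the default
    # "never", is truthy, so this always returns True.
    return bool(inline)


def marks_alwaysinline_fixed(inline="never"):
    # Corrected check: only an explicit request enables the attribute.
    return inline == "always"
```

With the buggy check, `marks_alwaysinline()` is `True` even for the default; the fixed version returns `True` only when inlining is explicitly requested.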

To rectify these issues, we add the `forceinline` kwarg to the `@cuda.jit` decorator and the `cuda.compile[_*]()` functions. Now, `compile_cuda()` enables inlining at the NVVM IR level only for `forceinline`, not for `inline`. This aligns with the behaviour of upstream Numba (see numba/numba#10068). The `inline` and `forceinline` kwargs are now documented to clarify their intent and behaviour for users.

For clarity, the behaviour is now:

- The `inline` kwarg enables inlining only at the Numba IR level.
- The `forceinline` kwarg enables inlining only at the NVVM IR level.
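The resulting division of responsibility can be summarised with a hypothetical helper (invented for illustration, not part of the numba-cuda API). It assumes, as in upstream Numba, that `inline` accepts `"never"`, `"always"`, or a callable predicate:

```python
def inlining_plan(inline="never", forceinline=False):
    """Sketch of which level of inlining each kwarg requests."""
    return {
        # `inline` affects only Numba IR-level inlining.
        "numba_ir": inline == "always" or callable(inline),
        # `forceinline` alone controls the NVVM-level `alwaysinline`
        # function attribute.
        "nvvm_alwaysinline": bool(forceinline),
    }
```

Crucially, the default `inline="never"` no longer requests any inlining: `inlining_plan()` yields `False` for both levels.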

@gmarkall gmarkall merged commit 879b1d4 into NVIDIA:branch-v010 May 9, 2025
37 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)
@gmarkall gmarkall mentioned this pull request May 9, 2025
gmarkall added a commit that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (#246 / #247)