@gmarkall gmarkall commented May 9, 2025

PR #181 aimed to align the behaviour of the `inline` kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. However, in Numba-CUDA this kwarg already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of `inline` is the string `"never"`, the `compile_cuda()` function interpreted it as a truthy value, and every device function got marked with the `alwaysinline` function attribute. This is a minor problem in that it probably forces a lot of unwanted inlining, but also a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes a hang in `nvvmCompileProgram()`.
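The pitfall above can be illustrated with a minimal sketch (not the actual numba-cuda source; the function names here are hypothetical): any non-empty string, including the default `"never"`, is truthy in Python, so a plain boolean check on the kwarg marks everything `alwaysinline`.

```python
def marks_alwaysinline(inline="never"):
    # Buggy check: any non-empty string, including the default
    # "never", is truthy, so this always returns True.
    return bool(inline)


def marks_alwaysinline_fixed(inline="never"):
    # Corrected check: only an explicit request enables the attribute.
    return inline == "always"
```

With the buggy check, `marks_alwaysinline()` is `True` even for the default; the fixed version returns `True` only when inlining is explicitly requested.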

To rectify these issues, we add the `forceinline` kwarg to the `@cuda.jit` decorator and the `cuda.compile[_*]()` functions. Now, `compile_cuda()` enables inlining at the NVVM IR level only for `forceinline`, not for `inline`. This aligns with the behaviour of upstream Numba (see numba/numba#10068). The `inline` and `forceinline` kwargs are now documented to clarify their intent and behaviour for users.

For clarity, the behaviour is now:

- The `inline` kwarg enables inlining only at the Numba IR level.
- The `forceinline` kwarg enables inlining only at the NVVM IR level.
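The resulting division of responsibility can be summarised with a hypothetical helper (invented for illustration, not part of the numba-cuda API). It assumes, as in upstream Numba, that `inline` accepts `"never"`, `"always"`, or a callable predicate:

```python
def inlining_plan(inline="never", forceinline=False):
    """Sketch of which level of inlining each kwarg requests."""
    return {
        # `inline` affects only Numba IR-level inlining.
        "numba_ir": inline == "always" or callable(inline),
        # `forceinline` alone controls the NVVM-level `alwaysinline`
        # function attribute.
        "nvvm_alwaysinline": bool(forceinline),
    }
```

Crucially, the default `inline="never"` no longer requests any inlining: `inlining_plan()` yields `False` for both levels.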

@gmarkall gmarkall merged commit 879b1d4 into NVIDIA:branch-v010 May 9, 2025
37 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)
@gmarkall gmarkall mentioned this pull request May 9, 2025
gmarkall added a commit that referenced this pull request May 9, 2025
- Fix inlining behaviour at the NVVM IR level (#246 / #247)