Fix Invalid NVVM IR emitted when lowering shfl_sync APIs #231

isVoid · 2025-04-22T20:27:01Z

Per the NVVM documentation, the shuffle intrinsics take an IR constant for the mode parameter. In the current Numba implementation, it is a variable. This can crash NVVM lowering when constant folding is not applied to the IR (e.g. with -opt=3). This PR fixes #228.
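To make the constraint concrete, here is a minimal sketch in plain Python that builds the text of a call to `llvm.nvvm.shfl.sync.i32` with the mode as an immediate. The helper name `shfl_sync_call`, the `SHFL_MODES` table, and the operand names are hypothetical and only for illustration; the mode encodings are assumed from the NVVM IR spec. This is not the code in this PR.

```python
# Mode encodings assumed from the NVVM IR spec for shfl.sync:
# 0 = idx, 1 = up, 2 = down, 3 = bfly.
SHFL_MODES = {"idx": 0, "up": 1, "down": 2, "bfly": 3}


def shfl_sync_call(mode, membermask="%membermask", a="%a", b="%b", c="%c"):
    """Return IR text for a shuffle call whose mode is an immediate.

    Hypothetical helper: it rejects anything that is not a plain integer,
    mirroring the requirement that the mode operand is an IR constant
    rather than an SSA value such as ``%mode``.
    """
    if isinstance(mode, bool) or not isinstance(mode, int):
        raise TypeError("mode must be a compile-time integer constant")
    if mode not in SHFL_MODES.values():
        raise ValueError(f"unknown shuffle mode: {mode}")
    return (
        "call {i32, i1} @llvm.nvvm.shfl.sync.i32("
        f"i32 {membermask}, i32 {mode}, i32 {a}, i32 {b}, i32 {c})"
    )


if __name__ == "__main__":
    # The mode appears as the literal "i32 3", not as a named value.
    print(shfl_sync_call(SHFL_MODES["bfly"]))
```

Passing a string such as `"%mode"` raises `TypeError` in this sketch, which stands in for the invalid-IR case described above.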

gmarkall

Thanks for moving this PR forward and fixing the correctness issues in it.

When I tried to follow through the implementation to understand what's going on and correlate it with the NVVM IR and CUDA C/C++ programming documentation, I found it very hard to follow what was going on, because the implementation diverges from both in a lot of small ways (because it always did in Numba, not as a result of this PR).

There was also some redundant work with creating structs and extracting values from them that was not quite redundant in the old implementation. Now that the implementation is all combined into a single intrinsic instead of the old way of having typing, lowering, a stub, and jitted functions, the extra packing and unpacking is un-needed.

To get to a point where I could follow through what was going on I found I had to iterate through modifying the code to align it with the docs and C/C++ implementation, and get rid of the extra packing and unpacking - this is in gmarkall@931f865

A couple of other notes from whilst I was doing that:

- I kept making mistakes that involved the test_shfl_sync_types test failing, so it was helpful to use subTests for it: gmarkall@de76f69
- The docstrings for the intrinsics were missing, so they wouldn't have been visible in an IDE or shown up in the documentation.

I suspect we will also have a similar issue for the vote sync intrinsic, which is documented as requiring a constant mode as well: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#vote - we should get this PR in and then repeat the process for that intrinsic (and maybe check if there are other intrinsics we use where we make this mistake too).

gmarkall · 2025-04-23T11:57:08Z

numba_cuda/numba/cuda/tests/cudapy/test_warp_ops.py

        compiled[1, nelem](ary, xor)
        self.assertTrue(np.all(ary == exp))

+    def test_shfl_sync_const_mode_val(self):


We should also have a variant of this test that uses compile_ptx() with cc=(10, 0) to ensure that all is well when going through the path that uses LLVM 18.x IR.

numba_cuda/numba/cuda/tests/cudapy/test_warp_ops.py

numba_cuda/numba/cuda/intrinsics.py

This consists of a number of changes:

- Add notes on implementation and links to docs
- Map public API arguments to those referred to in the NVVM IR docs
- Restore docstrings for intrinsics
- Simplify some of the intrinsic logic - there is no need to construct aggregates containing the full returned structure from the intrinsic only to extract the computed value just prior to returning from the intrinsic.

isVoid · 2025-04-23T17:43:03Z

> The docstrings for the intrinsics were missing, so they wouldn't have been visible in an IDE or shown up in the documentation.

Can you elaborate? The code is directly documented here: https://github.com/NVIDIA/numba-cuda/blob/main/docs/source/reference/kernel.rst#warp-intrinsics. It should show up when the API is accessible from numba.cuda?

isVoid · 2025-04-23T18:22:56Z

numba_cuda/numba/cuda/intrinsics.py

+    typingctx,
+    membermask_type,
+    mode_value,
+    a_type,


I was slightly confused when reading the code, because a_type is both passed in from typing and extracted from the lowering signature argument. I later realized that the first a_type is used for typing and the second for lowering, and that the two run at different times.

Good point - there's no need to get it again from the signature, so I think that definition later of it can be removed - see below. What do you think?
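The two phases discussed in this thread can be sketched with plain Python stand-ins. This is a simulation only: `Signature` and `fake_shfl_intrinsic` are hypothetical names, and no Numba API is used. The outer function plays the role of typing and the inner `codegen` plays the role of lowering, which runs later.

```python
from collections import namedtuple

# Hypothetical stand-in for a signature object; only the fields used here.
Signature = namedtuple("Signature", ["return_type", "args"])


def fake_shfl_intrinsic(typingctx, membermask_type, a_type, b_type):
    # Typing phase: called with *types* when a call site is being typed.
    sig = Signature(return_type=a_type, args=(membermask_type, a_type, b_type))

    def codegen(context, builder, sig, args):
        # Lowering phase: called later with *values* and the resolved signature.
        membermask, a, b = args
        # Re-read the type from the signature instead of relying on the
        # a_type captured from the enclosing function.
        a_type = sig.args[1]
        return f"shfl<{a_type}>({membermask}, {a}, {b})"

    return sig, codegen


if __name__ == "__main__":
    sig, codegen = fake_shfl_intrinsic(None, "uint32", "float32", "int32")
    print(codegen(None, None, sig, ("%mask", "%a", "%b")))
```

Both names refer to the same type in this sketch. The only difference is whether lowering depends on a closure variable or on the signature it is handed.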

gmarkall · 2025-05-01T14:28:59Z

numba_cuda/numba/cuda/intrinsics.py

+        membermask, a, b = args
+
+        # Types
+        a_type = sig.args[1]


I think we don't need this, as per https://github.com/NVIDIA/numba-cuda/pull/231/files#r2070337358:

Suggested change

a_type = sig.args[1]

Further to our discussion earlier where you suggested not doing this so that it's not captured from the outer function - I'll not commit this suggestion, and merge the PR as-is.

gmarkall · 2025-05-01T14:31:59Z

numba_cuda/numba/cuda/tests/cudapy/test_warp_ops.py

+
+        for func in subtest:
+            with self.subTest(func=func.__name__):
+                compile_ptx(func, (int32[:], int32), cc=(10, 0))


I'm a bit surprised this passed on CI on toolkits that pre-date CC 10.0. I'm looking into what's going on here now, but do you know why this didn't fail on older toolkits?

OK, I now recall it's a long-standing behaviour that if we ask for a higher CC than supported, we just use the highest supported CC (because PTX for an older CC can be compiled for hardware of a newer CC).

numba-cuda/numba_cuda/numba/cuda/cudadrv/nvvm.py

Lines 478 to 479 in 9c727b0

# CC higher than supported

return supported_ccs[-1] # Choose the highest
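The fallback described above can be sketched as follows. This is a simplified illustration with a hypothetical helper name, `pick_supported_cc`, and made-up supported lists; it is not the code in nvvm.py.

```python
def pick_supported_cc(requested, supported_ccs):
    """Return the highest supported compute capability <= requested.

    Hypothetical helper for illustration. Compute capabilities are
    (major, minor) tuples, which compare lexicographically.
    """
    if not supported_ccs:
        raise ValueError("no supported compute capabilities")
    candidates = [cc for cc in sorted(supported_ccs) if cc <= requested]
    if not candidates:
        raise ValueError(f"requested CC {requested} is lower than any supported CC")
    # If the request is higher than everything supported, this is simply
    # the highest supported CC.
    return candidates[-1]


if __name__ == "__main__":
    older_toolkit = [(5, 0), (7, 5), (9, 0)]  # made-up list for illustration
    print(pick_supported_cc((10, 0), older_toolkit))
```

Under this behaviour a test that asks for cc=(10, 0) still compiles on an older toolkit, only for a lower target than requested.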

gmarkall

I'm happy with this if you are - whether you want to remove the second assignment of a_type marked on the diff or to proceed with it as-is.

- Fix Invalid NVVM IR emitted when lowering shfl_sync APIs (NVIDIA#231)
- Add Bfloat16 Low++ Bindings (NVIDIA#166)
- Fix cuda.jit decorator inline (NVIDIA#181)
- Feature: cuda specific make_attribute_wrapper (NVIDIA#193)
- return a none tuple if no libdevice path is found (NVIDIA#234)

- Local variable debug info deduplication (NVIDIA#222)
- Fix package installation for wheels CI (NVIDIA#238)
- Fix Invalid NVVM IR emitted when lowering shfl_sync APIs (NVIDIA#231)
- Add Bfloat16 Low++ Bindings (NVIDIA#166)
- Fix cuda.jit decorator inline (NVIDIA#181)
- Feature: cuda specific make_attribute_wrapper (NVIDIA#193)
- return a none tuple if no libdevice path is found (NVIDIA#234)

- Local variable debug info deduplication (#222)
- Fix package installation for wheels CI (#238)
- Fix Invalid NVVM IR emitted when lowering shfl_sync APIs (#231)
- Add Bfloat16 Low++ Bindings (#166)
- Fix cuda.jit decorator inline (#181)
- Feature: cuda specific make_attribute_wrapper (#193)
- return a none tuple if no libdevice path is found (#234)

gmarkall and others added 5 commits April 22, 2025 10:43

WIP start

9be71e7

iosdfjsdi

889a5c3

cleanups, fix typing bug

5b6217f

additional cleanup

c412f32

Merge remote-tracking branch 'upstream' into issue-228

d11f31c

isVoid mentioned this pull request Apr 22, 2025

Fix Invalid NVVM IR emitted when lowering shfl_sync APIs #230

Closed

isVoid changed the title ~~Fix Invalid NVVM IR emitted when lowering shfl_sync APIs #230~~ Fix Invalid NVVM IR emitted when lowering shfl_sync APIs Apr 22, 2025

isVoid added 2 commits April 22, 2025 13:47

add specific test on shfl_sync

8181ff7

mor cleanups

97904a3

isVoid marked this pull request as ready for review April 22, 2025 20:53

isVoid requested a review from gmarkall April 22, 2025 21:05

gmarkall requested changes Apr 23, 2025

gmarkall reviewed Apr 23, 2025

numba_cuda/numba/cuda/tests/cudapy/test_warp_ops.py (outdated)

gmarkall reviewed Apr 23, 2025

numba_cuda/numba/cuda/intrinsics.py (outdated)

gmarkall reviewed Apr 23, 2025

numba_cuda/numba/cuda/intrinsics.py (outdated)

gmarkall reviewed Apr 23, 2025

numba_cuda/numba/cuda/intrinsics.py (outdated)

gmarkall added the "4 - Waiting on author" label Apr 23, 2025

gmarkall added 2 commits April 23, 2025 08:27

Use subTest to clearly identify failing shfl_sync types tests

c5dacbb

isVoid added 2 commits April 23, 2025 10:49

add subtest for test_shfl_sync_const_mode_val

f2b5c65

add cc=10,0 test

b359f9a

isVoid commented Apr 23, 2025

isVoid mentioned this pull request Apr 23, 2025

[BUG] vote_sync intrinsics should use constant for mode paramter #232

Closed

isVoid requested a review from gmarkall April 23, 2025 19:27

gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label Apr 30, 2025

gmarkall reviewed May 1, 2025

gmarkall approved these changes May 1, 2025

gmarkall added the "4 - Waiting on author" and "5 - Ready to merge" labels and removed the "4 - Waiting on reviewer" and "4 - Waiting on author" labels May 1, 2025

gmarkall merged commit 02df550 into NVIDIA:main May 2, 2025
37 checks passed

gmarkall mentioned this pull request May 2, 2025

Bump version to 0.10.0 #237

Closed

gmarkall mentioned this pull request May 3, 2025

Bump version to 0.10.0 #239

Merged

gmarkall mentioned this pull request Nov 17, 2025

[BUG] Warp vote operations must use a constant int for the mode #592

Closed

kaeun97 mentioned this pull request Nov 22, 2025

fix: warp vote operations must use a constant int for the mode parameter #606

Merged

gmarkall pushed a commit that referenced this pull request Nov 26, 2025

fix: warp vote operations must use a constant int for the mode para…

b06e183

…meter (#606) Fixes #592. Followed similar pattern as #231.

