Conversation

@kaeun97 kaeun97 commented Nov 22, 2025

Fixes #592. Follows a similar pattern to #231. cc @gmarkall

copy-pr-bot bot commented Nov 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kaeun97 kaeun97 force-pushed the kaeun97/fix-warp-vote-operation branch from d64a29c to 7948745 Compare November 22, 2025 01:45
@kaeun97 kaeun97 changed the title [WIP] fix: warp vote operations must use a constant int for the mode parameter fix: warp vote operations must use a constant int for the mode parameter Nov 22, 2025
@kaeun97 kaeun97 marked this pull request as ready for review November 22, 2025 01:47

greptile-apps bot commented Nov 22, 2025

Greptile Overview

Greptile Summary

Fixes the warp vote operations (all_sync, any_sync, eq_sync, ballot_sync) so that they pass a constant int for the mode parameter, as the NVVM IR specification requires. The implementation migrates these functions from @jit device functions to @intrinsic-decorated functions, following the pattern established in PR #231 for the shuffle operations.

Key changes:

  • Moved vote sync implementations from intrinsic_wrapper.py to intrinsics.py as proper intrinsics
  • The mode parameter is now passed as ir.Constant(i32, mode_value) instead of a runtime value
  • Removed obsolete vote_sync_intrinsic stub and its type declarations
  • Added comprehensive test coverage including LLVM IR validation, type validation, and SM 10.0 compatibility tests
  • Deleted intrinsic_wrapper.py as noted in the issue, since it's no longer needed
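
To make the role of the mode parameter concrete, here is a minimal host-side model of the four vote operations in plain Python. It is a sketch only, not the code in intrinsics.py: the names `vote_sync_model`, `all_sync_model`, and so on are hypothetical, the mode numbering (0 = all, 1 = any, 2 = eq, 3 = ballot) is assumed from the NVVM IR description of the vote intrinsic, and the flag returned in ballot mode is an arbitrary choice of the model.

```python
WARP_SIZE = 32

# Assumed mode numbering for the vote intrinsic (hypothetical constant names).
MODE_ALL, MODE_ANY, MODE_EQ, MODE_BALLOT = 0, 1, 2, 3


def vote_sync_model(mask, mode, predicates):
    """Model one warp vote on the host; returns a (ballot, result) pair."""
    if len(predicates) != WARP_SIZE:
        raise ValueError("expected one predicate per lane")
    # Only lanes whose bit is set in the mask take part in the vote.
    lanes = [lane for lane in range(WARP_SIZE) if (mask >> lane) & 1]
    votes = [bool(predicates[lane]) for lane in lanes]
    ballot = sum(1 << lane for lane in lanes if predicates[lane])
    if mode == MODE_ALL:
        result = all(votes)
    elif mode == MODE_ANY:
        result = any(votes)
    elif mode == MODE_EQ:
        # True when every participating lane voted the same way.
        result = all(votes) or not any(votes)
    elif mode == MODE_BALLOT:
        # The flag is unused by the ballot wrapper; this value is model-only.
        result = ballot != 0
    else:
        raise ValueError("mode must be one of 0, 1, 2, 3")
    return ballot, result


# Each wrapper fixes the mode as a constant and picks one field of the pair.
def all_sync_model(mask, predicates):
    return vote_sync_model(mask, MODE_ALL, predicates)[1]


def any_sync_model(mask, predicates):
    return vote_sync_model(mask, MODE_ANY, predicates)[1]


def eq_sync_model(mask, predicates):
    return vote_sync_model(mask, MODE_EQ, predicates)[1]


def ballot_sync_model(mask, predicates):
    return vote_sync_model(mask, MODE_BALLOT, predicates)[0]


if __name__ == "__main__":
    even_lanes = [lane % 2 == 0 for lane in range(WARP_SIZE)]
    print(any_sync_model(0xFFFFFFFF, even_lanes))
    print(hex(ballot_sync_model(0xFFFFFFFF, even_lanes)))
```

Each wrapper hard-codes its mode, which mirrors the point of the fix: the mode is known when the function is compiled, so it can be emitted as a constant rather than passed as a runtime value.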

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The fix directly addresses the reported bug by ensuring the mode parameter is a constant, as required by the NVVM IR spec. The implementation follows the proven pattern from PR #231 (Fix Invalid NVVM IR emitted when lowering shfl_sync APIs), includes thorough test coverage (IR validation, type checking, SM 10.0 compatibility), and properly cleans up obsolete code.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| numba_cuda/numba/cuda/intrinsics.py | 5/5 | Added intrinsic implementations for all_sync, any_sync, eq_sync, ballot_sync with constant mode parameter, properly following the NVVM IR specification |
| numba_cuda/numba/cuda/intrinsic_wrapper.py | 5/5 | Deleted as it's no longer needed after migrating vote sync functions to intrinsics |
| numba_cuda/numba/cuda/tests/cudapy/test_warp_ops.py | 5/5 | Added comprehensive tests for constant mode parameter verification, type validation, and SM 10.0 compatibility |

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant Intrinsic as all_sync/any_sync/eq_sync/ballot_sync
    participant VoteSync as vote_sync_intrinsic
    participant LLVM as LLVM IR Builder
    participant NVVM as llvm.nvvm.vote.sync

    User->>Intrinsic: call cuda.all_sync(mask, predicate)
    Note over Intrinsic: @intrinsic decorator<br/>mode_value = 0
    Intrinsic->>VoteSync: vote_sync_intrinsic(typingctx, mask_type, 0, predicate_type)
    Note over VoteSync: Validate types<br/>Define codegen function
    VoteSync->>LLVM: Create ir.Constant(i32, mode_value)
    Note over LLVM: mode is now a constant!<br/>Previously was a runtime value
    LLVM->>LLVM: Convert mask to i32
    LLVM->>LLVM: Convert predicate to i1
    LLVM->>NVVM: call vote_sync(mask_i32, constant mode, predicate_bool)
    Note over NVVM: NVVM IR spec requires<br/>constant mode parameter
    NVVM-->>LLVM: {i32 ballot, i1 result}
    LLVM-->>VoteSync: tuple result
    VoteSync-->>Intrinsic: codegen function
    Intrinsic->>LLVM: extract_value(result_tuple, 1)
    Note over Intrinsic: Extract boolean result<br/>for all_sync/any_sync/eq_sync<br/>or ballot (index 0) for ballot_sync
    LLVM-->>User: boolean or i32 result

@greptile-apps greptile-apps bot left a comment

7 files reviewed, no comments

@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Nov 24, 2025
@gmarkall

/ok to test ad6ccd0

@gmarkall gmarkall left a comment

Many thanks for this PR - this is great work!

I think there is a subtlety around the types that differs a bit from the warp sync intrinsics, and I've outlined my thoughts on it on the diff.

Note that I think CI is good with respect to this PR - the failure is an integration test with a downstream library (Awkward Array) and I think it is a bit sensitive to the versions of its dependencies, so unexpected / unrelated failures sometimes turn up. I'll confirm this and get back to you if there is a real issue though (unlikely, I think).

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Nov 24, 2025

gmarkall commented Nov 24, 2025

> Note that I think CI is good with respect to this PR - the failure is an integration test with a downstream library (Awkward Array) and I think it is a bit sensitive to the versions of its dependencies, so unexpected / unrelated failures sometimes turn up. I'll confirm this and get back to you if there is a real issue though (unlikely, I think).

I've looked into this, and it seems that running the Awkward Array tests with CuPy 13.6 causes issues for the version of Awkward that we use, so the failures are unrelated to this PR. I have a potential fix in #607.

@greptile-apps greptile-apps bot left a comment

7 files reviewed, no comments

kaeun97 commented Nov 26, 2025

@gmarkall Thank you for the feedback! I have added type checks for mask and predicate and some tests to validate the checks.

A couple of things to note:

  • Although the documentation states 32-bit integers, vote_sync_intrinsic, like shfl_sync_intrinsic, also accepts 64-bit signed/unsigned integers, because we truncate them later.
  • I've added a check to make sure that mask is not a boolean (it accepts only integers).

Test coverage:

  • Rejects float types for both mask and predicate
  • Rejects boolean type for mask (while allowing it for predicate)
  • Accepts signed/unsigned 32-bit and 64-bit integers for mask and predicate
  • Accepts literal values for mask and predicate
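
The rules listed above can be summarised in a small, self-contained sketch. This is not the validation code from the PR: `ArgType`, `check_vote_args`, and `truncate_to_i32` are hypothetical names, the use of `TypeError` is an assumption (the real intrinsic may raise a different error), and the truncation simply models keeping the low 32 bits of a wider integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgType:
    """Hypothetical stand-in for a compiler type; not a Numba class."""
    kind: str          # "int", "bool", or "float"
    bits: int
    signed: bool = True


def check_vote_args(mask_ty, predicate_ty):
    """Apply the acceptance rules described in the comment above."""
    # The mask must be an integer of any width or signedness; booleans
    # and floats are rejected.
    if mask_ty.kind != "int":
        raise TypeError(f"mask must be an integer, got {mask_ty.kind}")
    # The predicate may be an integer or a boolean, but not a float.
    if predicate_ty.kind not in ("int", "bool"):
        raise TypeError(
            f"predicate must be an integer or boolean, got {predicate_ty.kind}"
        )


def truncate_to_i32(value):
    """Keep the low 32 bits, as truncating a 64-bit integer would."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    int64 = ArgType("int", 64)
    boolean = ArgType("bool", 1, signed=False)
    check_vote_args(int64, boolean)              # accepted: no exception
    print(hex(truncate_to_i32(0x1_FFFF_FFFF)))   # high bit is dropped
```

Accepting 64-bit masks and truncating them later keeps ordinary Python integer arguments usable, at the cost of silently discarding any bits above bit 31.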

I would appreciate it if you could take a look!

@greptile-apps greptile-apps bot left a comment

7 files reviewed, no comments

@kaeun97 kaeun97 requested a review from gmarkall November 26, 2025 10:20
@gmarkall

/ok to test b44106a

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Nov 26, 2025

@gmarkall gmarkall left a comment

Thanks for the update - this is looking great!

@gmarkall gmarkall merged commit b06e183 into NVIDIA:main Nov 26, 2025
71 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Nov 27, 2025
- Revert NVIDIA#536 "perf: remove context threading in various pointer abstractions" (NVIDIA#611)
- fix: empty array type mismatch between host and device (NVIDIA#612)
- fix: warp vote operations must use a constant int for the `mode` parameter (NVIDIA#606)
@gmarkall gmarkall mentioned this pull request Nov 27, 2025
gmarkall added a commit that referenced this pull request Nov 27, 2025
- Revert #536 "perf: remove context threading in various pointer
abstractions" (#611)
- fix: empty array type mismatch between host and device (#612)
- fix: warp vote operations must use a constant int for the `mode`
parameter (#606)
Labels

4 - Waiting on reviewer Waiting for reviewer to respond to author

Development

Successfully merging this pull request may close these issues.

[BUG] Warp vote operations must use a constant int for the mode

2 participants