-
Notifications
You must be signed in to change notification settings - Fork 54
fix: warp vote operations must use a constant int for the mode parameter
#606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: warp vote operations must use a constant int for the mode parameter
#606
Conversation
d64a29c to
7948745
Compare
mode parametermode parameter
Greptile OverviewGreptile SummaryFixes warp vote operations ( Key changes:
Confidence Score: 5/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant User as User Code
participant Intrinsic as all_sync/any_sync/eq_sync/ballot_sync
participant VoteSync as vote_sync_intrinsic
participant LLVM as LLVM IR Builder
participant NVVM as llvm.nvvm.vote.sync
User->>Intrinsic: call cuda.all_sync(mask, predicate)
Note over Intrinsic: @intrinsic decorator<br/>mode_value = 0
Intrinsic->>VoteSync: vote_sync_intrinsic(typingctx, mask_type, 0, predicate_type)
Note over VoteSync: Validate types<br/>Define codegen function
VoteSync->>LLVM: Create ir.Constant(i32, mode_value)
Note over LLVM: mode is now a constant!<br/>Previously was a runtime value
LLVM->>LLVM: Convert mask to i32
LLVM->>LLVM: Convert predicate to i1
LLVM->>NVVM: call vote_sync(mask_i32, constant mode, predicate_bool)
Note over NVVM: NVVM IR spec requires<br/>constant mode parameter
NVVM-->>LLVM: {i32 ballot, i1 result}
LLVM-->>VoteSync: tuple result
VoteSync-->>Intrinsic: codegen function
Intrinsic->>LLVM: extract_value(result_tuple, 1)
Note over Intrinsic: Extract boolean result<br/>for all_sync/any_sync/eq_sync<br/>or ballot (index 0) for ballot_sync
LLVM-->>User: boolean or i32 result
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 files reviewed, no comments
|
/ok to test ad6ccd0 |
gmarkall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for this PR - this is great work!
I think there is a subtlety around the types that differs a bit from the warp sync intrinsics, and I've outlined my thoughts on it on the diff.
Note that I think CI is good with respect to this PR - the failure is an integration test with a downstream library (Awkward Array) and I think it is a bit sensitive to the versions of its dependencies, so unexpected / unrelated failures sometimes turn up. I'll confirm this and get back to you if there is a real issue though (unlikely, I think).
I've looked into this and it seems like running the Awkward Array tests with CuPy 13.6 for the version of Awkward that we use causes issues. So the fails are unrelated to this PR. I have a potential fix in #607. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 files reviewed, no comments
|
@gmarkall Thank you for the feedback! I have added type checks for Couple things to note:
Test coverage:
Would appreciate if you can take a look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7 files reviewed, no comments
|
/ok to test b44106a |
gmarkall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update - this is looking great!
- Revert NVIDIA#536 "perf: remove context threading in various pointer abstractions" (NVIDIA#611) - fix: empty array type mismatch between host and device (NVIDIA#612) - fix: warp vote operations must use a constant int for the `mode` parameter (NVIDIA#606)
Fixes #592. Followed similar pattern as #231. cc. @gmarkall