Add tlx.vote_ballot_sync op that lowers to NVVM::VoteSyncOp by manman-ren · Pull Request #828 · facebookexperimental/triton

manman-ren · 2026-02-02T19:43:21Z

This will be used by rescaling optimization in FA4. Kernel change will be in a separate PR.

njriasan

A couple minor nits but overall this looks great. Thanks!

njriasan · 2026-02-03T16:39:10Z

-for mode in ["fwd", "bwd"]:
-    for causal in [False, True]:
-        for BWD_BLOCK_M1 in [64, 128]:
+for mode in ["fwd"]:  # , "bwd"]:


Is this code fully broken? If not, can we update this code to remove the comments?

This is for debugging. Will revert.

njriasan · 2026-02-03T16:40:37Z

-                            tlx.local_store(subslice, acc)
+                        # Perform warp-level ballot vote to check if any thread needs rescaling
+                        # 0xFFFFFFFF means all 32 threads in the warp participate
+                        if RESCALE_OPT:


Minor nit: Might be easier to understand this code if we assert USE_WHERE = False if RESCALE_OPT = False. That way we can even split the logic into a helper function and make it very simple to understand. What do you think?
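The flag constraint suggested above could be sketched as a small host-side check. This is only an illustration: `validate_rescale_flags` is a hypothetical helper name that is not part of this PR, and the rule it encodes (USE_WHERE may only be enabled together with RESCALE_OPT) is taken from the review comment, not from the merged kernel.

```python
def validate_rescale_flags(rescale_opt: bool, use_where: bool) -> None:
    """Hypothetical helper: reject USE_WHERE=True when RESCALE_OPT=False.

    Assumption: the USE_WHERE path is only meaningful when the rescaling
    optimization is on, as suggested in the review comment.
    """
    if use_where and not rescale_opt:
        raise ValueError("USE_WHERE=True requires RESCALE_OPT=True")


# The three remaining flag combinations pass silently.
validate_rescale_flags(rescale_opt=True, use_where=True)
validate_rescale_flags(rescale_opt=True, use_where=False)
validate_rescale_flags(rescale_opt=False, use_where=False)
```

With the invalid combination ruled out up front, the kernel body would only need to branch on the remaining cases, which is what would make a separate helper function simpler to read.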

njriasan · 2026-02-03T16:41:33Z

+                                tlx.local_store(subslice, acc)
+                        else:
+                            # option 2: use a single scalar IfOp
+                            if RESCALE_OPT:


I'd be curious to know if we have performance numbers for this yet? I can report the GB300 numbers today if we update the TritonBench kernel.

I'm curious too. I'm wondering if we should defer the kernel changes to a separate PR until we get some perf numbers and numerics results.

Yes, I should have included some perf numbers. I am still working on #820 for the USE_WHERE path. Currently, enabling RESCALE_OPT without USE_WHERE shows some perf win, pending numerical results. This path performs reduction for all rows in the block. I can remove the kernel changes or hard-code RESCALE_OPT to off.

htyu · 2026-02-03T16:54:10Z

+
+    # Run the kernel with 1 warp
+    vote_ballot_kernel[(1, )](output, BLOCK_SIZE=32, num_warps=1)
+    torch.cuda.synchronize()


Why is torch.cuda.synchronize() needed here?

htyu · 2026-02-03T16:57:37Z

+
+@tl.builtin
+def vote_ballot_sync(
+    mask: tl.constexpr,


Is mask required to be constant from PTX pov?

No, it doesn't need to be.

Perhaps change int to int ?

htyu · 2026-02-03T17:11:35Z

+    Returns:
+        If pred is scalar: A 32-bit integer where bit N is set if thread N's
+                          predicate was true and thread N is in the mask.
+        If pred is tensor: A tensor of i32 with the same shape, where each


Same shape as the result value?

It returns a tensor with the same shape as pred.
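The scalar case described in the docstring can be modeled in plain Python. This is a reference sketch of the documented semantics only (bit N is set when thread N's predicate is true and thread N is in the mask); `emulate_vote_ballot` is a hypothetical name, and the sketch does not call `tlx.vote_ballot_sync` or model what hardware does for lanes outside the mask.

```python
from typing import Sequence

WARP_SIZE = 32


def emulate_vote_ballot(mask: int, preds: Sequence[bool]) -> int:
    """Hypothetical reference model of a warp ballot.

    Bit N of the result is set when lane N is in `mask` and preds[N] is true.
    """
    if len(preds) != WARP_SIZE:
        raise ValueError("expected one predicate per lane in a 32-thread warp")
    result = 0
    for lane, pred in enumerate(preds):
        lane_in_mask = (mask >> lane) & 1
        if lane_in_mask and pred:
            result |= 1 << lane
    return result


# All 32 lanes participate (mask 0xFFFFFFFF); only even lanes vote true.
even_lanes = [lane % 2 == 0 for lane in range(WARP_SIZE)]
ballot = emulate_vote_ballot(0xFFFFFFFF, even_lanes)

# "Does any thread need rescaling?" reduces to a non-zero ballot.
any_lane_true = ballot != 0
```

In the kernel snippet quoted earlier in this thread, the ballot is used the same way: a non-zero result means at least one participating thread's predicate was true.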

htyu · 2026-02-03T17:13:52Z

+    C1 = 0.695146143436431884765625
+    C2 = 0.227564394474029541015625
+    C3 = 0.077119089663028717041015625
+


These changes seem unrelated to the PR?

Yes, these are for exp simulation from your original PR. Will revert

htyu

Can you please also update the README? Thanks.

htyu

LGTM!

htyu · 2026-02-05T17:35:57Z

+
+@tl.builtin
+def vote_ballot_sync(
+    mask: tl.constexpr,


Perhaps change int to int ?

meta-codesync · 2026-02-09T21:43:43Z

@manman-ren has imported this pull request. If you are a Meta employee, you can view this in D92753224.

Summary: This will be used by rescaling optimization in FA4. Kernel change will be in a separate PR. Reviewed By: htyu Differential Revision: D92753224 Pulled By: manman-ren

meta-codesync · 2026-02-11T21:03:58Z

@manman-ren has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92753224.

meta-codesync · 2026-02-12T01:53:39Z

@manman-ren merged this pull request in 517ce79.

Summary: This will be used by rescaling optimization in FA4. Kernel change will be in a separate PR. Pull Request resolved: #828 Reviewed By: htyu Differential Revision: D92753224 Pulled By: manman-ren fbshipit-source-id: f3df62dcb5193f0c1022fca41bd9aa2828084de8

meta-cla Bot added the CLA Signed label Feb 2, 2026

manman-ren marked this pull request as draft February 2, 2026 19:44

manman-ren marked this pull request as ready for review February 3, 2026 16:11

manman-ren requested review from htyu and njriasan February 3, 2026 16:11

njriasan approved these changes Feb 3, 2026

htyu reviewed Feb 3, 2026

manman-ren force-pushed the mren/if-tensor-value branch from 4d1660c to 7d29173 Compare February 5, 2026 00:54

htyu approved these changes Feb 5, 2026

manman-ren force-pushed the mren/if-tensor-value branch from 7d29173 to a26607a Compare February 6, 2026 23:28

Add tlx.vote_ballot_sync op that lowers to NVVM::VoteSyncOp (#828)

4dfcc3c

Summary: This will be used by rescaling optimization in FA4. Kernel change will be in a separate PR. Reviewed By: htyu Differential Revision: D92753224 Pulled By: manman-ren

facebook-github-bot force-pushed the mren/if-tensor-value branch from a26607a to 4dfcc3c Compare February 11, 2026 21:03

meta-codesync Bot added fb-exported meta-exported labels Feb 11, 2026

meta-codesync Bot closed this in 517ce79 Feb 12, 2026

facebook-github-bot added the Merged label Feb 12, 2026

manman-ren deleted the mren/if-tensor-value branch February 13, 2026 18:41
