Optimize tl.where by converting select to branch when lowering to llvm by manman-ren · Pull Request #820 · facebookexperimental/triton

manman-ren · 2026-01-28T21:26:12Z

One use case is the rescaling optimization in FA. When any thread in a warp needs the correction to be rescaled, correction_rescale is invoked. We currently have 128 rows and 4 warps, with each thread responsible for one row.
Triton currently doesn't support ifOp on a tensor condition, which FA4 needs: should_rescale is a tensor value that is uniform within a warp. This PR attempts to handle it when lowering to LLVM, where we have a per-thread view.
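To make the select-versus-branch distinction concrete, here is a minimal plain-Python model of the per-thread view. It is a sketch of the idea only, not the PR's lowering code: `expensive_rescale`, `thread_select`, and `thread_branch` are hypothetical names, and the `* 0.5` body is a placeholder for the real correction math.

```python
WARP_SIZE = 32

def expensive_rescale(x, calls):
    # Hypothetical stand-in for the costly correction path.
    calls.append(x)          # record that the work actually ran
    return x * 0.5           # placeholder arithmetic, not the real math

def thread_select(cond, x, calls):
    # Select-style: the rescaled value is computed unconditionally,
    # then one of the two values is picked.
    rescaled = expensive_rescale(x, calls)
    return rescaled if cond else x

def thread_branch(cond, x, calls):
    # Branch-style: the rescaled value is computed only when cond is true.
    if cond:
        return expensive_rescale(x, calls)
    return x

# One warp, one row per thread, with a warp-uniform condition (all False).
acc = [float(lane) for lane in range(WARP_SIZE)]
should_rescale = [False] * WARP_SIZE

select_calls, branch_calls = [], []
out_select = [thread_select(c, x, select_calls) for c, x in zip(should_rescale, acc)]
out_branch = [thread_branch(c, x, branch_calls) for c, x in zip(should_rescale, acc)]
```

Both forms produce the same values; the branch form simply skips the work when the condition is false. Because the condition is uniform within the warp, all 32 threads take the same side of the branch, so no divergence is introduced.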

manman-ren · 2026-01-29T23:59:23Z

+            alpha = tl.math.exp2(alpha_)
+            rescale_mask = alpha_ >= -8.0
+            alpha = tl.where(rescale_mask, 1.0, alpha)
+            m_ij = tl.where(rescale_mask, m_i, m_ij)


This change seems to hit a TLX issue: Pipeline failed while executing [`TritonTLXFixup
python third_party/tlx/tutorials/blackwell-fa-ws-pipelined-persistent_test.py

CC @htyu

The actual error is:
error: 'ttng.vote_ballot_sync' op operand #1 must be 1-bit signless integer, but got 'tensor<128x1xi1>'
ballot_result = tlx.vote_ballot_sync(0xFFFFFFFF, pred)
We probably need to make vote_ballot_sync a tensor operation.
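For readers following the diff at the top of this comment, the per-row effect of the two tl.where calls can be sketched in plain Python. This is an illustration under one stated assumption: that alpha_ is the old row max minus the new row max (m_i - m_ij) in the log2 domain, which the snippet itself does not show. `lazy_rescale` is a hypothetical helper name.

```python
RESCALE_THRESHOLD = -8.0  # constant taken from the diff

def lazy_rescale(m_i, m_ij):
    """Return (alpha, m_used) for a single row."""
    # Assumption (not shown in the diff): alpha_ = old max - new max, so alpha_ <= 0.
    alpha_ = m_i - m_ij
    alpha = 2.0 ** alpha_                        # models tl.math.exp2(alpha_)
    # Name follows the diff. True means the max moved by at most 8 (log2),
    # so the scale factor is forced to 1.0 and the old max is kept.
    rescale_mask = alpha_ >= RESCALE_THRESHOLD
    alpha = 1.0 if rescale_mask else alpha       # tl.where(rescale_mask, 1.0, alpha)
    m_used = m_i if rescale_mask else m_ij       # tl.where(rescale_mask, m_i, m_ij)
    return alpha, m_used

small_change = lazy_rescale(0.0, 3.0)    # alpha_ = -3, above the threshold
large_change = lazy_rescale(0.0, 10.0)   # alpha_ = -10, below the threshold
```

Under that assumption, a small change in the row max yields alpha == 1.0 with the old max retained, while a large change yields the usual exp2 factor with the new max.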

manman-ren · 2026-02-02T19:23:49Z

+                            # All elements contain the same warp-level ballot value
+                            # Non-zero means at least one thread has alpha_1 < 1.0
+                            ballot_result = tlx.vote_ballot_sync(0xFFFFFFFF, pred)
+                            should_rescale = ballot_result != 0


alpha_1 is 128x1; we have 4 warps and 128 threads, and each thread owns one row. We vote to eliminate thread divergence within a warp, so all 32 threads in a warp see the same value. The 4 warps can have different values. One option is to unroll this four times, once per warp; should_rescale is then 32x1 and uniform. Alternatively, we can perform a reduction across the 128x1 tensor to get one value shared by all 4 warps.
CC @njriasan @htyu @kvbp2k

So with the current implementation, if one row needs a rescale, all 32 rows within that warp are rescaled as a result?

Yes, to avoid warp divergence, all 32 threads in the warp will make the same decision.
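The two options discussed in this thread (a per-warp vote versus a reduction across all 128 rows) can be modeled in plain Python. This is a sketch only: `ballot`, `per_warp_should_rescale`, and `block_should_rescale` are hypothetical helpers that imitate the semantics of a warp ballot, not the TLX or NVVM API.

```python
WARP_SIZE = 32
NUM_WARPS = 4

def ballot(preds):
    """Model a warp ballot: bit i is set when lane i's predicate is true."""
    mask = 0
    for lane, p in enumerate(preds):
        if p:
            mask |= 1 << lane
    return mask

def per_warp_should_rescale(pred_rows):
    # Option 1: one decision per warp; all 32 threads of a warp agree.
    decisions = []
    for w in range(NUM_WARPS):
        warp_preds = pred_rows[w * WARP_SIZE:(w + 1) * WARP_SIZE]
        decisions.append(ballot(warp_preds) != 0)
    return decisions

def block_should_rescale(pred_rows):
    # Option 2: reduce across all 128 rows; one decision for all 4 warps.
    return any(pred_rows)

# 128 rows, one per thread; only row 40 (warp 1, lane 8) needs a rescale.
pred = [False] * (WARP_SIZE * NUM_WARPS)
pred[40] = True

warp_decisions = per_warp_should_rescale(pred)
block_decision = block_should_rescale(pred)
warp1_mask = ballot(pred[32:64])
```

With the per-warp vote, only warp 1 rescales (all 32 of its rows); with the block-wide reduction, all four warps would rescale because of that single row.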


meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 28, 2026

manman-ren marked this pull request as draft January 28, 2026 21:26

manman-ren force-pushed the mren/add-vote branch from 1889c13 to 87912e7 Compare January 28, 2026 22:51

manman-ren commented Jan 29, 2026


manman-ren commented Feb 2, 2026


manman-ren changed the title ~~Add tlx.vote_ballot_sync op that lowers to NVVM::VoteSyncOp~~ Handle ifOp on a tensor condition by converting select to branch when lowering to llvm Feb 2, 2026

manman-ren changed the title ~~Handle ifOp on a tensor condition by converting select to branch when lowering to llvm~~ Optimize tl.where by converting select to branch when lowering to llvm Feb 3, 2026

manman-ren force-pushed the mren/add-vote branch from ddfaa3b to d81aabf Compare February 3, 2026 17:26

manman-ren changed the base branch from main to mren/if-tensor-value February 3, 2026 17:27

manman-ren mentioned this pull request Feb 3, 2026

Add tlx.vote_ballot_sync op that lowers to NVVM::VoteSyncOp #828

Closed

add TLX for vote_ballot

7d29173


manman-ren force-pushed the mren/if-tensor-value branch from 4d1660c to 7d29173 Compare February 5, 2026 00:54

manman-ren added 2 commits February 4, 2026 17:02

update FA TLX kernel

2e7ee6c


from select to branch

2ab1690


manman-ren force-pushed the mren/add-vote branch from d81aabf to 2ab1690 Compare February 5, 2026 01:02

manman-ren force-pushed the mren/if-tensor-value branch from 7d29173 to a26607a Compare February 6, 2026 23:28

facebook-github-bot force-pushed the mren/if-tensor-value branch from a26607a to 4dfcc3c Compare February 11, 2026 21:03

manman-ren deleted the branch mren/if-tensor-value February 13, 2026 18:41

manman-ren closed this Feb 13, 2026

manman-ren wants to merge 3 commits into mren/if-tensor-value from mren/add-vote
