[Feature] Enable TRITON_ATTN for Batch Invariance#33688
DarkLight1337 merged 8 commits into vllm-project:main
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Code Review
This pull request enables batch invariance for the TRITON_ATTN backend. The changes are well-structured and logical: by adding TRITON_ATTN to the list of decode-invariant backends and forcing the deterministic 2D Triton kernel when batch invariance is enabled, the PR addresses the non-determinism issue. The accompanying test updates verify the new capability. The code is clean and the changes are correct. Excellent work!
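For illustration, here is a minimal sketch of the dispatch pattern the review describes: routing to the deterministic 2D kernel whenever batch invariance is enabled. All names here (attention_2d, attention_split_kv, unified_attention's keyword arguments) are hypothetical stand-ins, not vLLM's actual kernels or signatures.

```python
# Hypothetical sketch: route to the deterministic 2-D path whenever
# batch invariance is enabled, even if the faster split-KV path was
# requested. Stub "kernels" return a label so the routing is visible.

def attention_2d(q, k, v):
    # Sequential reduction over KV: same result regardless of batch shape.
    return "2d-deterministic"

def attention_split_kv(q, k, v):
    # Parallel reduction over KV splits: faster, but split boundaries
    # (and thus float summation order) can depend on batch composition.
    return "split-kv"

def unified_attention(q, k, v, use_split_kv, batch_invariant):
    # Batch invariance overrides the split-KV request.
    if batch_invariant or not use_split_kv:
        return attention_2d(q, k, v)
    return attention_split_kv(q, k, v)

print(unified_attention(None, None, None, use_split_kv=True, batch_invariant=True))
# → 2d-deterministic
```

The trade-off is determinism versus throughput: the split-KV path parallelizes the reduction over the KV dimension, but floating-point addition is not associative, so the summation order (and hence the result) can vary with batch composition.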
yewentao256
left a comment
LGTM, thanks for the work!
Since you have tested gpt-oss, could you also add it to the docs? https://docs.vllm.ai/en/latest/features/batch_invariance/#tested-models
Documentation preview: https://vllm--33688.org.readthedocs.build/en/33688/
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
Purpose
This PR adds TRITON_ATTN support for batch invariance.

Related / parent issue: #27433
Test Plan
Run tests with and without the is_batch_invariant check in triton_unified_attention's unified_attention method.

Test Result
Tests were run on a B200 (I do not have access to a Hopper GPU to validate there 🙁).
Without:
With:
In further testing with my own test suite, TRITON_ATTN also appears to be decode-invariant: prefilling part of a decoded sequence and then decoding the rest produces identical logprobs to decoding the whole sequence.
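A rough sketch of the kind of comparison behind that claim (the helper name and the sample values are illustrative, not the actual test suite): decode invariance means the logprobs from a prefill-then-decode run should match those from a pure decode run exactly, so the check is bitwise equality rather than a tolerance.

```python
def logprobs_identical(run_a, run_b):
    # Batch-invariant backends should be bitwise deterministic, so we
    # compare logprobs for exact equality rather than within a tolerance.
    return len(run_a) == len(run_b) and all(a == b for a, b in zip(run_a, run_b))

# Illustrative values only: logprobs from decoding a full sequence vs.
# prefilling part of it and decoding the remainder.
full_decode = [-0.1302, -2.4871, -0.0094]
prefill_then_decode = [-0.1302, -2.4871, -0.0094]
print(logprobs_identical(full_decode, prefill_then_decode))  # → True
```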
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.