[Feature] Enable TRITON_ATTN for Batch Invariance#33688
DarkLight1337 merged 8 commits into vllm-project:main
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Code Review
This pull request enables batch invariance for the TRITON_ATTN backend. The changes are well-structured and logical: by adding TRITON_ATTN to the list of decode-invariant backends and forcing the deterministic 2D Triton kernel when batch invariance is enabled, the PR addresses the non-determinism issue. The accompanying test updates verify the new capability. The code is clean and the changes are correct. Excellent work!
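For illustration, here is a minimal sketch of the dispatch pattern the review describes: routing to the deterministic 2D kernel whenever batch invariance is enabled. All names here (attention_2d, attention_split_kv, unified_attention's keyword arguments) are hypothetical stand-ins, not vLLM's actual kernels or signatures.

```python
# Hypothetical sketch: route to the deterministic 2-D path whenever
# batch invariance is enabled, even if the faster split-KV path was
# requested. Stub "kernels" return a label so the routing is visible.

def attention_2d(q, k, v):
    # Sequential reduction over KV: same result regardless of batch shape.
    return "2d-deterministic"

def attention_split_kv(q, k, v):
    # Parallel reduction over KV splits: faster, but split boundaries
    # (and thus float summation order) can depend on batch composition.
    return "split-kv"

def unified_attention(q, k, v, use_split_kv, batch_invariant):
    # Batch invariance overrides the split-KV request.
    if batch_invariant or not use_split_kv:
        return attention_2d(q, k, v)
    return attention_split_kv(q, k, v)

print(unified_attention(None, None, None, use_split_kv=True, batch_invariant=True))
# → 2d-deterministic
```

The trade-off is determinism versus throughput: the split-KV path parallelizes the reduction over the KV dimension, but floating-point addition is not associative, so the summation order (and hence the result) can vary with batch composition.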
yewentao256
left a comment
LGTM, thanks for the work!
Since you have tested gpt-oss, could you also add it to the docs? https://docs.vllm.ai/en/latest/features/batch_invariance/#tested-models
Documentation preview: https://vllm--33688.org.readthedocs.build/en/33688/
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
Purpose
This PR adds TRITON_ATTN support for batch invariance.

Related / parent issue: #27433
Test Plan
Run tests with and without the is_batch_invariant check in triton_unified_attention's unified_attention method.

Test Result
Tests were run on a B200 (I do not have access to a Hopper GPU to validate there 🙁).
Without:
With:
In further testing with my own test suite, TRITON_ATTN also appears to be decode-invariant: prefilling part of a decoded sequence and then decoding the rest produces identical logprobs to decoding the whole sequence.
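A rough sketch of the kind of comparison behind that claim (the helper name and the sample values are illustrative, not the actual test suite): decode invariance means the logprobs from a prefill-then-decode run should match those from a pure decode run exactly, so the check is bitwise equality rather than a tolerance.

```python
def logprobs_identical(run_a, run_b):
    # Batch-invariant backends should be bitwise deterministic, so we
    # compare logprobs for exact equality rather than within a tolerance.
    return len(run_a) == len(run_b) and all(a == b for a, b in zip(run_a, run_b))

# Illustrative values only: logprobs from decoding a full sequence vs.
# prefilling part of it and decoding the remainder.
full_decode = [-0.1302, -2.4871, -0.0094]
prefill_then_decode = [-0.1302, -2.4871, -0.0094]
print(logprobs_identical(full_decode, prefill_then_decode))  # → True
```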
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.