[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm#26104
Merged

vllm-bot merged 6 commits into vllm-project:main on Oct 3, 2025
Conversation
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Contributor
Code Review
This pull request refactors the Vision Transformer (ViT) flash attention dispatcher logic to centralize it and to fix a bug on the ROCm platform. The changes are consistent across multiple model files, replacing duplicated logic with a call to a new utility function, `maybe_get_vit_flash_attn_backend`. This is a good improvement for maintainability. However, I've found a critical issue in the implementation of `check_upstream_fa_availability` that could lead to runtime errors.
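For illustration, here is a minimal, dependency-free sketch of how a centralized dispatcher of this kind can behave. The signature, the availability flags, and the returned module names are assumptions made for the sketch, not vLLM's actual API; the point is the ROCm branch, where `vllm.vllm_flash_attn` does not exist and upstream flash-attn must be probed instead:

```python
from enum import Enum, auto
from typing import Optional, Tuple

class Backend(Enum):
    FLASH_ATTN = auto()      # vllm.vllm_flash_attn / upstream flash-attn
    ROCM_AITER_FA = auto()   # AMD AITER flash attention
    TORCH_SDPA = auto()      # portable fallback

def maybe_get_vit_flash_attn_backend(
    requested: Backend,
    is_rocm: bool,
    aiter_available: bool,
    upstream_fa_available: bool,
) -> Tuple[Backend, Optional[str]]:
    """Pick a ViT attention backend and the module that would provide
    flash_attn_varlen_func. Hypothetical sketch, not vLLM's real code."""
    if requested == Backend.ROCM_AITER_FA and is_rocm and aiter_available:
        return Backend.ROCM_AITER_FA, "aiter"
    if requested == Backend.FLASH_ATTN:
        if not is_rocm:
            return Backend.FLASH_ATTN, "vllm.vllm_flash_attn"
        # The bug class fixed here: on ROCm there is no vllm.vllm_flash_attn,
        # so the dispatcher must check for upstream flash-attn instead of
        # importing the CUDA-only module.
        if upstream_fa_available:
            return Backend.FLASH_ATTN, "flash_attn"
    return Backend.TORCH_SDPA, None
```

Centralizing this decision in one helper is what lets the per-model files drop their duplicated (and, on ROCm, incorrect) import logic.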
DarkLight1337
approved these changes
Oct 2, 2025
Member
cc @Isotr0py
Isotr0py
approved these changes
Oct 2, 2025
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
auto-merge was automatically disabled
October 2, 2025 17:17
Head branch was pushed to by a user without write access
auto-merge was automatically disabled
October 3, 2025 04:00
Head branch was pushed to by a user without write access
Commits referencing this pull request:

- yewentao256 pushed a commit on Oct 3, 2025: …6104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
- tomeras91 pushed a commit to tomeras91/vllm on Oct 6, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
- karan pushed a commit to karan/vllm on Oct 6, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
- lywa1998 pushed a commit to lywa1998/vllm on Oct 20, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
- bradleyhd added a commit to bradleyhd/vllm on Oct 23, 2025: Summary: Pull Request resolved: vllm-project#27124. In vllm-project#26104, some changes were made in layer.py that resulted in always trying to switch to the FA backend for ViT, even when `VLLM_ATTENTION_BACKEND` is set. This broke Meta's internal AMD pipelines, as it is neither desired nor expected behavior. With this change, the models changed in the offending PR can explicitly opt in to this behavior. Reviewed By: Prowindy. Differential Revision: D84946967
- alhridoy pushed a commit to alhridoy/vllm on Oct 24, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
- rtourgeman pushed a commit to rtourgeman/vllm on Nov 10, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
- devpatelio pushed a commit to SumanthRH/vllm on Nov 29, 2025: …lm-project#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
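The follow-up described above (vllm-project#27124) makes the ViT FA switch opt-in per model and defers to an explicitly set `VLLM_ATTENTION_BACKEND`. A hedged sketch of that precedence logic — the function name and the opt-in flag are illustrative, not vLLM's actual implementation:

```python
import os

def should_switch_vit_to_fa(model_opts_in: bool) -> bool:
    """Illustrative precedence check: never override a user-specified
    VLLM_ATTENTION_BACKEND, and only switch ViT to the FA backend when
    the model explicitly opts in (per the follow-up PR's summary)."""
    if os.environ.get("VLLM_ATTENTION_BACKEND"):
        return False  # respect the user's explicit backend choice
    return model_opts_in
```

The design point is that an environment variable set by the user should always win over an automatic backend upgrade.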
Purpose

The refactoring caused the ViT flash attention dispatcher logic to enter the wrong code path and import `from vllm.vllm_flash_attn import flash_attn_varlen_func` on the ROCm platform. This PR also fixes the incorrect usage of `aiter.flash_attn_varlen_func` in the `MultiHeadAttention` class introduced in #23978.

Test Plan

Evaluate the accuracy of all models that use this ViT flash attention dispatcher logic on the ChartQA dataset.

NOTE: The accuracy by no means indicates actual model performance on the benchmark, and it is not evaluated through the same procedure used in the official release.

The bugfix of the `MultiHeadAttention` class is validated through `OpenGVLab/InternVL3_5-8B`.

Test Result
Flash Attention Backend Comparison: AIter vs Non-AIter
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.
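As background to the `MultiHeadAttention` fix mentioned in Purpose: varlen flash-attention APIs of the `flash_attn_varlen_func` family take sequences concatenated into one packed tensor, described by cumulative sequence-length offsets rather than a padded batch dimension. A minimal, dependency-free sketch of building those offsets (the helper name is illustrative, not a vLLM function):

```python
def make_cu_seqlens(seq_lens):
    """Build cumulative sequence-length offsets (cu_seqlens) for
    flash_attn_varlen_func-style APIs: sequence i occupies the slice
    [cu[i], cu[i+1]) of the packed (total_tokens, heads, head_dim) tensor."""
    cu = [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
    return cu

# Three sequences of lengths 3, 5, and 2 packed into 10 tokens.
print(make_cu_seqlens([3, 5, 2]))  # → [0, 3, 8, 10]
```

Passing offsets built for one backend's calling convention to another backend with a different signature is exactly the kind of misuse such a fix guards against.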