[BugFix][VL] Fix FA selection on Qwen2.5-VL #27790
yeqcharlotte merged 2 commits into vllm-project:main
Conversation
Signed-off-by: zhewenli <zhewenli@meta.com>
Code Review
This pull request aims to fix a crash on AMD platforms related to Flash Attention in Qwen2.5-VL. The root cause is that use_upstream_fa is not correctly set before attempting to import flash_attn_varlen_func.
While the change correctly identifies the logic needed to set use_upstream_fa, it is placed after the function call that triggers the ImportError. I've suggested moving the logic to execute before the call to maybe_get_vit_flash_attn_backend to resolve the crash. This ensures the correct flags are set before the problematic import is attempted.
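To illustrate the ordering problem described above, here is a minimal, self-contained sketch. The function body and the `is_rocm` flag are simplified stand-ins (the real `maybe_get_vit_flash_attn_backend` lives in vLLM and has a different signature); only the names and the ordering mirror the PR discussion:

```python
# Sketch of the bug: use_upstream_fa must be decided BEFORE backend
# selection, because selection attempts the import based on that flag.
# This is an illustrative stand-in, not vLLM's actual implementation.

def maybe_get_vit_flash_attn_backend(use_upstream_fa: bool) -> str:
    # Stand-in for the vLLM helper: picks which flash-attn module
    # to import flash_attn_varlen_func from.
    if use_upstream_fa:
        return "flash_attn"  # upstream flash-attn package
    # On ROCm, vllm.vllm_flash_attn does not provide
    # flash_attn_varlen_func, so this import path crashes:
    raise ImportError(
        "cannot import name 'flash_attn_varlen_func' "
        "from 'vllm.vllm_flash_attn'")

is_rocm = True  # pretend we are on an AMD platform

# Buggy order: the flag is computed AFTER the call that needs it.
use_upstream_fa = False
try:
    maybe_get_vit_flash_attn_backend(use_upstream_fa)
except ImportError as e:
    print(f"crash: {e}")
use_upstream_fa = is_rocm  # too late, the import already failed

# Fixed order (what the review suggests): decide the flag first,
# then select the backend.
use_upstream_fa = is_rocm
print(maybe_get_vit_flash_attn_backend(use_upstream_fa))
```

Moving the flag computation above the call is enough; no other logic changes.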
cc @tjtanaa @DarkLight1337 @Isotr0py I think the current logic for ViT attention backend selection is a bit convoluted, so we should revisit it and clean it up.
Hi @zhewenl, thanks for the report. There's been a lot of activity around flash_attn in the last two days. Here's the situation: the earlier pull request fixed the hallucinations about 10 days ago, and it broke again yesterday. I tried to fix it, at least for TORCH.SDPA, but I couldn't test flash_attn until yesterday. Here's the pull request I'm working on: As I mention there, I'm now unsure how to correctly select the supported backend. I see inconsistencies in the wrapper, the naming convention, and the usage for flash_attn. Would you be willing to join us so we can stabilize everything, both now and in the future? @zhewenl @ywang96 @aarnphm @DarkLight1337 @tjtanaa @lgeiger @Lucaskabela
@JartX Sounds good, feel free to loop me in the discussion, more than happy to help!!
@zhewenl I just sent you a fork invitation :)
Signed-off-by: zhewenli <zhewenli@meta.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Purpose
#27190 breaks AMD CI (and also Qwen2.5-VL): in tests/v1/entrypoints/openai/responses/test_image.py, with _Backend.FLASH_ATTN it did NOT set use_upstream_fa = True (code), so we got ImportError: cannot import name 'flash_attn_varlen_func' from 'vllm.vllm_flash_attn' (unknown location) (failure)
Test Plan
CI: https://buildkite.com/vllm/amd-ci/builds/736