[ROCm] Make Whisper causal attention backend-agnostic #34631
laudney wants to merge 1 commit into vllm-project:main from
Conversation
Code Review
This pull request makes the Whisper causal attention backend-agnostic by removing a hardcoded allowlist of backends. This change correctly identifies that the explicit list was redundant, as backend validation is already handled by get_attn_backend. By deleting the allowlist and associated imports, the code is simplified and more maintainable, and it enables support for newer backends on platforms like ROCm without requiring modifications to this file. The changes are sound and represent a good improvement.
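The pattern the review describes, dropping a local allowlist and trusting the selector's own validation, can be sketched as follows. The names `get_attn_backend` and `_SUPPORTED_BACKENDS` mirror the PR, but the bodies here are illustrative stubs, not vLLM's actual implementation:

```python
# Illustrative sketch of the change described above. The registry and
# class bodies are stand-ins, not vLLM's real attention machinery.

class AttentionBackend:
    """Minimal stand-in for a vLLM attention backend class."""

class FlashAttentionBackend(AttentionBackend): ...
class TritonAttentionBackend(AttentionBackend): ...

def get_attn_backend(name: str) -> type[AttentionBackend]:
    # In vLLM, the attention selector already validates that the
    # chosen backend works on the current platform, so callers do
    # not need to re-check it.
    registry = {
        "FLASH_ATTN": FlashAttentionBackend,
        "TRITON_ATTN": TritonAttentionBackend,
    }
    return registry[name]

# Before: a redundant allowlist layered on top of the selector.
_SUPPORTED_BACKENDS = (FlashAttentionBackend,)

def pick_backend_old(name: str) -> type[AttentionBackend]:
    backend = get_attn_backend(name)
    if not issubclass(backend, _SUPPORTED_BACKENDS):
        raise ValueError(f"{backend.__name__} not supported")
    return backend

# After: trust the selector; any backend it returns is usable,
# which is what unblocks Triton attention on RDNA4/gfx12.
def pick_backend_new(name: str) -> type[AttentionBackend]:
    return get_attn_backend(name)
```

With the old guard, a backend the selector considers valid (here, `TritonAttentionBackend`) is still rejected; with the new code it passes through unchanged.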
cc @tjtanaa @AndreasKaratzas can you verify this model/attention backend combination?
Related PRs (RDNA4/gfx12 series)
This PR is part of a series enabling RDNA4 (gfx12) support in vLLM:
Each PR is independent and can be reviewed/merged separately.
Force-pushed from 77856e7 to a3d136a
Hi @laudney, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Force-pushed from a3d136a to 23aef46
Definitely. @laudney, can you please help me save some time? Would it be easy to put all of those PRs into one branch and give me the fork and commit hash of that branch? If you do that, I will certainly be able to evaluate your changes and share the results here :)
@AndreasKaratzas Sure! Here you go:
Fork:
This branch contains all 8 commits from the 5 PRs on top of upstream/main:
Thank you for taking the time to evaluate!
Will let you know the results as soon as I can :) We are trying to address some high-priority tasks first so that CI reports results accurately, and I'm going to get back to you probably by this time tomorrow. Sry for the delay
@AndreasKaratzas No worries at all, take your time! Really appreciate you looking into this. We all want better AMD support in this amazing project, so happy to help if any questions come up during testing. |
Full CI run is up: https://buildkite.com/vllm/amd-ci/builds/4991/steps/canvas
@laudney I am going to take a look at this again tomorrow :) Sry for the delay, have been working on some critical CI-related tasks.
Full CI build as of yesterday: https://buildkite.com/vllm/amd-ci/builds/4991/steps/canvas
There were no new regressions observed. At the same time, I realize that these changes mostly affect other architectures, not gfx9, so I should ask for testing on gfx11/12 as well. Also, there are changes inside the
Force-pushed from 23aef46 to acee9b8
Hey, this is approved and rebased on latest main. What else do I need to do to get it merged?
From @AndreasKaratzas
I see that this PR has probably been heavily refactored. Specifically, the file that I noted in my first comment (
Remove the FlashAttentionBackend-only guard in whisper_causal.py so that Voxtral and other Whisper-based models can run on ROCm/RDNA4 with the Triton attention backend.

- Remove the `issubclass(backend, FlashAttentionBackend)` check
- Delegate `get_kv_cache_shape` to the underlying backend instead of hardcoding Flash's `(2, num_blocks, ...)` layout

Signed-off-by: L.B.R. <lbr@mmonad.com>
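The second bullet of the commit message, delegating `get_kv_cache_shape` to the selected backend rather than hardcoding Flash's layout, might look roughly like the sketch below. The signatures and class names are simplified assumptions for illustration, not vLLM's exact API:

```python
# Sketch of delegating get_kv_cache_shape to the underlying backend
# instead of baking in FlashAttention's (2, num_blocks, ...) layout.
# Signatures are simplified assumptions, not vLLM's exact API.

class FlashLikeBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # Flash-style layout: K and V stacked in the leading dimension.
        return (2, num_blocks, block_size, num_kv_heads, head_size)

class TritonLikeBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # A different backend is free to choose a different cache layout.
        return (num_blocks, 2, block_size, num_kv_heads, head_size)

def kv_cache_shape_old(num_blocks, block_size, num_kv_heads, head_size):
    # Before: Flash's layout hardcoded, wrong for any other backend.
    return (2, num_blocks, block_size, num_kv_heads, head_size)

def kv_cache_shape_new(backend, num_blocks, block_size, num_kv_heads, head_size):
    # After: ask the selected backend for its own layout.
    return backend.get_kv_cache_shape(
        num_blocks, block_size, num_kv_heads, head_size
    )
```

The point of the delegation is visible in the last function: the Whisper code no longer needs to know which layout the active backend uses, so new backends work without touching this file.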
Force-pushed from acee9b8 to 4a4f4d8
@AndreasKaratzas Rebased on latest main. Ready for CI. |
Summary
- Removes the hardcoded allowlist of backends (`FlashAttentionBackend`, `AiterFlashAttentionBackend`, `RocmAttentionBackend`, `TritonAttentionBackend`) from `whisper_causal.py`
- Removes the `_SUPPORTED_BACKENDS` check
- The model already uses `get_attn_backend` from the attention selector and `subclass_attention_backend_with_overrides` to wrap the selected backend — the allowlist was redundant and blocked backends that work fine (e.g. on RDNA4/gfx12)

This is a pure deletion (~39 lines removed, 0 added). The `subclass_attention_backend_with_overrides` mechanism already validates backend compatibility at a lower level.

Test plan