[ROCm] Make Whisper causal attention backend-agnostic#34631

Open
laudney wants to merge 1 commit into vllm-project:main from mmonad:fix/whisper-backend-agnostic

Conversation

@laudney
Contributor

@laudney laudney commented Feb 16, 2026

Summary

  • Remove hardcoded backend allowlist (FlashAttentionBackend, AiterFlashAttentionBackend, RocmAttentionBackend, TritonAttentionBackend) from whisper_causal.py
  • Remove the corresponding explicit imports and _SUPPORTED_BACKENDS check
  • The model already uses get_attn_backend from the attention selector and subclass_attention_backend_with_overrides to wrap the selected backend — the allowlist was redundant and blocked backends that work fine (e.g. on RDNA4/gfx12)

This is a pure deletion (~39 lines removed, 0 added). The subclass_attention_backend_with_overrides mechanism already validates backend compatibility at a lower level.
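To illustrate the difference, here is a minimal sketch of the two patterns. The class and helper names below are stand-ins invented for this example; only get_attn_backend and subclass_attention_backend_with_overrides are identifiers actually mentioned in the PR, and this is not vLLM's real code.

```python
class AttentionBackend:
    """Minimal stand-in for a vLLM attention backend class."""

class FlashAttentionBackend(AttentionBackend):
    pass

class TritonAttentionBackend(AttentionBackend):
    pass

# Old pattern (deleted by this PR): a hardcoded allowlist that rejects any
# backend not explicitly imported here, even ones that would work fine.
_SUPPORTED_BACKENDS = (FlashAttentionBackend,)

def wrap_backend_old(backend_cls):
    if not issubclass(backend_cls, _SUPPORTED_BACKENDS):
        raise NotImplementedError(
            f"{backend_cls.__name__} is not in the Whisper causal allowlist")
    return backend_cls

# New pattern: trust the attention selector's choice and simply subclass it
# with causal overrides (toy stand-in for
# subclass_attention_backend_with_overrides).
def wrap_backend_new(backend_cls):
    return type(f"WhisperCausal{backend_cls.__name__}",
                (backend_cls,), {"attn_causal": True})

wrapped = wrap_backend_new(TritonAttentionBackend)
print(wrapped.__name__)     # WhisperCausalTritonAttentionBackend
print(wrapped.attn_causal)  # True
```

With the old pattern, TritonAttentionBackend would raise even though it works; with the new one, whatever backend the selector picks is wrapped uniformly.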

Test plan

  • Whisper causal inference on FlashAttention (CUDA) — no behavior change
  • Whisper causal inference on ROCm with non-Flash backends
  • Existing CI should pass (no new code paths)

@mergify mergify bot added the rocm Related to AMD ROCm label Feb 16, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 16, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request makes the Whisper causal attention backend-agnostic by removing a hardcoded allowlist of backends. This change correctly identifies that the explicit list was redundant, as backend validation is already handled by get_attn_backend. By deleting the allowlist and associated imports, the code is simplified and more maintainable, and it enables support for newer backends on platforms like ROCm without requiring modifications to this file. The changes are sound and represent a good improvement.

@DarkLight1337
Member

DarkLight1337 commented Feb 17, 2026

cc @tjtanaa @AndreasKaratzas can you verify this model/attention backend combination?

@laudney
Contributor Author

laudney commented Feb 17, 2026

Related PRs (RDNA4/gfx12 series)

This PR is part of a series enabling RDNA4 (gfx12) support in vLLM:

Each PR is independent and can be reviewed/merged separately.

@laudney laudney force-pushed the fix/whisper-backend-agnostic branch from 77856e7 to a3d136a Compare February 17, 2026 20:22
@mergify

mergify bot commented Feb 17, 2026

Hi @laudney, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@laudney laudney force-pushed the fix/whisper-backend-agnostic branch from a3d136a to 23aef46 Compare February 17, 2026 20:44
@AndreasKaratzas
Collaborator

cc @tjtanaa @AndreasKaratzas can you verify this model/attention backend combination?

Definitely. @laudney, can you please help me save some time? Would it be easy to put all of those PRs into one branch and give me the fork and commit hash of that branch? If you do that, I will certainly be able to evaluate your changes and share the results here :)

@laudney
Contributor Author

laudney commented Feb 17, 2026

@AndreasKaratzas Sure! Here you go:

Fork: mmonad/vllm
Branch: feat/rocm-rdna4-combined
Commit: 0a3d6653a0d414a047904f301f9190305250c672

This branch contains all 8 commits from the 5 PRs on top of upstream/main:

  1. 21c8125 — [ROCm] Make Whisper causal attention backend-agnostic (#34631)
  2. 9ae352a — [ROCm] Use supports_fp8() for FP8 feature gates instead of arch checks (#34740)
  3. 0fbf2a8 — [ROCm] Enable FP8 KV-cache and relax constraints for RDNA4 custom paged attention (#34741)
  4. 79b1817 — [ROCm] Enable LLMM1 skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
  5. f838bfd — [ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
  6. dd3c6f9 — [ROCm] Enable wvSplitKQ FP8 skinny GEMM kernel for RDNA4/gfx12 decode (#34709)
  7. dd153f6 — [ROCm] Add MXFP4 inline dequant Triton kernel for RDNA4/gfx12 (#34632)
  8. 0a3d665 — Fix MXFP4 dequant kernel: use MoEActivation enum instead of strings (#34632)

Thank you for taking the time to evaluate!

@AndreasKaratzas
Collaborator

Will let you know the results as soon as I can :) We are trying to address some high-priority tasks first so that CI reports results accurately, and I'm probably going to get back to you by this time tomorrow. Sorry for the delay.

@laudney
Contributor Author

laudney commented Feb 18, 2026

@AndreasKaratzas No worries at all, take your time! Really appreciate you looking into this. We all want better AMD support in this amazing project, so happy to help if any questions come up during testing.

@AndreasKaratzas
Collaborator

@AndreasKaratzas No worries at all, take your time! Really appreciate you looking into this. We all want better AMD support in this amazing project, so happy to help if any questions come up during testing.

Full CI run is up: https://buildkite.com/vllm/amd-ci/builds/4991/steps/canvas

@AndreasKaratzas
Collaborator

@laudney I am going to take a look at this again tomorrow :) Sorry for the delay; I have been working on some critical CI-related tasks.

@AndreasKaratzas
Collaborator

Full CI build as of yesterday: https://buildkite.com/vllm/amd-ci/builds/4991/steps/canvas

There were no new regressions observed. At the same time, I realize that these changes mostly affect other architectures, not gfx9, so I should ask for testing on gfx11/12 as well. Also, there are changes inside the attention.cu file, which means we should do a perf analysis of that.

cc @tjtanaa @gshtras

@laudney laudney force-pushed the fix/whisper-backend-agnostic branch from 23aef46 to acee9b8 Compare March 22, 2026 12:48
@laudney
Contributor Author

laudney commented Mar 22, 2026

Hey, this is approved and rebased on latest main. What else do I need to do to get it merged?

@DarkLight1337
Member

From @AndreasKaratzas

Therefore, I should ask for testing on gfx11/12 as well. Also, there are changes inside the attention.cu file, which means we should do a perf analysis of that.

@AndreasKaratzas
Collaborator

From @AndreasKaratzas

Therefore, I should ask for testing on gfx11/12 as well. Also, there are changes inside the attention.cu file, which means we should do a perf analysis of that.

I see that this PR has probably been heavily refactored. Specifically, the file that I noted in my first comment (attention.cu) is no longer in the diff, so I think I am good with this PR as long as it passes the tests. @laudney please rebase on latest main so that we can compare failures.

@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 23, 2026
Remove the FlashAttentionBackend-only guard in whisper_causal.py so
that Voxtral and other Whisper-based models can run on ROCm/RDNA4
with the Triton attention backend.

- Remove issubclass(backend, FlashAttentionBackend) check
- Delegate get_kv_cache_shape to the underlying backend instead of
  hardcoding Flash's (2, num_blocks, ...) layout

Signed-off-by: L.B.R. <lbr@mmonad.com>
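The get_kv_cache_shape delegation in the commit message can be sketched as follows. The function signature and the alternative Triton layout below are illustrative assumptions; only the (2, num_blocks, ...) Flash layout comes from the commit message, and none of this is vLLM's actual code.

```python
class FlashLikeBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # Flash-style layout: K and V stacked along a leading dimension of 2.
        return (2, num_blocks, block_size, num_kv_heads, head_size)

class TritonLikeBackend:
    @staticmethod
    def get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size):
        # A different backend may choose a different KV-cache layout
        # (hypothetical here), so the wrapper must not assume Flash's.
        return (num_blocks, 2, block_size, num_kv_heads, head_size)

def make_whisper_causal(base):
    # Subclass WITHOUT overriding get_kv_cache_shape, so each backend's own
    # layout is used instead of a hardcoded Flash layout.
    return type(f"WhisperCausal{base.__name__}", (base,), {})

print(make_whisper_causal(FlashLikeBackend).get_kv_cache_shape(4, 16, 8, 64))
print(make_whisper_causal(TritonLikeBackend).get_kv_cache_shape(4, 16, 8, 64))
```

Hardcoding the first shape in the wrapper would silently corrupt the cache for any backend using the second layout; delegating makes the wrapper layout-agnostic.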
@laudney laudney force-pushed the fix/whisper-backend-agnostic branch from acee9b8 to 4a4f4d8 Compare March 23, 2026 09:08
@laudney
Contributor Author

laudney commented Mar 23, 2026

@AndreasKaratzas Rebased on latest main. Ready for CI.
