[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter Flash Attention #28376
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of CI tests runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds support for Whisper v1 on ROCm by enabling Aiter Unified Attention and Aiter Flash Attention for cross-attention workloads. The changes correctly modify attention type validation to allow for the ENCODER_DECODER attention type. However, I've identified a critical issue in the AiterFlashAttentionImpl where it incorrectly attempts to use the paged KV cache as a contiguous tensor for cross-attention when the key and value tensors are not provided. This will result in incorrect behavior and must be addressed.
```python
key = key[:num_actual_tokens] if key is not None else key_cache[:num_actual_tokens]
value = value[:num_actual_tokens] if value is not None else value_cache[:num_actual_tokens]
```
The logic to handle key and value being None for cross-attention is incorrect. The key_cache and value_cache are paged tensors, not contiguous tensors of keys and values.
The shape of key_cache is [num_blocks, block_size, num_kv_heads, head_size]. Slicing it with [:num_actual_tokens] incorrectly treats num_actual_tokens as a number of blocks and will result in a tensor with an incorrect shape and contents, leading to errors or wrong results in the attention computation.
For cross-attention where key and value are None, the keys and values must be gathered from the paged KV cache into a contiguous tensor before being passed to attention functions like aiter.flash_attn_varlen_func. The extend_forward method in this same class provides an example of how to do this using cp_mha_gather_cache. A similar approach should be adopted for the prefill and decode paths when key and value are None.
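For illustration, here is a minimal plain-PyTorch sketch of that gathering step. It is not the aiter cp_mha_gather_cache kernel itself; the helper name and the block-table/sequence-length arguments are assumptions for the example, and the cache layout follows the shape quoted above.

```python
import torch

def gather_kv_from_paged_cache(
    key_cache: torch.Tensor,    # [num_blocks, block_size, num_kv_heads, head_size]
    value_cache: torch.Tensor,  # same layout as key_cache
    block_table: torch.Tensor,  # [num_seqs, max_blocks_per_seq], block ids per sequence
    seq_lens: torch.Tensor,     # [num_seqs], number of cached tokens per sequence
) -> tuple[torch.Tensor, torch.Tensor]:
    """Gather paged KV entries into contiguous [total_tokens, num_kv_heads, head_size] tensors."""
    block_size = key_cache.shape[1]
    keys, values = [], []
    for seq_idx in range(block_table.shape[0]):
        seq_len = int(seq_lens[seq_idx])
        num_blocks = (seq_len + block_size - 1) // block_size
        blocks = block_table[seq_idx, :num_blocks].long()
        # [num_blocks, block_size, H, D] -> [num_blocks * block_size, H, D] -> trim padding
        k = key_cache[blocks].reshape(-1, *key_cache.shape[2:])[:seq_len]
        v = value_cache[blocks].reshape(-1, *value_cache.shape[2:])[:seq_len]
        keys.append(k)
        values.append(v)
    return torch.cat(keys, dim=0), torch.cat(values, dim=0)
```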
Fixed in latest commit to correctly handle None key/value in cross-attention.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 RocmAttentionBackend.validate_head_size(head_size)

-if attn_type != AttentionType.DECODER:
+if attn_type not in [AttentionType.DECODER, AttentionType.ENCODER_DECODER]:
     raise NotImplementedError(
-        "Encoder self-attention and "
-        "encoder/decoder cross-attention "
-        "are not implemented for "
-        "RocmAttentionImpl"
+        "Encoder self-attention is not implemented for RocmAttentionImpl"
     )

 self.fp8_dtype = current_platform.fp8_dtype()
```
Handle encoder–decoder calls without key/value tensors
The constructor now accepts AttentionType.ENCODER_DECODER, but forward still assumes key and value are always present. During decoder-side cross attention, later decode steps reuse the encoder KV cache and invoke this path with key=None/value=None. The new guard no longer blocks these calls, so chunked_prefill_paged_decode immediately dereferences key[:num_actual_tokens] and key.shape, raising an exception before any attention is computed. Either revert the constructor restriction or update forward to fall back to the cached tensors when key/value are None.
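A hedged sketch of that fallback, reusing the illustrative gather helper from the earlier comment; the metadata field names below are assumptions, not the actual vLLM/aiter attributes:

```python
# Illustrative only: `gather_kv_from_paged_cache` and the attn_metadata field
# names are assumed for this sketch, not real vLLM/aiter APIs.
if key is None or value is None:
    # Decoder-side cross-attention on later decode steps: the encoder KV was
    # written to the paged cache earlier, so gather it into contiguous tensors
    # instead of slicing the paged cache directly.
    key, value = gather_kv_from_paged_cache(
        key_cache,
        value_cache,
        attn_metadata.cross_block_table,  # assumed field name
        attn_metadata.encoder_seq_lens,   # assumed field name
    )
else:
    key = key[:num_actual_tokens]
    value = value[:num_actual_tokens]
```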
@apinge I understand that you have stated that we need to use the latest AITER commit. Is there any chance that this works with the AITER version in vllm/docker/Dockerfile.rocm_base (line 10 in 6b2b9fd)?
Force-pushed from b80b469 to 3f33199.
This aiter version works fine, provided that #28383 is applied.
This pull request has merge conflicts that must be resolved before it can be merged.
The same review questions were raised on PR #28346. We will wait for the other PR to sort out these issues.
…ash attention Signed-off-by: apinge <[email protected]>
Signed-off-by: apinge <[email protected]>
Signed-off-by: apinge <[email protected]>
Signed-off-by: apinge <[email protected]>
Force-pushed from 3f33199 to 110481d.
I found that the changes addressing the review questions are causing an accuracy problem; I've left a comment in #28346. Also, testing shows that the Aiter Flash Attention backend can hit a NaN issue for some prompts, which has been fixed in PR #28670.
…t#28376 Signed-off-by: Andreas Karatzas <[email protected]>
Purpose
This PR enables broader attention backend support for Whisper v1 on the ROCm platform.
Building on the existing Triton backend PR #28346, it introduces:
Aiter Unified Attention
Aiter Flash Attention
This change depends on modifications from the Triton backend PR, since both PRs modify the same file (vllm/v1/worker/utils.py).

Test Plan
Whisper v1 on ROCm with the Aiter backends requires the latest Aiter version; tested with commit 7639e55.
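For reference, a rough offline test setup might look like the sketch below. It is adapted from vLLM's offline Whisper example; the environment-variable values used to select the Aiter backend (and the model id) are assumptions for illustration, not settings confirmed by this PR.

```python
import os

# Assumed settings: enable aiter kernels on ROCm and pick an attention backend.
# The exact backend name may differ; check vllm/envs.py and the ROCm platform code.
os.environ["VLLM_ROCM_USE_AITER"] = "1"
os.environ["VLLM_ATTENTION_BACKEND"] = "ROCM_AITER_FA"  # assumed name for Aiter Flash Attention

from vllm import LLM, SamplingParams
from vllm.assets.audio import AudioAsset

# Whisper is an encoder-decoder model; the audio clip is passed via multi_modal_data.
audio_and_rate = AudioAsset("mary_had_lamb").audio_and_sample_rate

llm = LLM(model="openai/whisper-large-v3", max_model_len=448)
outputs = llm.generate(
    [{
        "prompt": "<|startoftranscript|>",
        "multi_modal_data": {"audio": audio_and_rate},
    }],
    SamplingParams(temperature=0, max_tokens=200),
)
print(outputs[0].outputs[0].text)
```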
Test Result
Result of Aiter Unified Attention
Result for Aiter Flash Attention
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.