[CPU] Support for Whisper #30062

Merged: vllm-bot merged 10 commits into vllm-project:main from aditew01:adi/whisper, Dec 10, 2025
Conversation

aditew01 (Contributor) commented Dec 4, 2025

Purpose

Enable support for the Whisper model on the CPU backend.

Test Plan

  • Added tests to tests/models/multimodal/generation/test_whisper.py; these should be enabled in run-cpu-test.sh

  • python examples/offline_inference/audio_language.py -m whisper

Test Result

python examples/offline_inference/audio_language.py -m whisper
...

INFO 12-04 13:43:46 [llm.py:343] Supported tasks: ['transcription']
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.37s/it]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.88s/it, est. speed input: 0.06 toks/s, output: 2.90 toks/s]
 The first words I spoke in the original phonograph, a little piece of practical poetry. Mary had a little lamb, its streets were quite as slow, and everywhere that Mary went the lamb was sure to go.
VLLM_CPU_KVCACHE_SPACE=32 pytest -x -v -s  tests/models/multimodal/generation/test_whisper.py -m cpu_model

 The first words I spoke in the original phonograph, a little piece of practical poetry. Mary had a little lamb, its feet were quite as slow, and everywhere that Mary went, the lamb was sure to go.
 And the 0-1 pitch on the way to Edgar Martinez. Swung on the line. Now the left field line for a base hit. Here comes Joy. Here is Junior to third base. They're going to wave him in. The throw to the plate will be late. The Mariners are going to play for the American League Championship. I don't believe it. It just continues. My, oh, my.
(the same pair of transcriptions repeats for the remaining parametrized test cases)
**PASSED**





gemini-code-assist bot left a comment

Code Review

This pull request adds CPU support for the Whisper model by enabling non-causal attention in the CPU backend. The changes are logical and include adding necessary tests. However, I've identified a critical issue where the causal flag is not correctly handled for non-causal attention types, which could lead to incorrect model outputs. Please see the detailed comment for more information.

@chatgpt-codex-connector

💡 Codex Review

https://github.com/vllm-project/vllm/blob/734c32c8757999c5ff3eca602befe19cfc102ce3/vllm/v1/attention/backends/cpu_attn.py#L182-L186
P1: Encoder-decoder CPU attention uses causal mask

The CPU backend now advertises support for AttentionType.ENCODER_DECODER, but scheduler metadata is still built with causal taken directly from CommonAttentionMetadata (initialized as True for all KV cache groups in vllm/v1/worker/gpu/attn_utils.py:167-179). For encoder–decoder cross-attention this flag should be False; leaving it True forces cpu_attention_with_kv_cache to treat cross-attention as causal (sliding_window_right gets clamped to zero and queries start at seq_len - q_len), so decoder tokens will only attend to a truncated slice of the encoder memory when running encoder–decoder models on CPU (e.g., Whisper), producing incorrect outputs.
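The core of the issue is that a shared metadata default of `causal=True` must be overridden for cross-attention layers, where every decoder token should attend to the full encoder output. A minimal, self-contained sketch of that idea follows; the class and attribute names mirror the ones mentioned in the review, but this is an illustrative toy, not vLLM's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class CommonAttentionMetadata:
    # Default carried by the shared metadata: correct for decoder
    # self-attention, wrong for encoder-decoder cross-attention.
    causal: bool = True


def resolve_causal(meta: CommonAttentionMetadata, is_cross_attention: bool) -> bool:
    # Cross-attention must not apply a causal mask: the decoder query
    # attends over the whole encoder memory, so we force False there
    # regardless of the default in the shared metadata.
    return False if is_cross_attention else meta.causal


print(resolve_causal(CommonAttentionMetadata(), is_cross_attention=True))
print(resolve_causal(CommonAttentionMetadata(), is_cross_attention=False))
```

Left as the default, the causal flag would clamp the attention window exactly as the review describes, truncating the encoder memory visible to decoder tokens.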


aditew01 commented Dec 4, 2025

cc: @fadara01 @cfRod @bigPYJ1151

aditew01 commented Dec 4, 2025

This should close issue #29861.

bigPYJ1151 (Member) left a comment

Thanks for your work! Some nits to resolve.

block_table_tensor = common_attn_metadata.block_table_tensor
slot_mapping = common_attn_metadata.slot_mapping
causal = common_attn_metadata.causal
if self.is_cross_attention:

You can run vLLM's pre-commit hooks locally as follows:

python -m pip install -U pre-commit
python -m pre_commit run --all-files

@pytest.mark.parametrize("model", ["openai/whisper-large-v3"])
@pytest.mark.parametrize("dtype", ["bfloat16", "half"])
@pytest.mark.cpu_model
def test_whisper_cpu(vllm_runner, model, dtype):
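For context, stacked `pytest.mark.parametrize` decorators like the two above generate the cross product of their parameter values, so this test runs once per (model, dtype) combination. A small stand-alone illustration of that mechanic (the lists simply echo the decorator arguments above):

```python
import itertools

# Parameter values from the two parametrize decorators above.
models = ["openai/whisper-large-v3"]
dtypes = ["bfloat16", "half"]

# pytest generates one test case per element of the cross product:
# 1 model x 2 dtypes = 2 test cases.
cases = list(itertools.product(models, dtypes))
print(len(cases))  # 2
```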
aditew01 (Contributor Author) replied:
Done

fadara01 (Contributor) left a comment

Great work! Just a few nits.

@mergify mergify bot added the ci/build label Dec 5, 2025
fadara01 commented Dec 5, 2025

Triggered a new set of CI builds; third time's the charm 🤞

aditew01 commented Dec 5, 2025

LGTM: https://buildkite.com/vllm/ci/builds/42136/steps/canvas?sid=019aef3c-53e5-4d81-8224-00c40b1b7b24

@aditew01 aditew01 requested a review from bigPYJ1151 December 9, 2025 12:19
@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) December 10, 2025 07:30
github-actions bot added the ready label Dec 10, 2025
@aditew01

@bigPYJ1151 I've rebased to latest - can you please enable the necessary jobs for the merge?

@aditew01

@bigPYJ1151 The failure for multi-modal-processor-test-cpu seems unrelated?

[2025-12-10T11:12:51Z] ERROR: Could not find a version that satisfies the requirement decord (from mantis-vl) (from versions: none)
--
[2025-12-10T11:12:51Z] ERROR: No matching distribution found for decord
[2025-12-10T11:12:52Z] 🚨 Error: The command exited with status 1

@bigPYJ1151

@aditew01 Yes, I suspect this is because the ARM CPU nightly images are being pushed to the x86 repo, as I saw:

 WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v4) and no specific platform was requested

There is a fix: vllm-project/ci-infra#243

@aditew01

Is that a blocker for merge?

@aditew01

Brill! Thanks!

@bigPYJ1151

I have asked for force merge.

@vllm-bot vllm-bot merged commit cebda2a into vllm-project:main Dec 10, 2025
55 of 57 checks passed
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

Labels: ci/build, multi-modality, ready, v1

4 participants