[CPU] Support for Whisper #30062

Merged: vllm-bot merged 10 commits into vllm-project:main from aditew01:adi/whisper, Dec 10, 2025
Conversation

aditew01 (Contributor) commented Dec 4, 2025

Purpose

Enable support for the Whisper model on the CPU backend.

Test Plan

  • Added tests to tests/models/multimodal/generation/test_whisper.py; these should be enabled in run-cpu-test.sh

  • python examples/offline_inference/audio_language.py -m whisper

Test Result

python examples/offline_inference/audio_language.py -m whisper
...

INFO 12-04 13:43:46 [llm.py:343] Supported tasks: ['transcription']
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.37s/it]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.88s/it, est. speed input: 0.06 toks/s, output: 2.90 toks/s]
 The first words I spoke in the original phonograph, a little piece of practical poetry. Mary had a little lamb, its streets were quite as slow, and everywhere that Mary went the lamb was sure to go.
VLLM_CPU_KVCACHE_SPACE=32 pytest -x -v -s  tests/models/multimodal/generation/test_whisper.py -m cpu_model

 The first words I spoke in the original phonograph, a little piece of practical poetry. Mary had a little lamb, its feet were quite as slow, and everywhere that Mary went, the lamb was sure to go.
 And the 0-1 pitch on the way to Edgar Martinez. Swung on the line. Now the left field line for a base hit. Here comes Joy. Here is Junior to third base. They're going to wave him in. The throw to the plate will be late. The Mariners are going to play for the American League Championship. I don't believe it. It just continues. My, oh, my.
(the same pair of transcriptions repeats for the remaining parametrized test cases)
**PASSED**





gemini-code-assist bot left a comment

Code Review

This pull request adds CPU support for the Whisper model by enabling non-causal attention in the CPU backend. The changes are logical and include adding necessary tests. However, I've identified a critical issue where the causal flag is not correctly handled for non-causal attention types, which could lead to incorrect model outputs. Please see the detailed comment for more information.

@chatgpt-codex-connector

💡 Codex Review

https://github.com/vllm-project/vllm/blob/734c32c8757999c5ff3eca602befe19cfc102ce3/vllm/v1/attention/backends/cpu_attn.py#L182-L186
P1: Encoder-decoder CPU attention uses causal mask

The CPU backend now advertises support for AttentionType.ENCODER_DECODER, but scheduler metadata is still built with causal taken directly from CommonAttentionMetadata (initialized as True for all KV cache groups in vllm/v1/worker/gpu/attn_utils.py:167-179). For encoder–decoder cross-attention this flag should be False; leaving it True forces cpu_attention_with_kv_cache to treat cross-attention as causal (sliding_window_right gets clamped to zero and queries start at seq_len - q_len), so decoder tokens will only attend to a truncated slice of the encoder memory when running encoder–decoder models on CPU (e.g., Whisper), producing incorrect outputs.
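The core of the issue is that a shared metadata default of `causal=True` must be overridden for cross-attention layers, where every decoder token should attend to the full encoder output. A minimal, self-contained sketch of that idea follows; the class and attribute names mirror the ones mentioned in the review, but this is an illustrative toy, not vLLM's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class CommonAttentionMetadata:
    # Default carried by the shared metadata: correct for decoder
    # self-attention, wrong for encoder-decoder cross-attention.
    causal: bool = True


def resolve_causal(meta: CommonAttentionMetadata, is_cross_attention: bool) -> bool:
    # Cross-attention must not apply a causal mask: the decoder query
    # attends over the whole encoder memory, so we force False there
    # regardless of the default in the shared metadata.
    return False if is_cross_attention else meta.causal


print(resolve_causal(CommonAttentionMetadata(), is_cross_attention=True))
print(resolve_causal(CommonAttentionMetadata(), is_cross_attention=False))
```

Left as the default, the causal flag would clamp the attention window exactly as the review describes, truncating the encoder memory visible to decoder tokens.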


aditew01 commented Dec 4, 2025

cc: @fadara01 @cfRod @bigPYJ1151

aditew01 commented Dec 4, 2025

This should close issue #29861.

bigPYJ1151 (Member) left a comment

Thanks for your work! Some nits to resolve.

block_table_tensor = common_attn_metadata.block_table_tensor
slot_mapping = common_attn_metadata.slot_mapping
causal = common_attn_metadata.causal
if self.is_cross_attention:

You can run vLLM's pre-commit hooks locally as follows:

python -m pip install -U pre-commit
python -m pre_commit run --all-files

@pytest.mark.parametrize("model", ["openai/whisper-large-v3"])
@pytest.mark.parametrize("dtype", ["bfloat16", "half"])
@pytest.mark.cpu_model
def test_whisper_cpu(vllm_runner, model, dtype):
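For context, stacked `pytest.mark.parametrize` decorators like the two above generate the cross product of their parameter values, so this test runs once per (model, dtype) combination. A small stand-alone illustration of that mechanic (the lists simply echo the decorator arguments above):

```python
import itertools

# Parameter values from the two parametrize decorators above.
models = ["openai/whisper-large-v3"]
dtypes = ["bfloat16", "half"]

# pytest generates one test case per element of the cross product:
# 1 model x 2 dtypes = 2 test cases.
cases = list(itertools.product(models, dtypes))
print(len(cases))  # 2
```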
aditew01 (Contributor Author) replied:
Done

fadara01 (Contributor) left a comment

Great work! Just a few nits.

@mergify mergify bot added the ci/build label Dec 5, 2025
fadara01 commented Dec 5, 2025

Triggered a new set of CI builds; third time's the charm 🤞

aditew01 commented Dec 5, 2025

LGTM: https://buildkite.com/vllm/ci/builds/42136/steps/canvas?sid=019aef3c-53e5-4d81-8224-00c40b1b7b24

@aditew01 aditew01 requested a review from bigPYJ1151 December 9, 2025 12:19
@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) December 10, 2025 07:30
github-actions bot added the ready label Dec 10, 2025
@aditew01

@bigPYJ1151 I've rebased to latest - can you please enable the necessary jobs for the merge?

@aditew01

@bigPYJ1151 The failure for multi-modal-processor-test-cpu seems unrelated?

[2025-12-10T11:12:51Z] ERROR: Could not find a version that satisfies the requirement decord (from mantis-vl) (from versions: none)
--
[2025-12-10T11:12:51Z] ERROR: No matching distribution found for decord
[2025-12-10T11:12:52Z] 🚨 Error: The command exited with status 1

@bigPYJ1151

@aditew01 Yes, I suspect this is because the ARM CPU nightly images are being pushed to the x86 repo, as I saw:

 WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v4) and no specific platform was requested

There is a fix: vllm-project/ci-infra#243

@aditew01

Is that a blocker for merge?

@aditew01

Brill! Thanks!

@bigPYJ1151

I have asked for force merge.

@vllm-bot vllm-bot merged commit cebda2a into vllm-project:main Dec 10, 2025
55 of 57 checks passed
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

Labels: ci/build, multi-modality, ready, v1

4 participants