[CPU] Support for Whisper #30062
Conversation
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
Code Review
This pull request adds CPU support for the Whisper model by enabling non-causal attention in the CPU backend. The changes are logical and include adding necessary tests. However, I've identified a critical issue where the causal flag is not correctly handled for non-causal attention types, which could lead to incorrect model outputs. Please see the detailed comment for more information.
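To make the causal/non-causal distinction concrete, here is a minimal, self-contained sketch (illustrative only, not vLLM code) of how a causal attention mask differs from the full mask that Whisper's encoder self-attention and decoder cross-attention require:

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[bool]]:
    """Return a visibility mask: mask[i][j] is True when query position i
    may attend to key position j."""
    if causal:
        # Decoder-style self-attention: each position sees only itself
        # and earlier positions (lower-triangular mask).
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    # Non-causal attention (e.g. Whisper encoder self-attention or
    # decoder cross-attention): every position is visible.
    return [[True] * seq_len for _ in range(seq_len)]
```

If the backend silently applies the causal mask to a non-causal layer, later key positions are masked out and the model output is wrong, which is the failure mode the review flags.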
💡 Codex Review: https://github.com/vllm-project/vllm/blob/734c32c8757999c5ff3eca602befe19cfc102ce3/vllm/v1/attention/backends/cpu_attn.py#L182-L186

The CPU backend now advertises support for …
This should close this issue: #29861
bigPYJ1151
left a comment
Thanks for your work! Some nits to resolve.
```python
block_table_tensor = common_attn_metadata.block_table_tensor
slot_mapping = common_attn_metadata.slot_mapping
causal = common_attn_metadata.causal
if self.is_cross_attention:
```
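For context, a minimal sketch of the kind of guard the truncated `if self.is_cross_attention:` branch presumably introduces (the class and function names here are stand-ins, not the actual vLLM implementation): cross-attention must never apply a causal mask, whatever the shared batch metadata says.

```python
from dataclasses import dataclass

@dataclass
class CommonAttentionMetadata:
    # Hypothetical, stripped-down stand-in for vLLM's metadata object.
    causal: bool = True

def resolve_causal(metadata: CommonAttentionMetadata,
                   is_cross_attention: bool) -> bool:
    # Whisper's decoder cross-attention attends over the full encoder
    # output, so it must be non-causal even when the batch-level
    # metadata carries causal=True.
    if is_cross_attention:
        return False
    return metadata.causal
```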
You can run vLLM's pre-commit hook locally as follows:

```shell
python -m pip install -U pre-commit
python -m pre_commit run --all-files
```
```python
@pytest.mark.parametrize("model", ["openai/whisper-large-v3"])
@pytest.mark.parametrize("dtype", ["bfloat16", "half"])
@pytest.mark.cpu_model
def test_whisper_cpu(vllm_runner, model, dtype):
```
fadara01
left a comment
Great work! Just a few nits.
Arm CI running here: https://buildkite.com/vllm/ci/builds/42114#019aee5b-2446-48f6-b203-bbd9d2c52a29
Triggered a new set of CI builds, third time is the charm 🤞
@bigPYJ1151 I've rebased to the latest; can you please enable the necessary jobs for the merge?
@bigPYJ1151 the failure for …
@aditew01 Yes, I suspect this is because the ARM CPU nightly images are pushed to the x86 repo, as I saw. There is a fix: vllm-project/ci-infra#243
Is that a blocker for merge?
Brill! Thanks!
I have asked for a force merge.
Purpose
Enable support for the Whisper model in the CPU backend.
Test Plan
Added tests to:
- tests/models/multimodal/generation/test_whisper.py -> should be enabled in run-cpu-test.sh
- python examples/offline_inference/audio_language.py -m whisper

Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.