[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inputs in V1 engine. #27721
ywang96 merged 20 commits into vllm-project:main
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Documentation preview: https://vllm--27721.org.readthedocs.build/en/27721/
This pull request has merge conflicts that must be resolved before it can be
vllm/attention/layer.py
    return _Backend.TORCH_SDPA, None

elif current_platform.is_cuda():
    return _Backend.TORCH_SDPA, None
2 questions:
- What is the reason for using TORCH_SDPA?
- If we return immediately, should the rest of the if section be removed?

This will be removed when I clean up this PR. I was having issues with flash-attn on my devgpu, so this is a local hack to get Qwen3 running.
assert "audio" in mm_item_counts
mm_item_counts["audio"] -= mm_item_counts["video"]
super()._validate_mm_placeholders(mm_placeholders, mm_item_counts)
# def _validate_mm_placeholders(
It feels like the intent of validating the placeholders was good. Should we just keep a single method that does the verification we need?
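To make the point about validation concrete, here is a minimal sketch of what a single validation routine could look like. It is not the code from this PR: the function names `expected_placeholder_counts` and `validate_placeholders` and the plain-dict data layout are hypothetical, and the only part taken from the diff is the idea that, with audio-in-video enabled, each video accounts for one of the audio items.

```python
def expected_placeholder_counts(mm_item_counts, use_audio_in_video=False):
    """Return how many placeholders of each modality the prompt should contain.

    Hypothetical helper: works on a copy so the caller's dict is not mutated
    (the diff above adjusts the count in place instead).
    """
    expected = dict(mm_item_counts)
    if use_audio_in_video:
        if "audio" not in expected or "video" not in expected:
            raise ValueError("audio-in-video needs both audio and video items")
        # Each video carries its own audio track, so it needs no separate
        # audio placeholder in the prompt.
        expected["audio"] -= expected["video"]
        if expected["audio"] < 0:
            raise ValueError("fewer audio items than videos")
    return expected


def validate_placeholders(mm_placeholders, mm_item_counts, use_audio_in_video=False):
    """Raise ValueError if the placeholders found do not match the item counts."""
    expected = expected_placeholder_counts(mm_item_counts, use_audio_in_video)
    for modality, count in expected.items():
        found = len(mm_placeholders.get(modality, []))
        if found != count:
            raise ValueError(
                f"expected {count} {modality} placeholder(s), found {found}"
            )


counts = {"audio": 2, "video": 1}
adjusted = expected_placeholder_counts(counts, use_audio_in_video=True)
# One standalone audio clip plus one video whose audio track is embedded.
validate_placeholders({"audio": ["a0"], "video": ["v0"]}, counts, use_audio_in_video=True)
```

Keeping the adjustment in one helper means the audio-in-video case and the plain case go through the same check, which is the "one doing the verification we need" idea raised in the comment.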
    filtered_updates,
)
# Derive audio placeholders from video placeholders
mm_placeholders = self._derive_audio_from_video_placeholders(
Nice, that can be very useful.
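For readers who have not seen the full diff, the idea of deriving audio placeholders from video placeholders can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `_derive_audio_from_video_placeholders`: the `Placeholder` dataclass, the token markers, and the function `derive_audio_from_video` are all hypothetical, and the sketch assumes that audio tokens are interleaved inside each video span.

```python
from dataclasses import dataclass

# Hypothetical token markers, used only for this illustration.
VIDEO_TOKEN = "<video>"
AUDIO_TOKEN = "<audio>"


@dataclass
class Placeholder:
    offset: int           # index of the first token of the span in the prompt
    length: int           # number of tokens covered by the span
    is_embed: list[bool]  # True where the token belongs to this modality


def derive_audio_from_video(prompt_tokens, video_placeholders):
    """Build one audio placeholder per video span that contains audio tokens."""
    audio_placeholders = []
    for ph in video_placeholders:
        span = prompt_tokens[ph.offset : ph.offset + ph.length]
        mask = [tok == AUDIO_TOKEN for tok in span]
        if any(mask):
            # Same span as the video, but the mask selects the audio positions.
            audio_placeholders.append(Placeholder(ph.offset, ph.length, mask))
    return audio_placeholders


prompt = ["hi", VIDEO_TOKEN, VIDEO_TOKEN, AUDIO_TOKEN, AUDIO_TOKEN,
          VIDEO_TOKEN, AUDIO_TOKEN, "bye"]
video = [Placeholder(1, 6, [True, True, False, False, True, False])]
audio = derive_audio_from_video(prompt, video)
```

Under these assumptions the audio placeholder covers the same token range as its video and differs only in the mask, so no extra audio placeholder needs to appear in the prompt text.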
Hope you folks can see this: I tried https://github.com/QwenLM/Qwen3-Omni/blob/main/web_demo.py with this fix, using vLLM as the backend and setting the system prompt to "what does this man say?"
python web_demo.py -c ../Qwen3-Omni-30B-A3B-Instruct --server-name localhost
I still cannot get a correct result.
However, if I switch to transformers as the backend:
python web_demo.py -c ../Qwen3-Omni-30B-A3B-Instruct --server-name localhost --use-transformers --flash-attn2
I get reasonable output.
So it seems there is still more work to do.
💡 Codex Review
vllm/vllm/model_executor/models/qwen2_5_omni_thinker.py
Lines 395 to 413 in 9e6f9f0
Qwen2_5OmniThinkerMultiModalProcessor._maybe_apply_prompt_updates still invokes _validate_mm_placeholders(..., use_audio_in_video=use_audio_in_video) even though this class’s override of _validate_mm_placeholders was deleted in this change. The inherited implementation only accepts two positional parameters, so this call will now raise TypeError: _validate_mm_placeholders() got an unexpected keyword argument 'use_audio_in_video' whenever multimodal inputs are processed, crashing Qwen2.5 Omni inference before any work is done.
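The failure mode described in this review can be reproduced in isolation. The sketch below uses stand-in classes (`BaseProcessor`, `OmniProcessor`) rather than the real vLLM classes; only the method name `_validate_mm_placeholders` and the keyword `use_audio_in_video` come from the review text.

```python
class BaseProcessor:
    # Stand-in for the inherited implementation: no use_audio_in_video parameter.
    def _validate_mm_placeholders(self, mm_placeholders, mm_item_counts):
        return None


class OmniProcessor(BaseProcessor):
    # The override that accepted use_audio_in_video has been deleted,
    # but the call site below still passes the keyword.
    def maybe_apply_prompt_updates(self, use_audio_in_video):
        return self._validate_mm_placeholders(
            {}, {}, use_audio_in_video=use_audio_in_video
        )


try:
    OmniProcessor().maybe_apply_prompt_updates(use_audio_in_video=True)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The call fails regardless of the value passed, because Python rejects the unknown keyword before the method body runs. Either the call site must drop the keyword or the subclass must keep an override that accepts it.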
Will this PR automatically support online inference?
Signed-off-by: Roger Wang <hey@rogerw.io>
@huachenheli Thanks for working on this. I'll review this PR and push it over the finish line.
Curious if we could land a version for V1, @ywang96. Thanks in advance!
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
…uts in V1 engine. (vllm-project#27721) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>
Thank you for doing this! But can it support Qwen2.5 Omni?
Purpose
FIX #22268
FIX #22364
CLOSE #23888
CLOSE #25473
CLOSE #28046
Test Plan
Sanity checked with https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/draw.mp4
Test Result