
[WIP] Support Multimodal Input for Qwen2.5 Omni#57

Closed
tzhouam wants to merge 25 commits into vllm-project:main from tzhouam:feat/Qwen2_5_Omni_Multimodal_Input

Conversation

@tzhouam
Collaborator

@tzhouam tzhouam commented Nov 11, 2025

Purpose

This PR adds multimodal input support for Qwen2.5 Omni (Qwen3 Omni should work similarly). Currently only a single request with audio input is supported; support for multiple such requests will be added later.

Test Plan

Change into the example folder:

cd examples/offline_inference/qwen_2_5_omni

In the command below, replace the placeholder with the path to your vllm_omni checkout, then run it:

export PYTHONPATH=<YOUR OWN PYTHON PATH>:$PYTHONPATH
python end2end.py --model Qwen/Qwen2.5-Omni-7B \
    --voice-type "m02" \
    --dit-ckpt none \
    --bigvgan-ckpt none \
    --output-wav output_audio \
    --prompt_type audio-in-video-v2 \
    --init-sleep-seconds 0 \
    --audio-in-video-source https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/draw_small.mp4

Test Result

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 0, Saved audio to outputs/output_0.wav
[rank0]:[W1111 07:21:50.971399308 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1111 07:21:50.976901248 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1111 07:21:50.209218036 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
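To double-check a run like the one above, a small helper can confirm that the text and audio files named in the log exist and are non-empty. This is a minimal sketch, assuming the file naming seen in the log (`outputs/<5-digit request id>.txt` and `outputs/output_<request id>.wav`); the function name `check_request_outputs` is hypothetical and not part of end2end.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm_omni): verify that an end2end.py run
# produced non-empty text and audio outputs for one request ID.
# Assumes the naming seen in the test log:
#   <out_dir>/<5-digit id>.txt and <out_dir>/output_<id>.wav
check_request_outputs() {
  out_dir="$1"
  req_id="$2"
  # Zero-pad the request ID to 5 digits, e.g. 0 -> 00000.txt
  txt_file="$(printf '%s/%05d.txt' "$out_dir" "$req_id")"
  wav_file="${out_dir}/output_${req_id}.wav"
  for f in "$txt_file" "$wav_file"; do
    # -s: the file exists and is larger than zero bytes
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  echo "ok: request ${req_id}"
}
```

For the run above this would be invoked as `check_request_outputs outputs 0`. It only checks presence and size, not whether the WAV is playable or the text is correct.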


…Added detailed logging for prompt loading failures, file writing issues, and stage shutdown processes. Introduced a new logging utility for orchestrator metrics and streamlined stats handling in the PipelinedOmniLLM class.
…ded support for loading prompts from a .pt file, introduced new command-line arguments for initialization and output handling, and improved error handling for prompt loading. Removed deprecated files related to previous implementations.
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r loading prompts from a .pt file.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r loading prompts from a .pt file.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…pport for loading prompts from a .pt file."

This reverts commit c18fc40.
…e and support for loading prompts from a .pt file.""

This reverts commit 5f3f53a.
…fcc2

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…pdating stage configuration to remove unnecessary process flags. This simplifies the YAML configuration for the Qwen2.5 Omni model.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@Gaohan123
Collaborator

Closed because its extended version, PR #76, has been merged.

@Gaohan123 Gaohan123 closed this Nov 24, 2025
@tzhouam tzhouam deleted the feat/Qwen2_5_Omni_Multimodal_Input branch November 30, 2025 04:56