[WIP] Support Multimodal Input for Qwen2.5 Omni by tzhouam · Pull Request #57 · vllm-project/vllm-omni

tzhouam · 2025-11-11T07:24:40Z

Purpose

This PR adds multimodal input support for Qwen2.5 Omni (Qwen3 Omni should be similar). For now it supports only a single request with audio; support for multiple requests should be done later.

Test Plan

Change into the example folder:

cd examples/offline_inference/qwen_2_5_omni

In the command below, replace the PYTHONPATH placeholder with the path to your vllm_omni checkout, then run it:

export PYTHONPATH=<YOUR OWN PYTHON PATH>:$PYTHONPATH
python end2end.py --model Qwen/Qwen2.5-Omni-7B \
    --voice-type "m02" \
    --dit-ckpt none \
    --bigvgan-ckpt none \
    --output-wav output_audio \
    --prompt_type audio-in-video-v2 \
    --init-sleep-seconds 0 \
    --audio-in-video-source https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/draw_small.mp4
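
After the command finishes, it can help to confirm that the generated audio is a readable WAV file. The following is a minimal sketch, not part of this PR: `summarize_wav` is a hypothetical helper, the path `outputs/output_0.wav` is taken from the test result reported in this PR, and the sketch assumes the file is PCM-encoded, because Python's standard `wave` module may reject other encodings such as floating-point WAV.

```python
import wave
from pathlib import Path


def summarize_wav(path):
    """Return basic properties of a PCM WAV file; raises wave.Error if unreadable."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by the frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Path taken from the test result in this PR; adjust it if your run differs.
    wav_path = Path("outputs/output_0.wav")
    if wav_path.exists():
        print(summarize_wav(wav_path))
    else:
        print(f"{wav_path} not found; run end2end.py first")
```

A zero duration or a `wave.Error` would suggest that the audio stage did not produce usable output.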

Test Result

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 0, Saved audio to outputs/output_0.wav
[rank0]:[W1111 07:21:50.971399308 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1111 07:21:50.976901248 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1111 07:21:50.209218036 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
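
The `ProcessGroupNCCL` warnings above are emitted when a process exits without calling `destroy_process_group()`. The sketch below shows the general PyTorch pattern for avoiding that warning. `shutdown_distributed` is a hypothetical helper, and where such a call would belong in the stage worker processes of this PR is an assumption that the source does not address.

```python
def shutdown_distributed():
    """Destroy the default torch.distributed process group if one exists.

    Returns True when a group was destroyed, False otherwise.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed, so there is nothing to clean up.
        return False
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()
        return True
    return False


if __name__ == "__main__":
    try:
        pass  # Placeholder for the work done by the process that owns the group.
    finally:
        # Run the cleanup even if the work above raises.
        shutdown_distributed()
```

The cleanup has to run in the same process that initialized the process group, so calling it only from the top-level script may not silence warnings that come from worker processes.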

…Added detailed logging for prompt loading failures, file writing issues, and stage shutdown processes. Introduced a new logging utility for orchestrator metrics and streamlined stats handling in the PipelinedOmniLLM class.

…ded support for loading prompts from a .pt file, introduced new command-line arguments for initialization and output handling, and improved error handling for prompt loading. Removed deprecated files related to previous implementations.

Gaohan123 · 2025-11-24T09:08:32Z

Closed because its extended version, PR #76, has been merged.

tzhouam added 25 commits November 7, 2025 15:48

Port changes from feat/multi-request-stream (vs main)

6a7a278

debug ar model runner

3c2ad1b

remove text output for audio output

08aef1f

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

add back the sleep time for the stage init to avoid single card conflict

36ada04

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

add prompt file

0ef1fef

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Update run script to include initialization sleep time and support fo…

519c140

…r loading prompts from a .pt file. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Update run script to include initialization sleep time and support fo…

c18fc40

…r loading prompts from a .pt file. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Revert "Update run script to include initialization sleep time and su…

5f3f53a

…pport for loading prompts from a .pt file." This reverts commit c18fc40.

Revert "Revert "Update run script to include initialization sleep tim…

da38527

…e and support for loading prompts from a .pt file."" This reverts commit 5f3f53a.

Merge branch 'feat/multi-request-stream-new' of https://github.com/tz…

edefcc2

…houam/vllm-omni into feat/multi-request-stream-new

Squash 7 commits: 36ada04 0ef1fef 519c140 c18fc40 5f3f53a da38527 ede…

c83d223

…fcc2 Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Merge branch 'feat/multi-request-stream-new' of https://github.com/tz…

5052bfa

…houam/vllm-omni into feat/multi-request-stream-new

modify code based on review suggestions

edae58b

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

add data download instructions

b943abb

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

rename util

12cc2cd

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

rename util further rename imports

bc69d8e

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Debug: allow Qwen for Audio Inputs

95f232a

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Make Example Easier To Use

c12bd00

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Debug and Update doc

26fcae5

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Merge branch 'feat/multi-request-stream-new' into feat/Qwen2_5_Omni_M…

acbae81

…ultimodal_Input

Support input videos

31d63b0

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Debug

cdd9919

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Enhance video input handling by adding audio support in prompts and u…

ee577d4

…pdating stage configuration to remove unnecessary process flags. This simplifies the YAML configuration for the Qwen2.5 Omni model. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

Gaohan123 mentioned this pull request Nov 20, 2025

[Feature] support multimodal inputs with multiple requests #76

Merged

5 tasks

Gaohan123 closed this Nov 24, 2025

tzhouam deleted the feat/Qwen2_5_Omni_Multimodal_Input branch November 30, 2025 04:56

tzhouam wants to merge 25 commits into vllm-project:main from tzhouam:feat/Qwen2_5_Omni_Multimodal_Input
