[Feature] Support online inference by Gaohan123 · Pull Request #64 · vllm-project/vllm-omni

Gaohan123 · 2025-11-14T02:50:33Z

Purpose

This PR supports online serving for vllm-omni, which resolves #38 and part of #33 . It has been firstly verified on Qwen2.5-omni.

Test Plan

Here I both tests offline and online inference. First please follow README.md in root repo to finish installation.

Online inference

vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091

Follow README.md in examples/online_serving

Offline inference

Follow README.md in examples/offline_inference/qwen2_5_omni

Test Result

Online inference

For request from python:

python openai_chat_completion_client_for_multimodal_generation.py
Chat completion output from text: Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask.
Audio saved to audio_0.wav

For request from curl:

bash run_curl_multimodal_generation.sh
Output of request: "Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask."

Offline inference

For single request:

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 0, Saved audio to outputs/output_0.wav

For multiple requests:

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 1, Text saved to outputs/00001.txt
Request ID: 2, Text saved to outputs/00002.txt
...
Request ID: 0, Saved audio to outputs/output_0.wav
Request ID: 1, Saved audio to outputs/output_1.wav
Request ID: 2, Saved audio to outputs/output_2.wav
...

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/hsliuustc0106/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

tzhouam · 2025-11-17T06:43:52Z

looks good to me

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

Gaohan123 force-pushed the online_serve branch from 2c6a26e to 5955658 Compare November 15, 2025 13:31

Gaohan123 changed the title ~~[WIP] Support online inference~~ Support online inference Nov 15, 2025

Gaohan123 changed the title ~~Support online inference~~ [Feature] Support online inference Nov 15, 2025

Gaohan123 force-pushed the online_serve branch 2 times, most recently from 81196c1 to f886492 Compare November 16, 2025 07:47

Gaohan123 requested review from hsliuustc0106 and tzhouam and removed request for hsliuustc0106 and tzhouam November 16, 2025 07:49

Gaohan123 added 7 commits November 16, 2025 17:54

draft support online serving

c547a8c

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

run whole online serving pipeline successfully

9304557

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

Remove vllm_omni/entrypoints/cli/run_server.sh

d29ea0d

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

fix a typo

a3c236f

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

support input custom sampling_param_list

513437f

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

fix path problem

bb9b450

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

support input custom stage config file

77a72bb

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

Gaohan123 force-pushed the online_serve branch from 4fff8cf to 77a72bb Compare November 16, 2025 09:59

Gaohan123 added 2 commits November 16, 2025 18:35

pause git actions for document

d8e7149

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

fix git action failed problem

9e6b4e6

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

tzhouam merged commit c5a8930 into vllm-project:main Nov 17, 2025
1 check passed

SamitHuang mentioned this pull request Nov 17, 2025

[Feature] Add Gradio Demo for Qwen2.5Omni #60

Merged

8 tasks

hsliuustc0106 mentioned this pull request Nov 18, 2025

[Roadmap]: getting ready to v0.11.0rc1 release #33

Closed

18 tasks

Gaohan123 deleted the online_serve branch November 18, 2025 13:12

princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

[Feature] Support online inference (vllm-project#64)

3c2fa91

Signed-off-by: Gaohan123 <gaohan19@huawei.com>

linyueqian mentioned this pull request Apr 10, 2026

[Config Refactor][2/N] Pipeline + Deploy Config Schema #2383

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Support online inference#64

[Feature] Support online inference#64
tzhouam merged 9 commits intovllm-project:mainfrom
Gaohan123:online_serve

Gaohan123 commented Nov 14, 2025 •

edited

Loading

Uh oh!

tzhouam commented Nov 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Gaohan123 commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Online inference

Offline inference

Test Result

Online inference

Offline inference

Uh oh!

tzhouam commented Nov 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Gaohan123 commented Nov 14, 2025 •

edited

Loading