[Feature] Support online inference#64

Merged
tzhouam merged 9 commits into vllm-project:main from Gaohan123:online_serve
Nov 17, 2025
Conversation

@Gaohan123
Collaborator

@Gaohan123 Gaohan123 commented Nov 14, 2025

Purpose

This PR adds online serving support for vllm-omni, which resolves #38 and part of #33. It has first been verified on Qwen2.5-Omni.

Test Plan

Both offline and online inference are tested here. First, follow the README.md in the repo root to complete installation.

Online inference

vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091

Follow README.md in examples/online_serving
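The server started above exposes an OpenAI-compatible endpoint on port 8091. As a rough sketch of what the example client sends, the snippet below builds a chat-completion request body asking for both text and audio output. The endpoint path and the `modalities` field name are assumptions here, not verified against the vllm-omni API; the client in examples/online_serving is authoritative.

```python
import json

# Assumed endpoint path for the OpenAI-compatible server started with
# `vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091`.
BASE_URL = "http://localhost:8091/v1/chat/completions"

def build_request(prompt: str, model: str = "Qwen/Qwen2.5-Omni-7B") -> str:
    """Build a JSON chat-completion body requesting text plus audio output.

    The "modalities" field name is an assumption for illustration only.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["text", "audio"],
    }
    return json.dumps(payload)

body = build_request("Describe the architecture of a speech generation system.")
print(body)
```

In practice the shipped example client handles this for you; the sketch only shows the shape of the request.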

Offline inference

Follow README.md in examples/offline_inference/qwen2_5_omni

Test Result

Online inference

For a request from Python:

python openai_chat_completion_client_for_multimodal_generation.py
Chat completion output from text: Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask.
Audio saved to audio_0.wav

For a request from curl:

bash run_curl_multimodal_generation.sh
Output of request: "Well, it usually has input modules for data, processing units like neural networks or algorithms, output for generated audio, and scalability through parallel computing or distributed systems.If you want to know more about any part of this, feel free to ask."
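The "Audio saved to audio_0.wav" line above comes from the client persisting audio returned by the server. A minimal sketch of that step, assuming the response carries base64-encoded WAV bytes (the encoding and field layout are assumptions; the example client in examples/online_serving is authoritative):

```python
import base64
import tempfile
from pathlib import Path

def save_audio(b64_data: str, path: Path) -> int:
    """Decode base64 audio data, write it to `path`, return bytes written."""
    raw = base64.b64decode(b64_data)
    path.write_bytes(raw)
    return len(raw)

# Demo with dummy bytes standing in for real WAV content returned by the server.
fake_audio = b"RIFF....WAVEfmt "  # placeholder, not a valid audio file
encoded = base64.b64encode(fake_audio).decode()
out = Path(tempfile.mkdtemp()) / "audio_0.wav"
n = save_audio(encoded, out)
print(f"Audio saved to {out} ({n} bytes)")
```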

Offline inference

For a single request:

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 0, Saved audio to outputs/output_0.wav

For multiple requests:

Request ID: 0, Text saved to outputs/00000.txt
Request ID: 1, Text saved to outputs/00001.txt
Request ID: 2, Text saved to outputs/00002.txt
...
Request ID: 0, Saved audio to outputs/output_0.wav
Request ID: 1, Saved audio to outputs/output_1.wav
Request ID: 2, Saved audio to outputs/output_2.wav
...
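The log lines above show the per-request output naming: text files zero-padded to five digits, audio files with an `output_<id>` prefix. A small sketch of that scheme, inferred from the logs rather than read from the source:

```python
from pathlib import Path

def output_paths(request_id: int, out_dir: str = "outputs") -> tuple[Path, Path]:
    """Return the (text, audio) output paths for a request, matching the
    naming seen in the offline-inference logs (scheme inferred, not sourced)."""
    text_path = Path(out_dir) / f"{request_id:05d}.txt"
    audio_path = Path(out_dir) / f"output_{request_id}.wav"
    return text_path, audio_path

for rid in range(3):
    text_path, audio_path = output_paths(rid)
    print(f"Request ID: {rid}, Text saved to {text_path}")
    print(f"Request ID: {rid}, Saved audio to {audio_path}")
```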

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/hsliuustc0106/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@Gaohan123 Gaohan123 changed the title [WIP] Support online inference Support online inference Nov 15, 2025
@Gaohan123 Gaohan123 changed the title Support online inference [Feature] Support online inference Nov 15, 2025
@Gaohan123 Gaohan123 force-pushed the online_serve branch 2 times, most recently from 81196c1 to f886492 Compare November 16, 2025 07:47
@Gaohan123 Gaohan123 requested review from hsliuustc0106 and tzhouam and removed request for hsliuustc0106 and tzhouam November 16, 2025 07:49
Signed-off-by: Gaohan123 <gaohan19@huawei.com>
@tzhouam
Collaborator

tzhouam commented Nov 17, 2025

looks good to me

@tzhouam tzhouam merged commit c5a8930 into vllm-project:main Nov 17, 2025
1 check passed
@Gaohan123 Gaohan123 deleted the online_serve branch November 18, 2025 13:12
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
Signed-off-by: Gaohan123 <gaohan19@huawei.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Support online inference

2 participants