[Doc][Bagel] Add BAGEL-7B-MoT documentation and edit the default stage configuration #987
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 144ecab1d8
Force-pushed: 54f6501 → fd93af3
Pull request overview
This pull request adds comprehensive documentation, example scripts, and configuration files for the BAGEL-7B-MoT multimodal model in vLLM-Omni. The PR addresses issue #936 by providing complete deployment guides for both online serving and offline inference modes.
Changes:
- Added a single-GPU configuration file (`bagel_single_gpu.yaml`) and updated the dual-GPU config's memory utilization
- Created example Python scripts for text-to-image and image-to-text online serving
- Added comprehensive README documentation for both online serving and offline inference examples
- Added user guide documentation and shell scripts for various inference modes
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `vllm_omni/model_executor/stage_configs/bagel_single_gpu.yaml` | New single-GPU configuration with reduced memory utilization (0.40/0.50) |
| `vllm_omni/model_executor/stage_configs/bagel.yaml` | Increased GPU memory utilization from 0.4 to 0.8 for the dual-GPU setup |
| `examples/online_serving/bagel/t2i.py` | Text-to-image example using the OpenAI SDK |
| `examples/online_serving/bagel/i2t.py` | Image-to-text example (with a hardcoded path issue) |
| `examples/online_serving/bagel/README.md` | Comprehensive online serving documentation |
| `examples/offline_inference/bagel/README.md` | Detailed offline inference guide with setup instructions |
| `examples/offline_inference/bagel/run_t2i.sh` | Shell script for text-to-image inference |
| `examples/offline_inference/bagel/run_t2t.sh` | Shell script for text-to-text inference |
| `examples/offline_inference/bagel/run_i2t.sh` | Shell script for image-to-text inference |
| `examples/offline_inference/bagel/run_t2t_multiple_prompt.sh` | Batch text-to-text inference script |
| `examples/offline_inference/bagel/text_prompts_10.txt` | Sample text prompts file |
| `examples/online_serving/bagel/cat.jpg` | Sample image for the examples |
| `docs/user_guide/examples/online_serving/bagel.md` | User guide for online serving |
| `docs/user_guide/examples/offline_inference/bagel.md` | User guide for offline inference |
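The two stage-config changes above (0.40/0.50 on a single GPU, 0.8 per GPU in the dual-GPU setup) might look roughly like the sketch below. The field names and stage layout are assumptions inferred from the file descriptions, not copied from the PR.

```yaml
# Hypothetical sketch of bagel_single_gpu.yaml (field names assumed).
# Both BAGEL stages share one GPU, so each stage is given a reduced
# memory fraction so their combined footprint fits on the device.
stages:
  - name: ar            # autoregressive (understanding) stage
    device: cuda:0
    gpu_memory_utilization: 0.40
  - name: dit           # diffusion (generation) stage
    device: cuda:0
    gpu_memory_utilization: 0.50
```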
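For orientation, a text-to-image client like `t2i.py` might be structured as follows. This is a hedged sketch only: the endpoint path, model ID, and payload fields are assumptions in the style of the OpenAI Images API, not taken from the PR, and it uses only the standard library (the actual example uses the OpenAI SDK).

```python
# Hypothetical sketch of a text-to-image request against a vLLM-Omni
# OpenAI-compatible server. Endpoint, model ID, and fields are assumed.
import json
import urllib.request

SERVER = "http://localhost:8000"  # assumed default serve address


def build_t2i_payload(prompt: str, size: str = "1024x1024") -> dict:
    """Build an OpenAI-style images/generations payload (field names assumed)."""
    return {
        "model": "ByteDance-Seed/BAGEL-7B-MoT",  # assumed model ID
        "prompt": prompt,
        "size": size,
        "n": 1,
    }


def request_image(prompt: str) -> bytes:
    """POST the payload to the server; requires a live server to run."""
    req = urllib.request.Request(
        f"{SERVER}/v1/images/generations",
        data=json.dumps(build_t2i_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only attempted when run as a script, since it needs a running server.
    print(request_image("A cat wearing a tiny wizard hat")[:200])
```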
Force-pushed: e332929 → ab16373
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Force-pushed: b779ce2 → a019b81
PTAL ❤️ @princepride
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
> - If you encounter warnings about flash_attn, try installing a lower version such as 2.8.1 with the command below.
>
> ```bash
> uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.1/flash_attn-2.8.1+cu12torch2.9cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
> ```
Bagel directly uses vLLM's flash-attn; I don't think we need to install an extra flash-attn.
Force-pushed: 72e2539 → 49fe0df
I have deleted the unnecessary content. PTAL again. Thank you very much! ❤️
princepride left a comment:
A few small changes are needed.
Thank you for your advice. ❤️ However, both qwen2.5_omni and qwen3_omni are written this way. @princepride

Okay
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.
Force-pushed: 49fe0df → 801d59a
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Commit messages:
- The prompt was split into multiple requests because it was not enclosed in quotes.
- …al online serving and offline inference.
- Add offline_inference and online_serving README files for the BAGEL model; add docs for both offline and online serving examples; create i2t.py and t2i.py example scripts using the OpenAI SDK; fix broken links with local Windows paths; fix typos and grammar issues (Staged → Stages, add articles); add language identifiers to code blocks (bash, python); fix inline comments that would break shell commands.
- Increased GPU memory utilization from 0.4 to 0.8 for model stages.

Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
Force-pushed: f9db4eb → 6e88ea9
Head branch was pushed to by a user without write access
Force-pushed: b68531f → cfd81dc
Force-pushed: 9de79f8 → 086bad5
Could you please merge them for me again? I have passed the CI. Thank you very much! @hsliuustc0106 ❤️

Special thanks to my co-author @princepride (wangzhipeng628@gmail.com) for the significant contribution to this work. ❤️
…e configuration (vllm-project#987) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Signed-off-by: jzz <e1583181@u.nus.edu>
* [Frontend][Model] Support batch request with refined OmniDiffusionReq… (#797) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> * [Model]: add FLUX.1-dev model (#853) * [BugFix] ignore mm data from stages to async omni (#954) Signed-off-by: dengyunyang <584797741@qq.com> * Revert "[BugFix] ignore mm data from stages to async omni" (#1023) * [Bugfix] Modify output to model_runner_output (#1026) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Feature] Support cache-dit for Wan 2.2 inference (#1021) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> * [Doc]Format profiling doc (#993) Signed-off-by: lishunyang <lishunyang12@163.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Hardware] Support platforms and plugin system (#774) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Core]: KV Cache Transfer Encapsulation (#979) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> * [Test]Delete skip mark for amd ci test and fix CI failure (#927) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix][Doc]Specify Qwen3-TTS model name for each task type (#1036) Signed-off-by: Kyle Huang <yellowsea@gmail.com> * [Misc] pin version of fa3-fwd (#1051) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [CI] [ROCm] Add more AMD CI tests (#1039) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> * [Bugfix] fix qwen image layerd in dummy run (#1027) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [BugFix] Fix noisy output without setting a seed in Qwen Image (#1043) Signed-off-by: natureofnature <wzliu@connect.hku.hk> * 
[bugfix] remove vllm speech route (#1060) Signed-off-by: linyueqian <linyueqian@outlook.com> * [Debug] Update GLM-Image Pipeline (#1049) Co-authored-by: root <root@hk01dgx028.cm.cluster> * [Diffusion][Bugfix] Fix the flash_attn backends selection logic (#983) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix the accuracy issue of multimodal input. (#1020) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Rein Yang <ruiruyang2@gmail.com> * [Bugfix] Set VaeImageProcessor `do_convert_rgb` True (#1032) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [feat]: adapt batch request for flux (#1028) Signed-off-by: wuzhongjian wuzhongjian_yewu@cmss.chinamobile.com * [CI] Change Qwen3 Omni stage placement strategy (#1072) Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> * [BugFix] Fix to use correct attn backend (#1038) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> * [Perf] Qwen3 Omni talker mtp optimization (#1005) Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Wan2.2] Optimize memory usage with conditional transformer loading (#980) Signed-off-by: Lin, Fanli <fanli.lin@intel.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> * [Feat] Support XPU Backend in vLLM-Omni (#191) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Fix] stabilize diffusion images LoRA E2E across CI drift (#1075) Signed-off-by: dongbo910220 <1275604947@qq.com> * [Bugfix][Test] Re-enable the log simple tests (#1065) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] pr conflict fix, bugfix ignore mm data from 
stages to async omni (#1025) Signed-off-by: dengyunyang <584797741@qq.com> * [Doc][Bagel] Add BAGEL-7B-MoT documentation and edit the default stage configuration (#987) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Signed-off-by: jzz <e1583181@u.nus.edu> * [Fix] Increase max wait time for server readiness to accommodate model loading (#1089) Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com> * [Benchmark] Add vLLM-Omni Omni model online benchmark (#780) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] Remove Mooncake/Yuanrong connector import warning (#1091) Signed-off-by: natureofnature <wzliu@connect.hku.hk> * fix: UnboundLocalError for role in streaming audio/image responses (#784) Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com> * [Misc] update wechat image (#1096) * [Feature] Support DiT Layerwise (Blockwise) CPU Offloading (#858) Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [BugFix] Modify max_tokens and modify the log and fix #1103 (#1097) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix modulate_index shape error in Qwen-Image-Edit Task (#1100) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Platform] Add supports_torch_inductor interface (#1108) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [BugFix] Fix Qwen3 Omni talker 
mtp torch.compile startup error (#1104) Signed-off-by: ram16g <anlianfengjie@163.com> Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> Co-authored-by: ram16g <anlianfengjie@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] fix request_id of image generation in api server (#1112) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Perf]: CFG parallel abstraction (#851) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix Qwen3 TTS 0.6B profile run hang (#995) (#1082) * [CI] [ROCm] Quick fix amd ci (#1116) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> * [Bugfix] fix benchmark audio timing error and add benchmark test (#1109) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix][Qwen3TTS] Load speaker_id/voices from model configuration (#1079) Signed-off-by: pablo <juanz9312@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> * [NPU] Align with GPUModelRunner (#1114) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [FEATURE] /v1/images/edit interface (#1101) Signed-off-by: dengyunyang <584797741@qq.com> * [Bugfix] Fix NPU SDPA attention mask shape and semantics (#1031) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [TeaCache]: Add Coefficient Estimation (#940) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [CI]: Bagel E2E Smoked Test (#1074) Signed-off-by: 
princepride <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Misc] Bump version to 0.14.0 (#1128) Signed-off-by: Roger Wang <hey@rogerw.io> * [Doc] First stable release of vLLM-Omni (#1129) Signed-off-by: Roger Wang <hey@rogerw.io> * [Misc] Align error handling with upstream vLLM v0.14.0 (#1122) Signed-off-by: anna <lee.anna@navercorp.com> Co-authored-by: anna <lee.anna@navercorp.com> * [Feature] add Tensor Parallelism to LongCat-Image(-Edit) (#926) Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com> * [CI] Temporarily remove slow tests. (#1143) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: princepride <wangzhipeng628@gmail.com> * [CI] Refactor test_sequence_parallel.py and add a warmup run for more accurate performance stat (#1165) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * Dev/rebase v0.15.0 (#1159) Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk> Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> * Docs update paper link (#1169) Signed-off-by: hsliu <liuhongsheng4@huawei.com> Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com> Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com> * [Debug] Clear Dockerfile.ci to accelerate build image (#1172) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Debug] Correct Unreasonable Long Timeout (#1175) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Doc]Fix - Align with repo. 
(#1176) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Bugfix][Qwen-Image-Edit] Add a warning log for none negative_prompt (#1170) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] fix qwen image oom (#1168) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [Hardware] Disable compile of diffusion on XPU (#1148) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> * [Doc] Fix vLLM version in user docs (#1179) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> * [Refactor] Refactor async chunk and fix the shape mismatch issue (#1151) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * bugfix: /images/edits endpoint fails pipeline data format check (#1141) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Perf] resolving prolonged `cudastreamsynchronize` execution in z image processing (#1105) Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Bugfix] modify RTF use audio_e2e/audio_duration (#1157) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Doc] Highlight paper & slides. 
(#1186) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [chore] Remove zmq context initialize (#1187) Signed-off-by: xiedeyantu <czjourney@163.com> * [NPU] Update Dockerfile and docs for v0.14.0 (#671) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] E2E metric incorrect qwen3-omni with async chunk feature (#1018) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Doc] opt doc (#1118) Signed-off-by: David Chen <530634352@qq.com> * [Bugfix] Fix tp+sp accuracy, incorrect process group mapping (#1178) Signed-off-by: David Chen <530634352@qq.com> * [Feature] Enable use_audio_in_video for Qwen 3 Omni Online (#1198) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Bugfix] async_chunk rebase v0.15.0 (#1195) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * [feature]: support flux cache_dit (#1145) Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> * [CI] Add CI branch coverage calculation, fix statement coverage results and add log before test for buildkite log group (#1120) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Wan 2.2][Diffusion] Add TP Support (#964) Signed-off-by: weichen <calvin_zhu0210@outlook.com> * [Hardware] [Feat] Setup platform dependent package installation (#1046) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com> * [XPU] Fix XPU UTs for basic coverage (#1164) Signed-off-by: Yan Ma <yan.ma@intel.com> * [Test] Add BuildKite test-full script for full CI. 
(#867) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Refactor] Reuse upstream Qwen3MoeSparseMoeBlock (#1202) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] Fix wan2.2 ti2v (#1221) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] Fix '--max-generated-image-size' cli args type (#1249) Signed-off-by: ApsarasX <apsarax@outlook.com> * [Bugfix] Ensure seed=0 is correctly handled in image edit (#1248) Signed-off-by: ApsarasX <apsarax@outlook.com> * [Docs] Add example image download step to Image-To-Video examples (#1258) Signed-off-by: lishunyang <lishunyang12@163.com> * [Bugfix] Fix padding bug in 12Hz tokenizer ConvTranspose1d decode (#1241) Signed-off-by: linyueqian <linyueqian@outlook.com> * [bugfix] Fix multimodal_output property to check completion outputs where audio data is attached (#1203) Signed-off-by: linyueqian <linyueqian@outlook.com> * [Doc] Update QA relevant to quantization (#1257) Signed-off-by: lishunyang <lishunyang12@163.com> * [Bugfix] Fix Doc link Rrror (#1263) Signed-off-by: lishunyang <lishunyang12@163.com> * Process-Scoped GPU Memory Accounting (#1204) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> * [ComfyUI]: ComfyUI integration (#1113) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> * fix: add diffusion offload args to OmniConfig group instead of serve_parser (#1271) Signed-off-by: Chenguang ZHENG <645327136@qq.com> * [Doc] Adding models/pipelines/features Tutorial (#1196) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> * [CI] Add 
env variable check for nightly CI (#1281) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [CI] Add pytest markers to current tests and update the doc. (#577) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Diffusion][Perf] Remove Redundant Communication Cost by Refining SP Hook Design (#1275) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> * [Feature] Opt metrics structure (#891) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Test] Add example test cases for omni online (#1086) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: yenuo26 <410167048@qq.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [CI] Reduce the time for Diffusion Sequence Parallelism Test (#1283) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Model] SupportHunyuanImage3 Diffusion Model in vllm-omni (#1085) Signed-off-by: Semmer2 <semmer@live.cn> * [Chore] Update copyright year. 
(#1256) Signed-off-by: lishunyang <lishunyang12@163.com> * [feature]: support Flux.1-dev CFG-Parallel (#1269) * [Bugfix] Fix 'NoneType' AttributeError in stable-diffusion model detect (#1254) Signed-off-by: Yan Ma <yan.ma@intel.com> * [Doc] Update Qwen3-TTS docs for consistency with Omni examples (#1226) Signed-off-by: linyueqian <linyueqian@outlook.com> Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Fix]Ensure HuggingFace downloads complete before initialization. (#1213) Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [BugFix] Fixed the issue where ignore_eos was not working. (#1286) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * [Test] Add e2e tests for Qwen3-TTS speech endpoint (#1206) Signed-off-by: linyueqian <linyueqian@outlook.com> Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com> * [Feat]: support VAE patch parallelism (#756) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: hsliuustc0106 <liuhongsheng4@huawei.com> * [CI] Disable Qwen3-TTS E2E Test in pipeline.yml (#1306) Signed-off-by: Gao Han <hgaoaf@connect.ust.hk> * [Misc] Add per-request generator_device to online image gen and edit (#1183) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bagel]: Support TP (#1293) Signed-off-by: princepride <wangzhipeng628@gmail.com> * [Bugfix] Fix image edit RoPE crash when explicit height/width are provided (#1265) Signed-off-by: lishunyang <lishunyang12@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Doc] Sync (#1216) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without system prompt (#1288) Signed-off-by: Rein Yang <ruiruyang2@gmail.com> * [Debug] Add 
trigger to concurrent stage init (#1274) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Bugfix][Qwen3-TTS] Fix task type (#1317) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> * Unifying CLI Argument Naming Style (#1309) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> * [Bugfix][Qwen3-TTS] Preserve original model ID in omni_snapshot_download (#1318) * [CI] Run nightly tests. (#1333) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Feature]: FP8 Quantization Support for DiT (#1034) Signed-off-by: lishunyang <lishunyang12@163.com> Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> * Fix yield token metrics and opt metrics record stats (#1292) * [Test] L2 & L3 Test Case Stratification Design for Omni Model (#1272) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: yenuo26 <410167048@qq.com> Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Pref] Support Qwen3 Omni code2wav batch infernce with async chunk (#1246) Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: Ziming Huang <1520787127@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * update qwen3-omni & qwen2.5-onmi openai client (#1304) Signed-off-by: Rein Yang <ruiruyang2@gmail.com> * [Feature] Support Wan2.2 T2V and I2V Online Serving with OpenAI /v1/videos API (#1073) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: SamitHuang <285365963@qq.com> Co-authored-by: Flora Feng 
<4florafeng@gmail.com> * [Feature] add Tensor Parallelism to SD_3.5 (#1336) Signed-off-by: GG-li <3226868735@qq.com> * [Feature]async scheduling to overlap chunk IO and compute (#951) Signed-off-by: CHEN <116010019@link.cuhk.edu.cn> Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com> Co-authored-by: Gao Han <gaohan19@huawei.com> * [Bugfix] reused metrics to modify the API Server token statistics in Stream Response (#1301) Signed-off-by: John Liu BUAA <liukecheng97@gmail.com> * Refactor CPU Offloading Backend Pattern (#1223) Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com> Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> * [DOC] Doc for CI test - Details about five level stucture and some other files. (#1167) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Co-authored-by: yenuo26 <410167048@qq.com> * [Bugfix] remove Tongyi-MAI/Z-Image-Turbo related test from L2 ci (#1348) Signed-off-by: dengyunyang <584797741@qq.com> * [Misc] wechat image update (#1354) Signed-off-by: David Chen <530634352@qq.com> * [Misc] Support WorkerWrapperBase and CustomPipeline for Diffusion Worker (#764) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> * [Feature][Bugfix] Add CFG feature to Bagel (#1310) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> * [Feature]: Diffusion sleep to use process level memory calculation (#1276) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> * change qwen3-omni open cudagraph by default (#1352) Signed-off-by: Rein Yang <ruiruyang2@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [XPU] Update Bagel's flash_attn_varlen_func to fa utils (#1295) 
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> * [Test] Add Omni Model Performance Benchmark Test (#1321) Signed-off-by: yenuo26 <410167048@qq.com> Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com> * [BugFix]: Revert utils change (#1369) Signed-off-by: princepride <wangzhipeng628@gmail.com> * [Rebase] Rebase to vllm v0.16.0 (#1357) Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk> Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Isotr0py <Isotr0py@outlook.com> Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> * [Test] Fix expansion and example test case for qwen3-omni (#1358) Signed-off-by: yenuo26 <410167048@qq.com> * [v0.16.0][BUG FIX]Fix hunyuan MOE after update to 0.16.0 (#1401) Signed-off-by: Chendi Xue <chendi.xue@intel.com> * [0.16.0] remove cuda hard-code for Hunyuan Image3 (#1402) Signed-off-by: Chendi Xue <chendi.xue@intel.com> * [XPU] Add XPU Dockerfile and related docs (#1162) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Co-authored-by: Daniel Huang <daniel1.huang@intel.com> * [Bugfix] Fix Hardcoded Datatypes in Z-image (#1393) Signed-off-by: Alex Brooks <albrooks@redhat.com> * [Feature] : Support disaggregated inference pipeline for Qwen3_TTS (#1161) Signed-off-by: Sy03 <1370724210@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Feature] Add automated PR reviewer bot with GLM integration (#1424) Signed-off-by: hsliu <liuhongsheng4@huawei.com> Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * [Misc] Add Qwen2.5-Omni-3B model support to Gradio demo (#1382) Signed-off-by: UsamaKenway <usamakenway@gmail.com> * [misc] Feature/pr reviewer auto trigger&update 
model (#1431) Signed-off-by: hsliu <liuhongsheng4@huawei.com> Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Hunter Liu <hunter@liu.sh> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * Revert "[misc] Feature/pr reviewer auto trigger&update model" (#1432) * [Doc] Update GPU installation commands (#1434) * [ROCM] [CI] fix dockerfile.rocm to support nightly build and also fix amd ci v0.16.0rc1 (#1380) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> * [Feature][BAGEL] Combine multi-branch cfg into a single batch to accelerate inference. (#1429) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> * [Feat]: add ASCII art logo for vLLM-Omni (#1430) * [Bug] [Bagel] Fix kv transfer bug (#1437) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Co-authored-by: Wang Zhipeng: princepride <wangzhipeng628@gmail.com> * [CI] Set L2 & L3 tests running conditions. (#1344) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Feature] vLLM-Omni RDMA connector (#1019) Signed-off-by: natureofnature <wzliu@connect.hku.hk> * [Minor][Refactor] Pass seq_token_counts explicitly (#1425) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Misc] Extend Diffusion Benchmark script to other backends (#875) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Feature] Support Stage Based Deployment CLI (#939) Signed-off-by: wuhang <wuhang6@huawei.com> Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: wuhang <whlbx@hotmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Doc] Optimize vLLM-Omni metrics documentation (#1311) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> 
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Forward all vllm-omni serve command parameters to model (#985) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc]: Add bagel single/multi node usage with mooncake document (#1450)
* [Qwen3TTS][Feat] Code2Wav batched decoding (#1426) Signed-off-by: pablo <pablo@agigo.ai> Co-authored-by: pablo <pablo@agigo.ai>
* [CI] Remove overwhelming debug log (#1463) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Misc] update wechat image (#1464) Signed-off-by: David Chen <530634352@qq.com>
* [Doc] Refine Diffusion Tutorial Documents (#1305) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
* [Bugfix] Robust Audio Data Handling in _create_audio_choice (#1222) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
* [Bugfix]: Fix merging updated additional information to ensure dict type (#1296) Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
* [Model]Add new nextstep_1(Diffusion) model(only T2I) (#612) Signed-off-by: Dong Wang <dongw2019@gmail.com> Signed-off-by: sniper35 <dongw2019@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Add TTS configuration options (#1177) Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>
* [Debug] Multi-Request for Qwen 3 Omni use_audio_in_video (#1433) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Bugfix] Fix case-sensitive task_type matching in Qwen3TTSModelForGeneration (#1455) Signed-off-by: Sangchun Ha <seomk9896@gmail.com>
* [BugFix] process request.num_cached_tokens if it equals to the initial value (#1468) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Gao Han <gaohan19@huawei.com>
* [Bugfix] Fix SDPA attention mask dtype and shape (Fix #857) (#1349) Signed-off-by: jader <yjader@foxmail.com>
* [Test] Reduce Perf test case and fix modify stage config (#1449) Signed-off-by: yenuo26 <410167048@qq.com>
* [NPU] Upgrade to v0.16.0 (#1375) Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [CI] Update Dockerfile for vllm-omni CI image and remove obsolete dep… (#1491) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Fix][Chore] Qwen3-TTS Modeling Minor Code Sanity Improvements (#1482) Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
* [Bugfix] Fix tuple/list KV cache extraction crash (#1405) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc] format lora related docs for the user's end (#1009) Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk> Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
* [Feature] Support Wan2.2 output with irregular shapes (#1279) Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Misc] Migrate L1 tests to use pytest-mock (#1315) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
* [Bugfix] Fix LoRA Scaling on Active Adapters (#1421) Signed-off-by: Alex Brooks <albrooks@redhat.com>
* [Bugfix] fix record audio generated frame in offline infer (#1312) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com>
* [Model] Support OmniGen2 (#513) Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>
* [Bugfix][Qwen3TTS] (#1289) Signed-off-by: pablo <juanz9312@gmail.com> Co-authored-by: Gao Han <gaohan19@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* Use pull through cache image for H100 pool (#1518) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
* [ROCm] [CI] [Docker] Point to use the latest vLLM v0.16.0 stable version (#1500) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Bugfix] fix offline text_to_image error from #1009 (#1515) Signed-off-by: David Chen <530634352@qq.com>
* [XPU] Enable FLASH_ATTN on XPU (#1332) Signed-off-by: Yan Ma <yan.ma@intel.com>
* Revert gpu_1 job to use regular image (#1521) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
* [Chore] remove unused logger in omni_diffusion (#531) (#1509) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> Co-authored-by: Gao Han <gaohan19@huawei.com>
* [Qwen3TTS][Feat] Streaming output (#1438) Signed-off-by: pablo <pablo@agigo.ai> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: pablo <pablo@agigo.ai> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Race condition in MultiprocExecutor when concurent access to Scheduler (#1448) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc][Test][Misc] ComfyUI test, more screenshot, and code cleaning (#1435) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com>
* [Performance]Qwen3-Omni performance optimization (#1378) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
* [Feature] Support HSDP for diffusion models (#1339) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [CI] fixed CI timeout (#1460) Signed-off-by: zhumingjue <zhumingjue@huawei.com> Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
* [Bugfix] Use uds for zmq address if not set --stage-id (#1522) Signed-off-by: wuhang <wuhang6@huawei.com>
* [BugFix] Restore talker's config (#1524) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Canlin Guo <961750412@qq.com>
* [XPU] fix qwen_omni after rebase to v0.16.0 (#1416) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Platform] Enable layerwise offload on all hardware (#1492) Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* diffusion: enable VAE patch parallel for SD3.5 (#1428) Signed-off-by: dongbo910220 <1275604947@qq.com>
* [Perf] GLM Image (#920) Signed-off-by: JaredforReal <w13431838023@gmail.com> Signed-off-by: Jared Wen <w13431838023@gmail.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [skip ci][Doc] add design docs for async chunk in qwen3-omni (#962) Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* feat(qwen3-tts): Add CUDA Graph support for speech tokenizer decoder (#1205) Signed-off-by: xulusjb <fdukeshik@gmail.com> Co-authored-by: xulusjb <fdukeshik@gmail.com>
* [New Model]: XiaomiMiMo/MiMo-Audio-7B-Instruct support (#750) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com> Signed-off-by: hsliu <liuhongsheng4@huawei.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: GG-li <3226868735@qq.com> Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: Baoyuan Qi <qibaoyuan@126.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com> Signed-off-by: dongbo910220 <1275604947@qq.com> Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: baoyuan qi <qibaoyuan@126.com> Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Prajwal A <prajwalanagani@gmail.com> Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com> Signed-off-by: 丁宁 <nndding@gmail.com> Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com> Signed-off-by: dingning<dingning7@xiaomi.com> Signed-off-by: dingning <dingning7@xiaomi.com> Signed-off-by: dingning <dingning@xiaomi.com> Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com> Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Canlin Guo <canlinguosdu@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: shijin zhang <zsj1364226740@gmail.com> Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk> Co-authored-by: root <root@hk01dgx028.cm.cluster> Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com> Co-authored-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com> Co-authored-by: dingning <dingning7@xiaomi.com> Co-authored-by: ning ding <nndding@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Feature]: Native GGUF Quantization Support for DiT (#1285) Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* Add benchmark for `v1/audio/speech` non-streaming (#1408) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Version] Auto generate version using `setuptool_scm` (#1224) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Feat] : Support Async chunk cleanup (#1087) Signed-off-by: Sy03 <1370724210@qq.com>
* [Profiler] Support online profiling (#1136) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Canlin Guo <961750412@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
* [Bugfix] Fix redundant finished req status updating on OmniGenerationScheduler (#1510) Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com> Co-authored-by: 齐保元 <qibaoyuan@xiaomi.com>
* [XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda (#1488) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com>
* [Chore] Cleanup dead code in GGUF DiT code path (#1533) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
* [Doc] Update installation instructions for vllm 0.16.0 (#1505) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Doc] [skip ci]Sync. (#1363) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
* [CI][skip ci]Update H100 image link based on #1518 (#1538) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* Fix no embed text spk tokens (#1540) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
* [Debug] Merge vllm pull 35368 (#1534) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Docs] update async chunk docs diagram [skip ci] (#1530) Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* fix(qwen3-tts): fix Base ICL voice clone producing corrupted audio (#1554) Signed-off-by: linyueqian <linyueqian@outlook.com>
* [NPU][Bugfix] Align GPU side and recover qwen3-tts (#1564) Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [BugFix] Fix unexpected crash when init OmniDiffusion (#1562) Signed-off-by: Semmer2 <semmer@live.cn>
* [CI] Modify some CI test cases to run on L4 environment to reduce H100 resource usage. (#1543) Signed-off-by: yenuo26 <410167048@qq.com> Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
* [BugFix]: fix a lot of bug (#1565) Signed-off-by: princepride <wangzhipeng628@gmail.com>
* feat: add HyperCLOVAX-SEED-Omni-8B support
  Model files:
  - vllm_omni/diffusion/models/hyperclovax_vision/: vision decoder pipeline (HyperCLOVAXVisionPipeline) using flow matching diffusion + VisionTransformer
  - vllm_omni/diffusion/models/hyperclovax_audio/: audio decoder pipeline (HyperCLOVAXAudioPipeline) using Unit-BigVGAN codec
  - vllm_omni/model_executor/stage_input_processors/hyperclovax_seed_omni.py: thinker2vision_decoder and thinker2audio_decoder — extract discrete tokens from LLM output; truncate/pad vision codes to 729 (27x27) for decoder
  Registry:
  - vllm_omni/diffusion/registry.py: register HyperCLOVAXVisionPipeline and HyperCLOVAXAudioPipeline with post-process functions
  Stage config:
  - vllm_omni/model_executor/stage_configs/hcx_omni.yaml: 3-stage config. Stage 0: LLM thinker (TP=4, GPUs 0-3), Stage 1: vision decoder (GPU 4), Stage 2: audio decoder (GPU 5)
  Bug fixes for HyperCLOVAX compatibility:
  - diffusion/request.py: add extra dict field to OmniDiffusionRequest so vision_tokens/audio_tokens from stage input processors reach the pipeline
  - entrypoints/async_omni_diffusion.py: extract OmniTokensPrompt.additional_information into OmniDiffusionRequest.extra before creating request
  - entrypoints/omni_stage.py: skip empty engine inputs (text-only requests where thinker2vision_decoder/thinker2audio_decoder return [])
  - entrypoints/async_omni.py: handle skipped sentinel in _process_single_result so text-only requests complete without crashing on Stage 1/2
* fix: correct decoder params and HCX porting fixes
  - hcx_omni.yaml: guidance_scale 3.5→0.75, num_inference_steps 30→50 (matches OmniServe production defaults; 3.5 caused over-amplified autoguidance → shrunken/degraded output images)
  - omni_stage.py: skip empty engine inputs for text-only requests
  - async_omni_diffusion.py: extract OmniTokensPrompt.additional_information into OmniDiffusionRequest.extra (audio_tokens/vision_tokens)
  - registry.py: HCX Omni diffusion model registration fix
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: HyperCLOVAX-SEED-Omni-8B stage pipeline and entrypoint fixes
* fix: change guidance_scale from 9.0 to 0.75 (autoguidance scale, OmniServe default)
* feat: add audio decoder Stage 2 to hcx_omni pipeline
  - Wire HyperCLOVAXAudioPipeline as Stage 2 in hcx_omni.yaml
  - GPU 5 assigned for audio decoder (Unit-BigVGAN / NCCosybigvganDecoder)
  - Add runtime edge 0->2 (thinker -> audio decoder)
  - Implement post-generation PCM chunk streaming for audio output (4800 samples / 200ms per SSE event @ 24kHz, int16 base64-encoded)
  Refs: github.com/vllm-project/vllm-omni/pull/869 (already incorporated)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: vllm version compatibility for HyperCLOVAX audio decoder startup
  - config/model.py: try/except fallback for AttentionBackendEnum import (vllm.v1.attention.backends.registry absent in older vllm builds)
  - pipeline_hyperclovax_audio.py: return actual named_parameters() from load_weights() when using MAR checkpoint so diffusers_loader strict check passes (weights loaded eagerly in __init__ via MAR extraction)
  - qwen3_omni_moe_thinker.py, qwen2_5_omni_thinker.py: try/except stubs for check_interleaved_audio_video and merge_interleaved_embeddings which are absent in older vllm qwen2_5_omni_thinker; these symbols are only exercised by Qwen models, not HyperCLOVAX
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add edge 1→2 and correct model key in hcx_omni.yaml Stage 2
  - Add runtime edge from:1 to:2 (required for Stage-2 connector init; without it AsyncOrchestrator cannot route to audio decoder at runtime)
  - Change model_subdir to model for Stage-2 engine_args to match total-poc working reference config
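The vision-token hand-off described in the HyperCLOVAX commits above truncates or pads the thinker's discrete vision codes to exactly 27x27 = 729 entries before the vision decoder runs. A minimal sketch of that step; the helper name and the `pad_id=0` default are illustrative assumptions, not taken from the repository:

```python
GRID_CODES = 27 * 27  # 729 vision codes expected by the decoder, per the commit message

def fit_vision_codes(codes, pad_id=0):
    """Truncate or pad a sequence of discrete vision tokens to exactly 729 entries."""
    codes = list(codes)[:GRID_CODES]
    # Pad with a placeholder id when the thinker emitted fewer codes than the grid needs.
    return codes + [pad_id] * (GRID_CODES - len(codes))

print(len(fit_vision_codes(range(1000))), len(fit_vision_codes([7, 8, 9])))  # prints "729 729"
```

Fixing the length up front keeps the decoder's input shape static regardless of how many vision tokens the LLM actually sampled.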
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: audio S2S output - handle diffusion outputs in _create_audio_choice
  HyperCLOVAXAudioPipeline (diffusion) stores audio in multimodal_output directly (OmniRequestOutput.from_diffusion), not in outputs[0].multimodal_output like LLM pipelines. Fix three locations:
  1. _create_audio_choice (non-streaming): use omni_outputs.multimodal_output when final_res.outputs is empty (diffusion path).
  2. Streaming audio path: same fix for _final_res.outputs[0].
  3. Both loops (for output in final_res.outputs): fall back to single synthetic choice at index 0 when outputs list is empty.
  4. Handle bytes audio output from HyperCLOVAXAudioPipeline post-process (returns WAV bytes, not tensors like Qwen3-Omni).
  Also fixes audio input (A2T) regression: skip diffusion prompt extraction when mm_data has audio content (added in previous session).
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: parse WAV bytes with soundfile for uniform PCM chunk streaming
  HyperCLOVAXAudioPipeline returns WAV bytes including 44-byte header. The previous byte-offset splitting included the header in the first chunk, corrupting it. Fix: parse with soundfile to get float32 PCM, then convert to int16 chunks uniformly regardless of source type (bytes or tensor). Verified: 136 audio chunks x 200ms = 27.04s audio streamed correctly.
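The streaming fix just described parses the WAV container first, so the header never leaks into the first PCM chunk. A minimal stdlib sketch of the same idea, assuming 16-bit mono output; the actual code path uses soundfile and emits SSE events, and only the chunk size (4800 samples = 200 ms at 24 kHz) comes from the commit message:

```python
import base64
import io
import math
import struct
import wave

CHUNK_SAMPLES = 4800  # 200 ms at 24 kHz, per the commit message

def wav_bytes_to_pcm_chunks(wav_bytes: bytes):
    """Parse a WAV container (header included) and yield base64-encoded int16
    PCM chunks. Splitting the raw bytes instead would put the RIFF header
    inside the first chunk and corrupt it."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        pcm = wf.readframes(wf.getnframes())
    step = CHUNK_SAMPLES * 2  # 2 bytes per int16 sample
    for off in range(0, len(pcm), step):
        yield base64.b64encode(pcm[off:off + step]).decode("ascii")

# Build a 1-second 24 kHz mono sine wave in memory as a stand-in for decoder output.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(b"".join(
        struct.pack("<h", int(10000 * math.sin(2 * math.pi * 440 * n / 24000)))
        for n in range(24000)))

chunks = list(wav_bytes_to_pcm_chunks(buf.getvalue()))
print(len(chunks))  # 24000 samples / 4800 per chunk = 5 chunks
```

Decoding through the container also makes the chunking uniform: whether the pipeline hands back WAV bytes or a tensor, the streamer always works on plain PCM samples.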
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: zero-shot TTS with speaker embedding from input audio
  - serving_chat.py: extract last input_audio base64 from request messages and inject as ref_audio_b64 into engine_prompt dict
  - thinker2audio_decoder: read ref_audio_b64 from prompt and pass as ref_audio_tokens to Stage 2 (HyperCLOVAXAudioPipeline)
  - hcx_omni.yaml: switch Stage 2 to NCZSCosybigvganDecoder.mar (zero-shot) which uses ECAPA-TDNN speaker encoder instead of finetuned ID lookup
  Pipeline: input audio -> ECAPA-TDNN -> speaker embedding -> BigVGAN synthesis matching the voice characteristics of the original speaker.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: wire audio decoder Stage 2 to hcx_omni pipeline and fix S2S flow
  - Add Stage 2 (HyperCLOVAXAudioPipeline / NCZSCosybigvganDecoder) to hcx_omni.yaml with GPU 5, gpu_memory_utilization 0.4, edge 0->2 from thinker
  - Fix thinker2audio_decoder: correct audio token range (128606-135167), remap to [0, 6561) for BigVGAN input, handle empty token case gracefully
  - Fix pipeline_hyperclovax_audio.py post_process_func signature and incorporate PR#869 BUG FIX patches for stable audio generation
* fix: use finetuned audio decoder and fix transformers_modules deserialization
  - hcx_omni.yaml: switch Stage 2 from NCZSCosybigvganDecoder (zero-shot, ECAPA-TDNN) to NCCosybigvganDecoder (finetuned, nn.Embedding speaker id). Zero-shot decoder required ref_audio (mel spectrogram) which is unavailable for text-only requests and incompatible with finetuned decoder path.
  - pipeline_hyperclovax_audio.py: guard ref_audio processing with 'not self.bigvgan.finetune' — finetuned decoder has no ECAPA-TDNN encoder, so passing ref_audio bytes would crash with 'expected 100 channels'.
  - omni_stage.py: add HuggingFace modules cache (~/.cache/huggingface/modules) to sys.path before queue.get_nowait() in try_collect(). Stage-0 pickles outputs containing custom classes from transformers_modules (trust_remote_code), but the API server process doesn't have this path, causing deserialization failures that silently drop Stage-0 outputs.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: restore zero-shot speaker cloning with fallback for text-only requests
  - hcx_omni.yaml: revert to NCZSCosybigvganDecoder.mar (zero-shot ECAPA-TDNN) for voice-preserving S2S synthesis. NCCosybigvganDecoder used a fixed integer speaker_id and lost the input speaker's voice.
  - pipeline_hyperclovax_audio.py: add zero-mel fallback branch for finetune=False + ref_audio=None case. When a text-only request arrives (no input audio → no ref_audio), ECAPA-TDNN receives a zero mel tensor [1, num_mels, 64] instead of crashing with 'expected 100 channels'. S2S requests always have ref_audio so the zero-shot cloning path is unchanged.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add stage config yaml for HCX audio decoder Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* feat: add HyperCLOVAX-SEED-Omni 8B model as vllm-omni executor Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* feat: add HCX audio decoder pipeline Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* fix: modify exception for HCX audio decoder (GAN) Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* fix: default temperature set to 0, and pipeline model evaluation mode Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
---------
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Kyle Huang <yellowsea@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: wuzhongjian wuzhongjian_yewu@cmss.chinamobile.com
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: ram16g <anlianfengjie@163.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: pablo <juanz9312@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: anna <lee.anna@navercorp.com>
Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: xiedeyantu <czjourney@163.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: ApsarasX <apsarax@outlook.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: Semmer2 <semmer@live.cn>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Ziming Huang <1520787127@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: GG-li <3226868735@qq.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Sy03 <1370724210@qq.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Hunter Liu <hunter@liu.sh>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Signed-off-by: pablo <pablo@agigo.ai>
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: sniper35 <dongw2019@gmail.com>
Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>
Signed-off-by: Sangchun Ha <seomk9896@gmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk>
Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: zhumingjue <zhumingjue@huawei.com>
Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: xulusjb <fdukeshik@gmail.com>
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Signed-off-by: Baoyuan Qi <qibaoyuan@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: baoyuan qi <qibaoyuan@126.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: 丁宁 <nndding@gmail.com>
Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: dingning<dingning7@xiaomi.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning@xiaomi.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <with1015@unist.ac.kr>
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com>
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: kYLe <yellowsea@gmail.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: NATURE <wzliu@connect.hku.hk>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Rein Yang <ruiruyang2@gmail.com>
Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
Co-authored-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Fanli Lin <fanli.lin@intel.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Co-authored-by: Pierre LE GUEN <26087574+PierreLeGuen@users.noreply.github.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: ram16g <anlianfengjie@163.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Co-authored-by: Juan Pablo Zuluaga <46724788+JuanPZuluaga@users.noreply.github.com>
Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: ceanna93 <fairyanna@naver.com>
Co-authored-by: anna <lee.anna@navercorp.com>
Co-authored-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Co-authored-by: liuzhenwei <zhenweiliu@habana.ai>
Co-authored-by: erfgss <97771661+erfgss@users.noreply.github.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: weichen <calvin_zhu0210@outlook.com>
Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: ApsarasX <apsarax@outlook.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Jiaping Wu <53215702+ElleElleWu@users.noreply.github.com>
Co-authored-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
Co-authored-by: rein yang <73573651+R2-Y@users.noreply.github.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Co-authored-by: ChenWenjing <54166744+Shirley125@users.noreply.github.com>
Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Co-authored-by: yenuo26 <410167048@qq.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Sy03 <1370724210@qq.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: UsamaKenway <56207634+UsamaKenway@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
Co-authored-by: pablo <pablo@agigo.ai>
Co-authored-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: Dong W <89223086+sniper35@users.noreply.github.com>
Co-authored-by: Yanick Schraner <yanick.schraner@gmail.com>
Co-authored-by: Sangchun Ha <seomk9896@naver.com>
Co-authored-by: 亦瑾 <76905040+yJader@users.noreply.github.com>
Co-authored-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Co-authored-by: Yupu <feng.yu.pu0330@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: zhumingjue138 <zhumingjue@huawei.com>
Co-authored-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Jared Wen <w13431838023@gmail.com>
Co-authored-by: Xu Lu <572605156@qq.com>
Co-authored-by: xulusjb <fdukeshik@gmail.com>
Co-authored-by: Baoyuan Qi <qibaoyuan@xiaomi.com>
Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: shijin zhang <zsj1364226740@gmail.com>
Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com>
Co-authored-by: dingning <dingning7@xiaomi.com>
Co-authored-by: ning ding <nndding@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Ting FU <futing10@huawei.com>
Co-authored-by: developer-account <irteam@vllm-omni-dev-0.vllm-omni-dev.p-nb13557.svc.cluster.local>
Co-authored-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>


Purpose
Add comprehensive documentation and example scripts for running the BAGEL-7B-MoT model in vLLM-Omni.
Addresses #936.
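For context, the image-to-text online serving example (examples/online_serving/bagel/i2t.py) sends an OpenAI-compatible multimodal chat request. A minimal sketch of how such a message payload is assembled is below; the helper name and the choice of a base64 data URL are illustrative, not taken from the actual script:

```python
import base64


def build_i2t_messages(image_path: str, prompt: str) -> list[dict]:
    """Build an OpenAI-compatible multimodal chat message pairing a
    local image (embedded as a base64 data URL) with a text prompt."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The resulting list would be passed as the `messages` argument of `client.chat.completions.create(...)` in the OpenAI SDK.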
Test Plan
Tested on dual NVIDIA RTX 5000 Ada GPUs (32 GB each) and on a single NVIDIA A100 (80 GB).
Container: runpod/pytorch:1.0.2-cu1281-torch280-ubuntu2404
Ran all the commands in the README.
Test Result
All commands completed successfully.
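The text-to-image example (examples/online_serving/bagel/t2i.py) receives the generated image back as base64-encoded bytes and writes it to disk. A minimal sketch of that decode-and-save step follows; the function name is illustrative, and the `b64_json` field name is an assumption based on the OpenAI-style images response:

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, out_path: str) -> int:
    """Decode a base64-encoded image payload (e.g. the `b64_json`
    field of an OpenAI-style images response) and write it to disk.
    Returns the number of bytes written."""
    raw = base64.b64decode(b64_data)
    Path(out_path).write_bytes(raw)
    return len(raw)
```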
Changes
Documentation
- Added a user guide for BAGEL-7B-MoT and README files for the online serving and offline inference examples.
- Updated supported_models.md and examples for the new model.
Configuration
- Added vllm_omni/model_executor/stage_configs/bagel_single_gpu.yaml for single-GPU deployment.
- Edited the default devices and GPU memory utilization of vllm_omni/model_executor/stage_configs/bagel.yaml.
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.
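To illustrate the kind of stage-config change involved: the dual-GPU config raises per-stage GPU memory utilization from 0.4 to 0.8, while the new single-GPU config keeps it at 0.40/0.50 because both stages share one device. The fragment below is a simplified sketch; stage names and field layout are assumptions, so see the actual files under vllm_omni/model_executor/stage_configs/ for the real schema:

```yaml
# bagel_single_gpu.yaml (illustrative): both stages on one GPU, so
# utilization is kept low to leave headroom for the other stage.
stages:
  - name: understanding   # assumed stage name
    gpu_memory_utilization: 0.40
  - name: generation      # assumed stage name
    gpu_memory_utilization: 0.50
```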
@princepride PTAL ❤️