* [Frontend][Model] Support batch request with refined OmniDiffusionReq… (#797)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
* [Model]: add FLUX.1-dev model (#853)
* [BugFix] ignore mm data from stages to async omni (#954)
Signed-off-by: dengyunyang <584797741@qq.com>
* Revert "[BugFix] ignore mm data from stages to async omni" (#1023)
* [Bugfix] Modify output to model_runner_output (#1026)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Feature] Support cache-dit for Wan 2.2 inference (#1021)
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
* [Doc]Format profiling doc (#993)
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Hardware] Support platforms and plugin system (#774)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Core]: KV Cache Transfer Encapsulation (#979)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
* [Test]Delete skip mark for amd ci test and fix CI failure (#927)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix][Doc]Specify Qwen3-TTS model name for each task type (#1036)
Signed-off-by: Kyle Huang <yellowsea@gmail.com>
* [Misc] pin version of fa3-fwd (#1051)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
* [CI] [ROCm] Add more AMD CI tests (#1039)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Bugfix] fix qwen image layered in dummy run (#1027)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
* [BugFix] Fix noisy output without setting a seed in Qwen Image (#1043)
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
* [bugfix] remove vllm speech route (#1060)
Signed-off-by: linyueqian <linyueqian@outlook.com>
* [Debug] Update GLM-Image Pipeline (#1049)
Co-authored-by: root <root@hk01dgx028.cm.cluster>
* [Diffusion][Bugfix] Fix the flash_attn backends selection logic (#983)
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [BugFix] Fix the accuracy issue of multimodal input. (#1020)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Rein Yang <ruiruyang2@gmail.com>
* [Bugfix] Set VaeImageProcessor `do_convert_rgb` True (#1032)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [feat]: adapt batch request for flux (#1028)
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
* [CI] Change Qwen3 Omni stage placement strategy (#1072)
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
* [BugFix] Fix to use correct attn backend (#1038)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
* [Perf] Qwen3 Omni talker mtp optimization (#1005)
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Wan2.2] Optimize memory usage with conditional transformer loading (#980)
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>
* [Feat] Support XPU Backend in vLLM-Omni (#191)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Fix] stabilize diffusion images LoRA E2E across CI drift (#1075)
Signed-off-by: dongbo910220 <1275604947@qq.com>
* [Bugfix][Test] Re-enable the log simple tests (#1065)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Bugfix] pr conflict fix, bugfix ignore mm data from stages to async omni (#1025)
Signed-off-by: dengyunyang <584797741@qq.com>
* [Doc][Bagel] Add BAGEL-7B-MoT documentation and edit the default stage configuration (#987)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
* [Fix] Increase max wait time for server readiness to accommodate model loading (#1089)
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
* [Benchmark] Add vLLM-Omni Omni model online benchmark (#780)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Remove Mooncake/Yuanrong connector import warning (#1091)
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
* fix: UnboundLocalError for role in streaming audio/image responses (#784)
Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com>
* [Misc] update wechat image (#1096)
* [Feature] Support DiT Layerwise (Blockwise) CPU Offloading (#858)
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [BugFix] Modify max_tokens and modify the log and fix #1103 (#1097)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [BugFix] Fix modulate_index shape error in Qwen-Image-Edit Task (#1100)
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Platform] Add supports_torch_inductor interface (#1108)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [BugFix] Fix Qwen3 Omni talker mtp torch.compile startup error (#1104)
Signed-off-by: ram16g <anlianfengjie@163.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Co-authored-by: ram16g <anlianfengjie@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] fix request_id of image generation in api server (#1112)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Perf]: CFG parallel abstraction (#851)
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [BugFix] Fix Qwen3 TTS 0.6B profile run hang (#995) (#1082)
* [CI] [ROCm] Quick fix amd ci (#1116)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Bugfix] fix benchmark audio timing error and add benchmark test (#1109)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix][Qwen3TTS] Load speaker_id/voices from model configuration (#1079)
Signed-off-by: pablo <juanz9312@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
* [NPU] Align with GPUModelRunner (#1114)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [FEATURE] /v1/images/edit interface (#1101)
Signed-off-by: dengyunyang <584797741@qq.com>
* [Bugfix] Fix NPU SDPA attention mask shape and semantics (#1031)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [TeaCache]: Add Coefficient Estimation (#940)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [CI]: Bagel E2E Smoke Test (#1074)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Misc] Bump version to 0.14.0 (#1128)
Signed-off-by: Roger Wang <hey@rogerw.io>
* [Doc] First stable release of vLLM-Omni (#1129)
Signed-off-by: Roger Wang <hey@rogerw.io>
* [Misc] Align error handling with upstream vLLM v0.14.0 (#1122)
Signed-off-by: anna <lee.anna@navercorp.com>
Co-authored-by: anna <lee.anna@navercorp.com>
* [Feature] add Tensor Parallelism to LongCat-Image(-Edit) (#926)
Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
* [CI] Temporarily remove slow tests. (#1143)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: princepride <wangzhipeng628@gmail.com>
* [CI] Refactor test_sequence_parallel.py and add a warmup run for more accurate performance stat (#1165)
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* Dev/rebase v0.15.0 (#1159)
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
* Docs update paper link (#1169)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
* [Debug] Clear Dockerfile.ci to accelerate build image (#1172)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Debug] Correct Unreasonable Long Timeout (#1175)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Doc]Fix - Align with repo. (#1176)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [Bugfix][Qwen-Image-Edit] Add a warning log for none negative_prompt (#1170)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Bugfix] fix qwen image oom (#1168)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
* [Hardware] Disable compile of diffusion on XPU (#1148)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
* [Doc] Fix vLLM version in user docs (#1179)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
* [Refactor] Refactor async chunk and fix the shape mismatch issue (#1151)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
* bugfix: /images/edits endpoint fails pipeline data format check (#1141)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Perf] resolving prolonged `cudaStreamSynchronize` execution in z image processing (#1105)
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Bugfix] modify RTF use audio_e2e/audio_duration (#1157)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
* [Doc] Highlight paper & slides. (#1186)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [chore] Remove zmq context initialize (#1187)
Signed-off-by: xiedeyantu <czjourney@163.com>
* [NPU] Update Dockerfile and docs for v0.14.0 (#671)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Bugfix] E2E metric incorrect qwen3-omni with async chunk feature (#1018)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc] opt doc (#1118)
Signed-off-by: David Chen <530634352@qq.com>
* [Bugfix] Fix tp+sp accuracy, incorrect process group mapping (#1178)
Signed-off-by: David Chen <530634352@qq.com>
* [Feature] Enable use_audio_in_video for Qwen 3 Omni Online (#1198)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Bugfix] async_chunk rebase v0.15.0 (#1195)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
* [feature]: support flux cache_dit (#1145)
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
* [CI] Add CI branch coverage calculation, fix statement coverage results and add log before test for buildkite log group (#1120)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
* [Wan 2.2][Diffusion] Add TP Support (#964)
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
* [Hardware] [Feat] Setup platform dependent package installation (#1046)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
* [XPU] Fix XPU UTs for basic coverage (#1164)
Signed-off-by: Yan Ma <yan.ma@intel.com>
* [Test] Add BuildKite test-full script for full CI. (#867)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
* [Refactor] Reuse upstream Qwen3MoeSparseMoeBlock (#1202)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Bugfix] Fix wan2.2 ti2v (#1221)
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Fix '--max-generated-image-size' cli args type (#1249)
Signed-off-by: ApsarasX <apsarax@outlook.com>
* [Bugfix] Ensure seed=0 is correctly handled in image edit (#1248)
Signed-off-by: ApsarasX <apsarax@outlook.com>
* [Docs] Add example image download step to Image-To-Video examples (#1258)
Signed-off-by: lishunyang <lishunyang12@163.com>
* [Bugfix] Fix padding bug in 12Hz tokenizer ConvTranspose1d decode (#1241)
Signed-off-by: linyueqian <linyueqian@outlook.com>
* [bugfix] Fix multimodal_output property to check completion outputs where audio data is attached (#1203)
Signed-off-by: linyueqian <linyueqian@outlook.com>
* [Doc] Update QA relevant to quantization (#1257)
Signed-off-by: lishunyang <lishunyang12@163.com>
* [Bugfix] Fix Doc link Error (#1263)
Signed-off-by: lishunyang <lishunyang12@163.com>
* Process-Scoped GPU Memory Accounting (#1204)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
* [ComfyUI]: ComfyUI integration (#1113)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
* fix: add diffusion offload args to OmniConfig group instead of serve_parser (#1271)
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
* [Doc] Adding models/pipelines/features Tutorial (#1196)
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
* [CI] Add env variable check for nightly CI (#1281)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [CI] Add pytest markers to current tests and update the doc. (#577)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Diffusion][Perf] Remove Redundant Communication Cost by Refining SP Hook Design (#1275)
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
* [Feature] Opt metrics structure (#891)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Test] Add example test cases for omni online (#1086)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [CI] Reduce the time for Diffusion Sequence Parallelism Test (#1283)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [Model] Support HunyuanImage3 Diffusion Model in vllm-omni (#1085)
Signed-off-by: Semmer2 <semmer@live.cn>
* [Chore] Update copyright year. (#1256)
Signed-off-by: lishunyang <lishunyang12@163.com>
* [feature]: support Flux.1-dev CFG-Parallel (#1269)
* [Bugfix] Fix 'NoneType' AttributeError in stable-diffusion model detect (#1254)
Signed-off-by: Yan Ma <yan.ma@intel.com>
* [Doc] Update Qwen3-TTS docs for consistency with Omni examples (#1226)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Fix]Ensure HuggingFace downloads complete before initialization. (#1213)
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [BugFix] Fixed the issue where ignore_eos was not working. (#1286)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
* [Test] Add e2e tests for Qwen3-TTS speech endpoint (#1206)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
* [Feat]: support VAE patch parallelism (#756)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: hsliuustc0106 <liuhongsheng4@huawei.com>
* [CI] Disable Qwen3-TTS E2E Test in pipeline.yml (#1306)
Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
* [Misc] Add per-request generator_device to online image gen and edit (#1183)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Bagel]: Support TP (#1293)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
* [Bugfix] Fix image edit RoPE crash when explicit height/width are provided (#1265)
Signed-off-by: lishunyang <lishunyang12@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc] Sync (#1216)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without system prompt (#1288)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* [Debug] Add trigger to concurrent stage init (#1274)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Bugfix][Qwen3-TTS] Fix task type (#1317)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
* Unifying CLI Argument Naming Style (#1309)
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
* [Bugfix][Qwen3-TTS] Preserve original model ID in omni_snapshot_download (#1318)
* [CI] Run nightly tests. (#1333)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [Feature]: FP8 Quantization Support for DiT (#1034)
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
* Fix yield token metrics and opt metrics record stats (#1292)
* [Test] L2 & L3 Test Case Stratification Design for Omni Model (#1272)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Perf] Support Qwen3 Omni code2wav batch inference with async chunk (#1246)
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: Ziming Huang <1520787127@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update qwen3-omni & qwen2.5-omni openai client (#1304)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* [Feature] Support Wan2.2 T2V and I2V Online Serving with OpenAI /v1/videos API (#1073)
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
* [Feature] add Tensor Parallelism to SD_3.5 (#1336)
Signed-off-by: GG-li <3226868735@qq.com>
* [Feature]async scheduling to overlap chunk IO and compute (#951)
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
* [Bugfix] reused metrics to modify the API Server token statistics in Stream Response (#1301)
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
* Refactor CPU Offloading Backend Pattern (#1223)
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>
* [DOC] Doc for CI test - Details about five-level structure and some other files. (#1167)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: yenuo26 <410167048@qq.com>
* [Bugfix] remove Tongyi-MAI/Z-Image-Turbo related test from L2 ci (#1348)
Signed-off-by: dengyunyang <584797741@qq.com>
* [Misc] wechat image update (#1354)
Signed-off-by: David Chen <530634352@qq.com>
* [Misc] Support WorkerWrapperBase and CustomPipeline for Diffusion Worker (#764)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
* [Feature][Bugfix] Add CFG feature to Bagel (#1310)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
* [Feature]: Diffusion sleep to use process level memory calculation (#1276)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
* change qwen3-omni open cudagraph by default (#1352)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [XPU] Update Bagel's flash_attn_varlen_func to fa utils (#1295)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
* [Test] Add Omni Model Performance Benchmark Test (#1321)
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
* [BugFix]: Revert utils change (#1369)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
* [Rebase] Rebase to vllm v0.16.0 (#1357)
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
* [Test] Fix expansion and example test case for qwen3-omni (#1358)
Signed-off-by: yenuo26 <410167048@qq.com>
* [v0.16.0][BUG FIX]Fix hunyuan MOE after update to 0.16.0 (#1401)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
* [0.16.0] remove cuda hard-code for Hunyuan Image3 (#1402)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
* [XPU] Add XPU Dockerfile and related docs (#1162)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
* [Bugfix] Fix Hardcoded Datatypes in Z-image (#1393)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
* [Feature] : Support disaggregated inference pipeline for Qwen3_TTS (#1161)
Signed-off-by: Sy03 <1370724210@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Feature] Add automated PR reviewer bot with GLM integration (#1424)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* [Misc] Add Qwen2.5-Omni-3B model support to Gradio demo (#1382)
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
* [misc] Feature/pr reviewer auto trigger&update model (#1431)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hunter Liu <hunter@liu.sh>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "[misc] Feature/pr reviewer auto trigger&update model" (#1432)
* [Doc] Update GPU installation commands (#1434)
* [ROCM] [CI] fix dockerfile.rocm to support nightly build and also fix amd ci v0.16.0rc1 (#1380)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Feature][BAGEL] Combine multi-branch cfg into a single batch to accelerate inference. (#1429)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
* [Feat]: add ASCII art logo for vLLM-Omni (#1430)
* [Bug] [Bagel] Fix kv transfer bug (#1437)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: Wang Zhipeng: princepride <wangzhipeng628@gmail.com>
* [CI] Set L2 & L3 tests running conditions. (#1344)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* [Feature] vLLM-Omni RDMA connector (#1019)
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
* [Minor][Refactor] Pass seq_token_counts explicitly (#1425)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Misc] Extend Diffusion Benchmark script to other backends (#875)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Feature] Support Stage Based Deployment CLI (#939)
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Doc] Optimize vLLM-Omni metrics documentation (#1311)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Forward all vllm-omni serve command parameters to model (#985)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc]: Add bagel single/multi node usage with mooncake document (#1450)
* [Qwen3TTS][Feat] Code2Wav batched decoding (#1426)
Signed-off-by: pablo <pablo@agigo.ai>
Co-authored-by: pablo <pablo@agigo.ai>
* [CI] Remove overwhelming debug log (#1463)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Misc] update wechat image (#1464)
Signed-off-by: David Chen <530634352@qq.com>
* [Doc] Refine Diffusion Tutorial Documents (#1305)
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
* [Bugfix] Robust Audio Data Handling in _create_audio_choice (#1222)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
* [Bugfix]: Fix merging updated additional information to ensure dict type (#1296)
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
* [Model]Add new nextstep_1(Diffusion) model(only T2I) (#612)
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: sniper35 <dongw2019@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Add TTS configuration options (#1177)
Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>
* [Debug] Multi-Request for Qwen 3 Omni use_audio_in_video (#1433)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Bugfix] Fix case-sensitive task_type matching in Qwen3TTSModelForGeneration (#1455)
Signed-off-by: Sangchun Ha <seomk9896@gmail.com>
* [BugFix] process request.num_cached_tokens if it equals to the initial value (#1468)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
* [Bugfix] Fix SDPA attention mask dtype and shape (Fix #857) (#1349)
Signed-off-by: jader <yjader@foxmail.com>
* [Test] Reduce Perf test case and fix modify stage config (#1449)
Signed-off-by: yenuo26 <410167048@qq.com>
* [NPU] Upgrade to v0.16.0 (#1375)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [CI] Update Dockerfile for vllm-omni CI image and remove obsolete dep… (#1491)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Fix][Chore] Qwen3-TTS Modeling Minor Code Sanity Improvements (#1482)
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
* [Bugfix] Fix tuple/list KV cache extraction crash (#1405)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc] format lora related docs for the user's end (#1009)
Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk>
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
* [Feature] Support Wan2.2 output with irregular shapes (#1279)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [Misc] Migrate L1 tests to use pytest-mock (#1315)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
* [Bugfix] Fix LoRA Scaling on Active Adapters (#1421)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
* [Bugfix] fix record audio generated frame in offline infer (#1312)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
* [Model] Support OmniGen2 (#513)
Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>
* [Bugfix][Qwen3TTS] (#1289)
Signed-off-by: pablo <juanz9312@gmail.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* Use pull through cache image for H100 pool (#1518)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
* [ROCm] [CI] [Docker] Point to use the latest vLLM v0.16.0 stable version (#1500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Bugfix] fix offline text_to_image error from #1009 (#1515)
Signed-off-by: David Chen <530634352@qq.com>
* [XPU] Enable FLASH_ATTN on XPU (#1332)
Signed-off-by: Yan Ma <yan.ma@intel.com>
* Revert gpu_1 job to use regular image (#1521)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
* [Chore] remove unused logger in omni_diffusion (#531) (#1509)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
* [Qwen3TTS][Feat] Streaming output (#1438)
Signed-off-by: pablo <pablo@agigo.ai>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: pablo <pablo@agigo.ai>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Bugfix] Race condition in MultiprocExecutor when concurrent access to Scheduler (#1448)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Doc][Test][Misc] ComfyUI test, more screenshot, and code cleaning (#1435)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>
* [Performance]Qwen3-Omni performance optimization (#1378)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
* [Feature] Support HSDP for diffusion models (#1339)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [CI] fixed CI timeout (#1460)
Signed-off-by: zhumingjue <zhumingjue@huawei.com>
Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
* [Bugfix] Use uds for zmq address if not set --stage-id (#1522)
Signed-off-by: wuhang <wuhang6@huawei.com>
* [BugFix] Restore talker's config (#1524)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Canlin Guo <961750412@qq.com>
* [XPU] fix qwen_omni after rebase to v0.16.0 (#1416)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Platform] Enable layerwise offload on all hardware (#1492)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* diffusion: enable VAE patch parallel for SD3.5 (#1428)
Signed-off-by: dongbo910220 <1275604947@qq.com>
* [Perf] GLM Image (#920)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [skip ci][Doc] add design docs for async chunk in qwen3-omni (#962)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* feat(qwen3-tts): Add CUDA Graph support for speech tokenizer decoder (#1205)
Signed-off-by: xulusjb <fdukeshik@gmail.com>
Co-authored-by: xulusjb <fdukeshik@gmail.com>
* [New Model]: XiaomiMiMo/MiMo-Audio-7B-Instruct support (#750)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: GG-li <3226868735@qq.com>
Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: Baoyuan Qi <qibaoyuan@126.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: baoyuan qi <qibaoyuan@126.com>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: 丁宁 <nndding@gmail.com>
Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning@xiaomi.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: shijin zhang <zsj1364226740@gmail.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com>
Co-authored-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: dingning <dingning7@xiaomi.com>
Co-authored-by: ning ding <nndding@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Feature]: Native GGUF Quantization Support for DiT (#1285)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* Add benchmark for `v1/audio/speech` non-streaming (#1408)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
* [Version] Auto generate version using `setuptools_scm` (#1224)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
* [Feat] : Support Async chunk cleanup (#1087)
Signed-off-by: Sy03 <1370724210@qq.com>
* [Profiler] Support online profiling (#1136)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
* [Bugfix] Fix redundant finished req status updating on OmniGenerationScheduler (#1510)
Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: 齐保元 <qibaoyuan@xiaomi.com>
* [XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda (#1488)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
* [Chore] Cleanup dead code in GGUF DiT code path (#1533)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
* [Doc] Update installation instructions for vllm 0.16.0 (#1505)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Doc] [skip ci]Sync. (#1363)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
* [CI][skip ci]Update H100 image link based on #1518 (#1538)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
* Fix no embed text spk tokens (#1540)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
* [Debug] Merge vllm pull 35368 (#1534)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Docs] update async chunk docs diagram [skip ci] (#1530)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
* fix(qwen3-tts): fix Base ICL voice clone producing corrupted audio (#1554)
Signed-off-by: linyueqian <linyueqian@outlook.com>
* [NPU][Bugfix] Align GPU side and recover qwen3-tts (#1564)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
* [BugFix] Fix unexpected crash when init OmniDiffusion (#1562)
Signed-off-by: Semmer2 <semmer@live.cn>
* [CI] Modify some CI test cases to run on L4 environment to reduce H100 resource usage. (#1543)
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
* [BugFix]: fix a lot of bugs (#1565)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
* feat: add HyperCLOVAX-SEED-Omni-8B support
Model files:
- vllm_omni/diffusion/models/hyperclovax_vision/: vision decoder pipeline
(HyperCLOVAXVisionPipeline) using flow matching diffusion + VisionTransformer
- vllm_omni/diffusion/models/hyperclovax_audio/: audio decoder pipeline
(HyperCLOVAXAudioPipeline) using Unit-BigVGAN codec
- vllm_omni/model_executor/stage_input_processors/hyperclovax_seed_omni.py:
thinker2vision_decoder and thinker2audio_decoder — extract discrete tokens from
LLM output; truncate/pad vision codes to 729 (27x27) for decoder
Registry:
- vllm_omni/diffusion/registry.py: register HyperCLOVAXVisionPipeline and
HyperCLOVAXAudioPipeline with post-process functions
Stage config:
- vllm_omni/model_executor/stage_configs/hcx_omni.yaml: 3-stage config
Stage 0: LLM thinker (TP=4, GPUs 0-3), Stage 1: vision decoder (GPU 4),
Stage 2: audio decoder (GPU 5)
Bug fixes for HyperCLOVAX compatibility:
- diffusion/request.py: add extra dict field to OmniDiffusionRequest so
vision_tokens/audio_tokens from stage input processors reach the pipeline
- entrypoints/async_omni_diffusion.py: extract OmniTokensPrompt.additional_information
into OmniDiffusionRequest.extra before creating request
- entrypoints/omni_stage.py: skip empty engine inputs (text-only requests where
thinker2vision_decoder/thinker2audio_decoder return [])
- entrypoints/async_omni.py: handle skipped sentinel in _process_single_result
so text-only requests complete without crashing on Stage 1/2
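A minimal sketch of the truncate/pad step mentioned above for the vision codes (729 = 27x27). The helper name `fit_vision_codes` and the padding id of 0 are assumptions for illustration, not taken from the actual stage input processor:

```python
VISION_CODE_LEN = 729  # 27 x 27 grid, as stated in the commit message


def fit_vision_codes(codes, pad_id=0):
    """Truncate or right-pad a sequence of discrete vision codes to 729 entries.

    pad_id=0 is an assumed placeholder; the real pipeline may pad differently.
    """
    fitted = list(codes)[:VISION_CODE_LEN]
    fitted.extend([pad_id] * (VISION_CODE_LEN - len(fitted)))
    return fitted
```

Both too-long and too-short inputs come back with exactly 729 entries, so the decoder always sees a full 27x27 grid.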
* fix: correct decoder params and HCX porting fixes
- hcx_omni.yaml: guidance_scale 3.5→0.75, num_inference_steps 30→50
(matches OmniServe production defaults; 3.5 caused over-amplified
autoguidance → shrunken/degraded output images)
- omni_stage.py: skip empty engine inputs for text-only requests
- async_omni_diffusion.py: extract OmniTokensPrompt.additional_information
into OmniDiffusionRequest.extra (audio_tokens/vision_tokens)
- registry.py: HCX Omni diffusion model registration fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: HyperCLOVAX-SEED-Omni-8B stage pipeline and entrypoint fixes
* fix: change guidance_scale from 9.0 to 0.75 (autoguidance scale, OmniServe default)
* feat: add audio decoder Stage 2 to hcx_omni pipeline
- Wire HyperCLOVAXAudioPipeline as Stage 2 in hcx_omni.yaml
- GPU 5 assigned for audio decoder (Unit-BigVGAN / NCCosybigvganDecoder)
- Add runtime edge 0->2 (thinker -> audio decoder)
- Implement post-generation PCM chunk streaming for audio output
(4800 samples / 200ms per SSE event @ 24kHz, int16 base64-encoded)
Refs: github.com/vllm-project/vllm-omni/pull/869 (already incorporated)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: vllm version compatibility for HyperCLOVAX audio decoder startup
- config/model.py: try/except fallback for AttentionBackendEnum import
(vllm.v1.attention.backends.registry absent in older vllm builds)
- pipeline_hyperclovax_audio.py: return actual named_parameters() from
load_weights() when using MAR checkpoint so diffusers_loader strict
check passes (weights loaded eagerly in __init__ via MAR extraction)
- qwen3_omni_moe_thinker.py, qwen2_5_omni_thinker.py: try/except stubs
for check_interleaved_audio_video and merge_interleaved_embeddings
which are absent in older vllm qwen2_5_omni_thinker; these symbols
are only exercised by Qwen models, not HyperCLOVAX
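The try/except fallback pattern described above can be sketched generically as follows. `optional_import` is a hypothetical helper written for illustration; the actual patch may simply wrap each import statement in its own try/except:

```python
import importlib


def optional_import(module_path, attr, default=None):
    """Return module_path.attr, or default if the module or attribute is missing.

    Useful when a symbol exists only in newer builds of a dependency.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return default
    return getattr(module, attr, default)
```

A caller would then check for the default (for example `None`) before using the symbol, so code paths that never touch it keep working on older builds.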
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add edge 1→2 and correct model key in hcx_omni.yaml Stage 2
- Add runtime edge from:1 to:2 (required for Stage-2 connector init;
without it AsyncOrchestrator cannot route to audio decoder at runtime)
- Change model_subdir to model for Stage-2 engine_args to match
total-poc working reference config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: audio S2S output - handle diffusion outputs in _create_audio_choice
HyperCLOVAXAudioPipeline (diffusion) stores audio in multimodal_output
directly (OmniRequestOutput.from_diffusion), not in outputs[0].multimodal_output
like LLM pipelines. This commit makes four fixes:
1. _create_audio_choice (non-streaming): use omni_outputs.multimodal_output
when final_res.outputs is empty (diffusion path).
2. Streaming audio path: same fix for _final_res.outputs[0].
3. Both loops (for output in final_res.outputs): fall back to single
synthetic choice at index 0 when outputs list is empty.
4. Handle bytes audio output from HyperCLOVAXAudioPipeline post-process
(returns WAV bytes, not tensors like Qwen3-Omni).
Also fixes audio input (A2T) regression: skip diffusion prompt extraction
when mm_data has audio content (added in previous session).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: parse WAV bytes with soundfile for uniform PCM chunk streaming
HyperCLOVAXAudioPipeline returns WAV bytes including 44-byte header.
The previous byte-offset splitting included the header in the first
chunk, corrupting it. Fix: parse with soundfile to get float32 PCM,
then convert to int16 chunks uniformly regardless of source type
(bytes or tensor).
Verified: 136 audio chunks of up to 200 ms each (27.04 s of audio in total) streamed correctly.
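A rough sketch of the header-safe chunking idea: parse the WAV container first, then split only the PCM payload. It uses the standard-library `wave` module instead of soundfile to stay self-contained, handles mono int16 input only, and the function names are hypothetical. The chunk size of 4800 samples (200 ms at 24 kHz) comes from the earlier streaming commit:

```python
import base64
import io
import wave

CHUNK_SAMPLES = 4800  # 200 ms at 24 kHz


def make_silent_wav(num_samples, sample_rate=24000):
    """Build a mono int16 WAV of silence (fixture for trying the sketch)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = int16
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * num_samples)
    return buf.getvalue()


def wav_to_b64_chunks(wav_bytes, chunk_samples=CHUNK_SAMPLES):
    """Strip the WAV header by parsing it, then base64-encode fixed-size PCM chunks."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch handles mono int16 WAV only")
        pcm = wf.readframes(wf.getnframes())
    step = chunk_samples * 2  # bytes per chunk
    return [
        base64.b64encode(pcm[i:i + step]).decode("ascii")
        for i in range(0, len(pcm), step)
    ]
```

Because the split happens on the decoded payload, every chunk except possibly the last has the same length, and no chunk contains header bytes.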
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: zero-shot TTS with speaker embedding from input audio
- serving_chat.py: extract last input_audio base64 from request messages
and inject as ref_audio_b64 into engine_prompt dict
- thinker2audio_decoder: read ref_audio_b64 from prompt and pass as
ref_audio_tokens to Stage 2 (HyperCLOVAXAudioPipeline)
- hcx_omni.yaml: switch Stage 2 to NCZSCosybigvganDecoder.mar (zero-shot)
which uses ECAPA-TDNN speaker encoder instead of finetuned ID lookup
Pipeline: input audio -> ECAPA-TDNN -> speaker embedding -> BigVGAN synthesis
matching the voice characteristics of the original speaker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: wire audio decoder Stage 2 to hcx_omni pipeline and fix S2S flow
- Add Stage 2 (HyperCLOVAXAudioPipeline / NCZSCosybigvganDecoder) to hcx_omni.yaml
with GPU 5, gpu_memory_utilization 0.4, edge 0->2 from thinker
- Fix thinker2audio_decoder: correct audio token range (128606-135167),
remap to [0, 6561) for BigVGAN input, handle empty token case gracefully
- Fix pipeline_hyperclovax_audio.py post_process_func signature and
incorporate PR#869 BUG FIX patches for stable audio generation
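The token remapping described above could look roughly like this. The sketch assumes the upper end of the stated range is exclusive, since 128606 + 6561 = 135167; the function name and the choice to silently drop out-of-range tokens are also assumptions:

```python
AUDIO_TOKEN_START = 128606  # first audio token id, from the commit message
AUDIO_CODEBOOK_SIZE = 6561  # target range is [0, 6561)


def remap_audio_tokens(token_ids):
    """Keep only audio tokens and shift them into [0, 6561).

    Returns an empty list when no audio tokens are present, so a text-only
    response can be skipped by the caller instead of raising.
    """
    end = AUDIO_TOKEN_START + AUDIO_CODEBOOK_SIZE  # assumed exclusive bound
    return [t - AUDIO_TOKEN_START for t in token_ids if AUDIO_TOKEN_START <= t < end]
```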
* fix: use finetuned audio decoder and fix transformers_modules deserialization
- hcx_omni.yaml: switch Stage 2 from NCZSCosybigvganDecoder (zero-shot,
ECAPA-TDNN) to NCCosybigvganDecoder (finetuned, nn.Embedding speaker id).
Zero-shot decoder required ref_audio (mel spectrogram) which is unavailable
for text-only requests and incompatible with finetuned decoder path.
- pipeline_hyperclovax_audio.py: guard ref_audio processing with
'not self.bigvgan.finetune' — finetuned decoder has no ECAPA-TDNN encoder,
so passing ref_audio bytes would crash with 'expected 100 channels'.
- omni_stage.py: add HuggingFace modules cache (~/.cache/huggingface/modules)
to sys.path before queue.get_nowait() in try_collect(). Stage-0 pickles
outputs containing custom classes from transformers_modules (trust_remote_code),
but the API server process doesn't have this path, causing deserialization
failures that silently drop Stage-0 outputs.
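A small sketch of the sys.path adjustment described in the last item, written as an idempotent helper. The helper name is hypothetical and the default path is the one named above; the real change is made inline in try_collect():

```python
import os
import sys


def ensure_on_sys_path(path="~/.cache/huggingface/modules"):
    """Append path to sys.path once; return True if it was newly added."""
    resolved = os.path.expanduser(path)
    if resolved in sys.path:
        return False
    sys.path.append(resolved)
    return True
```

Calling this before unpickling lets Python resolve classes that live under transformers_modules, and repeated calls do not grow sys.path.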
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: restore zero-shot speaker cloning with fallback for text-only requests
- hcx_omni.yaml: revert to NCZSCosybigvganDecoder.mar (zero-shot ECAPA-TDNN)
for voice-preserving S2S synthesis. NCCosybigvganDecoder used a fixed
integer speaker_id and lost the input speaker's voice.
- pipeline_hyperclovax_audio.py: add zero-mel fallback branch for
finetune=False + ref_audio=None case. When a text-only request arrives
(no input audio → no ref_audio), ECAPA-TDNN receives a zero mel tensor
[1, num_mels, 64] instead of crashing with 'expected 100 channels'.
S2S requests always have ref_audio so the zero-shot cloning path is
unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add stage config yaml for HCX audio decoder
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* feat: add HyperCLOVAX-SEED-Omni 8B model as vllm-omni executor
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* feat: add HCX audio decoder pipeline
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* fix: modify exception for HCX audio decoder (GAN)
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
* fix: default temperature set to 0, and pipeline model evaluation mode
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
---------
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Kyle Huang <yellowsea@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: ram16g <anlianfengjie@163.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: pablo <juanz9312@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: anna <lee.anna@navercorp.com>
Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: xiedeyantu <czjourney@163.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: ApsarasX <apsarax@outlook.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: Semmer2 <semmer@live.cn>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Ziming Huang <1520787127@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: GG-li <3226868735@qq.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Sy03 <1370724210@qq.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Hunter Liu <hunter@liu.sh>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Signed-off-by: pablo <pablo@agigo.ai>
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: sniper35 <dongw2019@gmail.com>
Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>
Signed-off-by: Sangchun Ha <seomk9896@gmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk>
Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: zhumingjue <zhumingjue@huawei.com>
Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: xulusjb <fdukeshik@gmail.com>
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Signed-off-by: Baoyuan Qi <qibaoyuan@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: baoyuan qi <qibaoyuan@126.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: 丁宁 <nndding@gmail.com>
Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning@xiaomi.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <with1015@unist.ac.kr>
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com>
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: kYLe <yellowsea@gmail.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: NATURE <wzliu@connect.hku.hk>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Rein Yang <ruiruyang2@gmail.com>
Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
Co-authored-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Fanli Lin <fanli.lin@intel.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Co-authored-by: Pierre LE GUEN <26087574+PierreLeGuen@users.noreply.github.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: ram16g <anlianfengjie@163.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Co-authored-by: Juan Pablo Zuluaga <46724788+JuanPZuluaga@users.noreply.github.com>
Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: ceanna93 <fairyanna@naver.com>
Co-authored-by: anna <lee.anna@navercorp.com>
Co-authored-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Co-authored-by: liuzhenwei <zhenweiliu@habana.ai>
Co-authored-by: erfgss <97771661+erfgss@users.noreply.github.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: weichen <calvin_zhu0210@outlook.com>
Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: ApsarasX <apsarax@outlook.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Jiaping Wu <53215702+ElleElleWu@users.noreply.github.com>
Co-authored-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
Co-authored-by: rein yang <73573651+R2-Y@users.noreply.github.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Co-authored-by: ChenWenjing <54166744+Shirley125@users.noreply.github.com>
Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Co-authored-by: yenuo26 <410167048@qq.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Sy03 <1370724210@qq.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: UsamaKenway <56207634+UsamaKenway@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
Co-authored-by: pablo <pablo@agigo.ai>
Co-authored-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: Dong W <89223086+sniper35@users.noreply.github.com>
Co-authored-by: Yanick Schraner <yanick.schraner@gmail.com>
Co-authored-by: Sangchun Ha <seomk9896@naver.com>
Co-authored-by: 亦瑾 <76905040+yJader@users.noreply.github.com>
Co-authored-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Co-authored-by: Yupu <feng.yu.pu0330@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: zhumingjue138 <zhumingjue@huawei.com>
Co-authored-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Jared Wen <w13431838023@gmail.com>
Co-authored-by: Xu Lu <572605156@qq.com>
Co-authored-by: xulusjb <fdukeshik@gmail.com>
Co-authored-by: Baoyuan Qi <qibaoyuan@xiaomi.com>
Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: shijin zhang <zsj1364226740@gmail.com>
Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com>
Co-authored-by: dingning <dingning7@xiaomi.com>
Co-authored-by: ning ding <nndding@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Ting FU <futing10@huawei.com>
Co-authored-by: developer-account <irteam@vllm-omni-dev-0.vllm-omni-dev.p-nb13557.svc.cluster.local>
Co-authored-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Purpose
Add Omni Model Performance Benchmark Test
Design for Omni Model: refer to #1313
Related documentation provided in: #1167
Test Plan
pytest -s -v scripts/run_benchmark.py --html=report.html --self-contained-html
Test Result
result json list
perf result