[XPU] Add XPU Dockerfile and related docs #1162
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e035f4ede
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| FROM final AS vllm-openai | ||
| RUN cd ${COMMON_WORKDIR}/vllm-omni | ||
| ENTRYPOINT ["vllm", "serve", "--omni"] |
There was a problem hiding this comment.
Clear inherited CMD for vllm-openai image
The vllm-openai stage sets an ENTRYPOINT but does not override the CMD inherited from the final stage (CMD ["/bin/bash"] above). When users run the image without extra args, Docker passes the inherited CMD to the entrypoint, effectively executing vllm serve --omni /bin/bash, which vllm serve will treat as a positional argument and fail. To avoid this, explicitly set CMD [] (or a valid default argument list) in this stage.
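A minimal sketch of the suggested fix for the vllm-openai stage. The CMD [] line is the change proposed above. The WORKDIR line is an editorial assumption: a RUN cd ... only changes the directory for that single RUN step, so it has no effect on the running container. The /workspace/vllm-omni path is also assumed, taken from the COMMON_WORKDIR default that appears later in this diff.

```dockerfile
FROM final AS vllm-openai
# Assumed path: the COMMON_WORKDIR default shown later in the diff.
# Unlike `RUN cd ...`, WORKDIR persists for later instructions and at runtime.
WORKDIR /workspace/vllm-omni
ENTRYPOINT ["vllm", "serve", "--omni"]
# Clear any CMD inherited from the `final` stage; arguments given to
# `docker run <image> ...` are then appended after `vllm serve --omni`.
CMD []
```

With this layout, `docker run <image> <model>` would execute `vllm serve --omni <model>`.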
| @@ -0,0 +1,28 @@ | |||
| ARG BASE_IMAGE=gar-registry.caas.intel.com/pytorch/pytorch-ipex-spr:multi-bmg_release_2601_py2.10_ww2605.5 | |||
There was a problem hiding this comment.
Should we use the multi-arc release, or is this a temporary setting?
There was a problem hiding this comment.
Could we base this on an upstream vLLM XPU build? In my view, doing it this way will create technical debt for us, and debt always has to be paid back with interest.
There was a problem hiding this comment.
I plan to build images based on upstream commits. For example, if the upcoming vLLM-Omni release is based on vLLM v0.15.0, I will build the image using https://github.com/vllm-project/vllm/blob/releases/v0.15.0/docker/Dockerfile.xpu.
There was a problem hiding this comment.
Updated. @yao-matrix @xuechendi please take a further look.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74cc42df39
| ENV VLLM_WORKER_MULTIPROC_METHOD=spawn | ||
| ENTRYPOINT ["vllm", "serve", "--omni"] |
There was a problem hiding this comment.
Source oneAPI env for vllm-openai entrypoint
The image installs oneAPI/oneCCL and only wires its environment into /root/.bashrc (via setvars.sh), but the vllm-openai stage runs vllm serve --omni directly. Non-interactive entrypoints do not source .bashrc, so the oneAPI environment (e.g., library paths for libccl.so) is missing when users run the container with the default entrypoint, which can lead to runtime linker errors or CCL not being found. Consider exporting the needed env vars with ENV or wrapping the entrypoint in bash -lc 'source /root/.bashrc && …' so the runtime environment matches what you prepared.
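One way to apply this suggestion, sketched under two assumptions: that oneAPI's setvars.sh sits at its default location /opt/intel/oneapi/setvars.sh in this base image, and that sourcing it directly is acceptable. Sourcing setvars.sh instead of /root/.bashrc is a deliberate swap, because Ubuntu's stock .bashrc typically returns early in non-interactive shells, so lines appended to its end may never run.

```dockerfile
FROM final AS vllm-openai
# Assumption: default oneAPI install prefix in the base image.
# `bash -c` receives the docker-run arguments as "$@" ("bash" fills $0).
# Passing --force gives setvars.sh its own positional parameters, so it does
# not see the vLLM arguments, and lets it re-run if already initialized.
# `exec` replaces bash so vllm runs as PID 1 and receives stop signals.
ENTRYPOINT ["bash", "-c", "source /opt/intel/oneapi/setvars.sh --force > /dev/null && exec vllm serve --omni \"$@\"", "bash"]
# Clear any CMD inherited from the `final` stage.
CMD []
```

Compared with exporting individual variables via ENV, this keeps the runtime environment identical to what setvars.sh produces, at the cost of running the script on every container start.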
xuechendi left a comment
There was a problem hiding this comment.
Verified with Intel GPUs (1500 and B60); both work.
@ywang96, could you help review and merge?
| @@ -0,0 +1,110 @@ | |||
| FROM intel/deep-learning-essentials:2025.2.2-0-devel-ubuntu24.04 AS vllm-base | |||
There was a problem hiding this comment.
How long does it take to build an image in the CI environment?
There was a problem hiding this comment.
The first build takes about 10 minutes because of the vLLM Docker build. We will provide our Intel GPU node to Buildkite so that the vLLM base image is cached, which will speed up CI Docker builds.
There was a problem hiding this comment.
For the CI tests, please check #400 and the follow-up design RFC for details. We expect the CI e2e time on every other platform not to exceed the CUDA e2e time.
There was a problem hiding this comment.
Got it, thanks for the info!
Is it OK to merge this PR first? We will reduce the Docker build time in the next CI PR.
There was a problem hiding this comment.
@xuechendi are there plans to ship XPU docker image on vllm main repo?
There was a problem hiding this comment.
@tjtanaa, yes, this is a work in progress. A couple of things need to be done first:
- migrate the main repo CI from a single script run to a pipeline (Docker build on AWS and multiple CI runs on XPU)
- bring up B60 machines in vLLM CI
- ship the XPU release Docker image following the vLLM release cadence exactly.
It will still take a while to get there, so for now we have to provide an Omni Docker image here that does not build on a published vLLM XPU base image.
There was a problem hiding this comment.
@xuechendi great. Thanks for sharing the roadmap.
Do you plan to ship the vLLM-Omni XPU release Docker image before you start shipping the XPU release Docker image in vLLM? We have started setting up a Docker image release pipeline.
| ENV VLLM_WORKER_MULTIPROC_METHOD=spawn | ||
| RUN --mount=type=cache,target=/root/.cache/pip \ | ||
| --mount=type=bind,source=.git,target=.git \ |
There was a problem hiding this comment.
This bind mount is not needed since we are git cloning from vllm, and .git is for vllm-omni.
There was a problem hiding this comment.
Makes sense. Removed this line.
@hsliuustc0106 PTAL again.
@hsliuustc0106, the CI PR has been submitted as #1340. We validated the CI Docker build with #1340; with the cache, building the Docker image takes about 1 minute.
There was a problem hiding this comment.
Pull request overview
Adds Intel XPU installation support to the vLLM-Omni repo by introducing an XPU-focused Dockerfile and wiring XPU sections into the GPU installation docs.
Changes:
- Added a new `docker/Dockerfile.xpu` to build an XPU-capable vLLM-Omni + OpenAI server image.
- Added `docs/getting_started/installation/gpu/xpu.inc.md` and hooked it into `docs/getting_started/installation/gpu.md`.
- Updated the installation README to list Intel XPU under GPU platforms.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| docs/getting_started/installation/gpu/xpu.inc.md | New XPU installation snippet (requirements + Docker build/run instructions). |
| docs/getting_started/installation/gpu.md | Adds Intel XPU as a new tabbed variant across requirements/install sections. |
| docs/getting_started/installation/README.md | Adds Intel XPU to the installation platform list. |
| docker/Dockerfile.xpu | New multi-stage Dockerfile to build vLLM + vLLM-Omni for Intel XPU and set an OpenAI server entrypoint. |
| ln -s /opt/intel/oneapi/ccl/2021.15 /opt/intel/oneapi/ccl/latest | ||
| SHELL ["bash", "-c"] | ||
| CMD ["bash", "-c", "source /root/.bashrc && exec bash"] |
There was a problem hiding this comment.
This stage sets a CMD to source /root/.bashrc, but a later CMD ["/bin/bash"] overrides it, so the sourcing will never take effect. Remove the earlier CMD or ensure required OneAPI env vars are set via ENV/entrypoint in the final stage.
| CMD ["bash", "-c", "source /root/.bashrc && exec bash"] |
| From vllm-base as vllm-omni | ||
There was a problem hiding this comment.
Dockerfile instruction casing is inconsistent here (From vs FROM). The rest of the repo uses uppercase Dockerfile directives; please switch to FROM for consistency and to avoid issues with stricter tooling.
| @@ -0,0 +1,48 @@ | |||
| # --8<-- [start:requirements] | |||
| - GPU: Validated on Intel® Arc™ B-Series (It should be supported on the AMD GPUs that are supported by vLLM.) | |||
There was a problem hiding this comment.
The XPU requirements bullet incorrectly says it should be supported on AMD GPUs. This looks like a copy/paste error; update it to refer to Intel/XPU hardware supported by vLLM (or remove the parenthetical if unsure).
| - GPU: Validated on Intel® Arc™ B-Series (It should be supported on the AMD GPUs that are supported by vLLM.) | |
| - GPU: Validated on Intel® Arc™ B-Series (It should be supported on other Intel® GPUs and XPU devices that are supported by vLLM.) |
| # --8<-- [end:requirements] | ||
| # --8<-- [start:set-up-using-python] | ||
| vLLM-Omni current recommends the steps in under setup through Docker Images. |
There was a problem hiding this comment.
Grammar issue: "vLLM-Omni current recommends the steps in under setup through Docker Images." reads incorrectly. Please rephrase (e.g., "vLLM-Omni currently recommends using the Docker image setup steps below").
| vLLM-Omni current recommends the steps in under setup through Docker Images. | |
| vLLM-Omni currently recommends using the Docker image setup steps below. |
| # --8<-- [start:pre-built-wheels] | ||
| # --8<-- [end:pre-built-wheels] | ||
| # --8<-- [start:build-wheel-from-source] | ||
| # --8<-- [end:build-wheel-from-source] |
There was a problem hiding this comment.
These include-marked sections are empty, but gpu.md links to them as separate tabs. This will render blank content for XPU in several places; consider adding a short note like "Not available for XPU" or removing the tab include until content exists.
| ARG COMMON_WORKDIR=/workspace/vllm-omni | ||
| RUN mkdir -p ${COMMON_WORKDIR} | ||
| COPY . ${COMMON_WORKDIR} | ||
There was a problem hiding this comment.
vllm-omni installation relies on target device detection in setup.py, which selects XPU only when torch.xpu.is_available() is true. During docker builds (typically on hosts without an XPU), this will likely fall back to CPU and install the wrong dependencies. Set VLLM_OMNI_TARGET_DEVICE=xpu (as an ENV or on this RUN) before installing vllm-omni.
| ENV VLLM_OMNI_TARGET_DEVICE=xpu |
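The suggestion above sets the variable with ENV. The alternative the reviewer mentions, scoping it to the install step only, could look like the sketch below. The pip install . command is a hypothetical placeholder, since the actual install command is not shown in this excerpt; the cache mount mirrors the one used elsewhere in the diff.

```dockerfile
ARG COMMON_WORKDIR=/workspace/vllm-omni
RUN mkdir -p ${COMMON_WORKDIR}
COPY . ${COMMON_WORKDIR}
# Build hosts usually have no XPU, so torch.xpu.is_available() is False
# during `docker build`; force XPU dependency selection for this step only.
# `pip install .` is a hypothetical placeholder for the real install command.
RUN --mount=type=cache,target=/root/.cache/pip \
    cd ${COMMON_WORKDIR} && \
    VLLM_OMNI_TARGET_DEVICE=xpu pip install .
```

A variable set this way exists only for that RUN step, whereas ENV persists into later stages built FROM this one and into the final image's runtime environment.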
| SHELL ["bash", "-c"] | ||
| CMD ["bash", "-c", "source /root/.bashrc && exec bash"] | ||
| WORKDIR /workspace/ | ||
| RUN git clone -b v0.15.0 https://github.com/vllm-project/vllm |
There was a problem hiding this comment.
Can you add a vLLM version build arg, ARG VLLM_VERSION? For example:
https://github.com/vllm-project/vllm-omni/blob/00bd07b82bb1903a611b79ce35bc5fef53b5e2ea/docker/Dockerfile.ci#L2C1-L2C26
That way, @tzhouam can update the version when rebasing onto a new vLLM release.
@tzhouam what do you think?
There was a problem hiding this comment.
Just replacing the vLLM version doesn't work on XPU, since we need to copy the whole base XPU Dockerfile here. This is a limitation on our side, and it can be resolved once a vLLM+XPU image is published in the future.
There was a problem hiding this comment.
The Dockerfile shouldn't need many breaking changes, so I added a build arg to change the git clone branch. The default behavior stays the same as before.
v0.15.0 to v0.16.0 was an exception since this was the version that integrated uv into vLLM's docker/Dockerfile.xpu.
There was a problem hiding this comment.
@tjtanaa, thanks, we've updated Dockerfile.xpu to use VLLM_VERSION.
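For reference, a build-arg-driven clone might look like the sketch below. The default v0.15.0 matches the branch hard-coded in the git clone line quoted earlier in this thread; the --depth 1 shallow clone is an optional assumption added to shorten build time, not something proposed by the reviewers.

```dockerfile
# Default keeps the previously hard-coded branch; override at build time.
# If this ARG is declared before the first FROM instead, it must be
# re-declared (a bare `ARG VLLM_VERSION`) inside the stage to be visible.
ARG VLLM_VERSION=v0.15.0
WORKDIR /workspace/
# Optional (assumption): --depth 1 fetches only the tagged commit.
RUN git clone --depth 1 -b ${VLLM_VERSION} https://github.com/vllm-project/vllm
```

Building against another release would then be `docker build -f docker/Dockerfile.xpu --build-arg VLLM_VERSION=v0.16.0 .`, keeping in mind the caveat raised in this thread that a new vLLM release may also require changes to the copied XPU base Dockerfile.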
d427b0a to 2571640
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
a9c6846 to 6075924
Let's get it merged now. If the docs view does not render as expected, let's fix it in a follow-up PR.
Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
* [Frontend][Model] Support batch request with refined OmniDiffusionReq… (#797) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> * [Model]: add FLUX.1-dev model (#853) * [BugFix] ignore mm data from stages to async omni (#954) Signed-off-by: dengyunyang <584797741@qq.com> * Revert "[BugFix] ignore mm data from stages to async omni" (#1023) * [Bugfix] Modify output to model_runner_output (#1026) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Feature] Support cache-dit for Wan 2.2 inference (#1021) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> * [Doc]Format profiling doc (#993) Signed-off-by: lishunyang <lishunyang12@163.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Hardware] Support platforms and plugin system (#774) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Core]: KV Cache Transfer Encapsulation (#979) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> * [Test]Delete skip mark for amd ci test and fix CI failure (#927) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix][Doc]Specify Qwen3-TTS model name for each task type (#1036) Signed-off-by: Kyle Huang <yellowsea@gmail.com> * [Misc] pin version of fa3-fwd (#1051) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [CI] [ROCm] Add more AMD CI tests (#1039) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> * [Bugfix] fix qwen image layerd in dummy run (#1027) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [BugFix] Fix noisy output without setting a seed in Qwen Image (#1043) Signed-off-by: natureofnature <wzliu@connect.hku.hk> * [bugfix] remove vllm speech route (#1060) 
Signed-off-by: linyueqian <linyueqian@outlook.com> * [Debug] Update GLM-Image Pipeline (#1049) Co-authored-by: root <root@hk01dgx028.cm.cluster> * [Diffusion][Bugfix] Fix the flash_attn backends selection logic (#983) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix the accuracy issue of multimodal input. (#1020) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Rein Yang <ruiruyang2@gmail.com> * [Bugfix] Set VaeImageProcessor `do_convert_rgb` True (#1032) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [feat]: adapt batch request for flux (#1028) Signed-off-by: wuzhongjian wuzhongjian_yewu@cmss.chinamobile.com * [CI] Change Qwen3 Omni stage placement strategy (#1072) Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> * [BugFix] Fix to use correct attn backend (#1038) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> * [Perf] Qwen3 Omni talker mtp optimization (#1005) Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Wan2.2] Optimize memory usage with conditional transformer loading (#980) Signed-off-by: Lin, Fanli <fanli.lin@intel.com> Signed-off-by: Samit <285365963@qq.com> Co-authored-by: Samit <285365963@qq.com> * [Feat] Support XPU Backend in vLLM-Omni (#191) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Fix] stabilize diffusion images LoRA E2E across CI drift (#1075) Signed-off-by: dongbo910220 <1275604947@qq.com> * [Bugfix][Test] Re-enable the log simple tests (#1065) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] pr conflict fix, bugfix ignore mm data from stages to async omni (#1025) Signed-off-by: dengyunyang <584797741@qq.com> * [Doc][Bagel] 
Add BAGEL-7B-MoT documentation and edit the default stage configuration (#987) Signed-off-by: Ding Zuhao <e1583181@u.nus.edu> Signed-off-by: jzz <e1583181@u.nus.edu> * [Fix] Increase max wait time for server readiness to accommodate model loading (#1089) Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com> * [Benchmark] Add vLLM-Omni Omni model online benchmark (#780) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] Remove Mooncake/Yuanrong connector import warning (#1091) Signed-off-by: natureofnature <wzliu@connect.hku.hk> * fix: UnboundLocalError for role in streaming audio/image responses (#784) Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com> * [Misc] update wechat image (#1096) * [Feature] Support DiT Layerwise (Blockwise) CPU Offloading (#858) Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [BugFix] Modify max_tokens and modify the log and fix #1103 (#1097) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix modulate_index shape error in Qwen-Image-Edit Task (#1100) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Platform] Add supports_torch_inductor interface (#1108) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [BugFix] Fix Qwen3 Omni talker mtp torch.compile startup error (#1104) Signed-off-by: ram16g <anlianfengjie@163.com> Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com> Co-authored-by: ram16g <anlianfengjie@163.com> 
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] fix request_id of image generation in api server (#1112) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Perf]: CFG parallel abstraction (#851) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [BugFix] Fix Qwen3 TTS 0.6B profile run hang (#995) (#1082) * [CI] [ROCm] Quick fix amd ci (#1116) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> * [Bugfix] fix benchmark audio timing error and add benchmark test (#1109) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix][Qwen3TTS] Load speaker_id/voices from model configuration (#1079) Signed-off-by: pablo <juanz9312@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com> * [NPU] Align with GPUModelRunner (#1114) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [FEATURE] /v1/images/edit interface (#1101) Signed-off-by: dengyunyang <584797741@qq.com> * [Bugfix] Fix NPU SDPA attention mask shape and semantics (#1031) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [TeaCache]: Add Coefficient Estimation (#940) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [CI]: Bagel E2E Smoked Test (#1074) Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Misc] Bump version to 0.14.0 (#1128) Signed-off-by: Roger Wang <hey@rogerw.io> * [Doc] First stable release of vLLM-Omni (#1129) Signed-off-by: Roger Wang 
<hey@rogerw.io> * [Misc] Align error handling with upstream vLLM v0.14.0 (#1122) Signed-off-by: anna <lee.anna@navercorp.com> Co-authored-by: anna <lee.anna@navercorp.com> * [Feature] add Tensor Parallelism to LongCat-Image(-Edit) (#926) Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com> * [CI] Temporarily remove slow tests. (#1143) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: princepride <wangzhipeng628@gmail.com> * [CI] Refactor test_sequence_parallel.py and add a warmup run for more accurate performance stat (#1165) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * Dev/rebase v0.15.0 (#1159) Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk> Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com> * Docs update paper link (#1169) Signed-off-by: hsliu <liuhongsheng4@huawei.com> Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com> Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com> * [Debug] Clear Dockerfile.ci to accelerate build image (#1172) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Debug] Correct Unreasonable Long Timeout (#1175) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Doc]Fix - Align with repo. 
(#1176) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Bugfix][Qwen-Image-Edit] Add a warning log for none negative_prompt (#1170) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] fix qwen image oom (#1168) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> * [Hardware] Disable compile of diffusion on XPU (#1148) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> * [Doc] Fix vLLM version in user docs (#1179) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> * [Refactor] Refactor async chunk and fix the shape mismatch issue (#1151) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * bugfix: /images/edits endpoint fails pipeline data format check (#1141) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Perf] resolving prolonged `cudastreamsynchronize` execution in z image processing (#1105) Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Bugfix] modify RTF use audio_e2e/audio_duration (#1157) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Doc] Highlight paper & slides. 
(#1186) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [chore] Remove zmq context initialize (#1187) Signed-off-by: xiedeyantu <czjourney@163.com> * [NPU] Update Dockerfile and docs for v0.14.0 (#671) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] E2E metric incorrect qwen3-omni with async chunk feature (#1018) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Doc] opt doc (#1118) Signed-off-by: David Chen <530634352@qq.com> * [Bugfix] Fix tp+sp accuracy, incorrect process group mapping (#1178) Signed-off-by: David Chen <530634352@qq.com> * [Feature] Enable use_audio_in_video for Qwen 3 Omni Online (#1198) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Bugfix] async_chunk rebase v0.15.0 (#1195) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * [feature]: support flux cache_dit (#1145) Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> * [CI] Add CI branch coverage calculation, fix statement coverage results and add log before test for buildkite log group (#1120) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Wan 2.2][Diffusion] Add TP Support (#964) Signed-off-by: weichen <calvin_zhu0210@outlook.com> * [Hardware] [Feat] Setup platform dependent package installation (#1046) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com> * [XPU] Fix XPU UTs for basic coverage (#1164) Signed-off-by: Yan Ma <yan.ma@intel.com> * [Test] Add BuildKite test-full script for full CI. 
(#867) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> * [Refactor] Reuse upstream Qwen3MoeSparseMoeBlock (#1202) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bugfix] Fix wan2.2 ti2v (#1221) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Bugfix] Fix '--max-generated-image-size' cli args type (#1249) Signed-off-by: ApsarasX <apsarax@outlook.com> * [Bugfix] Ensure seed=0 is correctly handled in image edit (#1248) Signed-off-by: ApsarasX <apsarax@outlook.com> * [Docs] Add example image download step to Image-To-Video examples (#1258) Signed-off-by: lishunyang <lishunyang12@163.com> * [Bugfix] Fix padding bug in 12Hz tokenizer ConvTranspose1d decode (#1241) Signed-off-by: linyueqian <linyueqian@outlook.com> * [bugfix] Fix multimodal_output property to check completion outputs where audio data is attached (#1203) Signed-off-by: linyueqian <linyueqian@outlook.com> * [Doc] Update QA relevant to quantization (#1257) Signed-off-by: lishunyang <lishunyang12@163.com> * [Bugfix] Fix Doc link Rrror (#1263) Signed-off-by: lishunyang <lishunyang12@163.com> * Process-Scoped GPU Memory Accounting (#1204) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> * [ComfyUI]: ComfyUI integration (#1113) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com> * fix: add diffusion offload args to OmniConfig group instead of serve_parser (#1271) Signed-off-by: Chenguang ZHENG <645327136@qq.com> * [Doc] Adding models/pipelines/features Tutorial (#1196) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> * [CI] Add env variable check for nightly CI (#1281) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * 
[CI] Add pytest markers to current tests and update the doc. (#577) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Diffusion][Perf] Remove Redundant Communication Cost by Refining SP Hook Design (#1275) Signed-off-by: mxuax <mxuax@connect.ust.hk> Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com> * [Feature] Opt metrics structure (#891) Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Test] Add example test cases for omni online (#1086) Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: yenuo26 <410167048@qq.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [CI] Reduce the time for Diffusion Sequence Parallelism Test (#1283) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Model] SupportHunyuanImage3 Diffusion Model in vllm-omni (#1085) Signed-off-by: Semmer2 <semmer@live.cn> * [Chore] Update copyright year. (#1256) Signed-off-by: lishunyang <lishunyang12@163.com> * [feature]: support Flux.1-dev CFG-Parallel (#1269) * [Bugfix] Fix 'NoneType' AttributeError in stable-diffusion model detect (#1254) Signed-off-by: Yan Ma <yan.ma@intel.com> * [Doc] Update Qwen3-TTS docs for consistency with Omni examples (#1226) Signed-off-by: linyueqian <linyueqian@outlook.com> Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Fix]Ensure HuggingFace downloads complete before initialization. 
(#1213) Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [BugFix] Fixed the issue where ignore_eos was not working. (#1286) Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> * [Test] Add e2e tests for Qwen3-TTS speech endpoint (#1206) Signed-off-by: linyueqian <linyueqian@outlook.com> Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com> * [Feat]: support VAE patch parallelism (#756) Signed-off-by: dongbo910220 <1275604947@qq.com> Co-authored-by: hsliuustc0106 <liuhongsheng4@huawei.com> * [CI] Disable Qwen3-TTS E2E Test in pipeline.yml (#1306) Signed-off-by: Gao Han <hgaoaf@connect.ust.hk> * [Misc] Add per-request generator_device to online image gen and edit (#1183) Signed-off-by: gcanlin <canlinguosdu@gmail.com> * [Bagel]: Support TP (#1293) Signed-off-by: princepride <wangzhipeng628@gmail.com> * [Bugfix] Fix image edit RoPE crash when explicit height/width are provided (#1265) Signed-off-by: lishunyang <lishunyang12@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> * [Doc] Sync (#1216) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> * [Bugfix] fix precision issues of qwen3-omni when enable async_chunk without system prompt (#1288) Signed-off-by: Rein Yang <ruiruyang2@gmail.com> * [Debug] Add trigger to concurrent stage init (#1274) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> * [Bugfix][Qwen3-TTS] Fix task type (#1317) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> * Unifying CLI Argument Naming Style (#1309) Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> * [Bugfix][Qwen3-TTS] Preserve original model ID in omni_snapshot_download (#1318) * [CI] Run nightly tests. 
(#1333)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>

* [Feature]: FP8 Quantization Support for DiT (#1034)
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>

* Fix yield token metrics and opt metrics record stats (#1292)

* [Test] L2 & L3 Test Case Stratification Design for Omni Model (#1272)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Perf] Support Qwen3 Omni code2wav batch inference with async chunk (#1246)
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: Ziming Huang <1520787127@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update qwen3-omni & qwen2.5-omni openai client (#1304)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>

* [Feature] Support Wan2.2 T2V and I2V Online Serving with OpenAI /v1/videos API (#1073)
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>

* [Feature] add Tensor Parallelism to SD_3.5 (#1336)
Signed-off-by: GG-li <3226868735@qq.com>

* [Feature] async scheduling to overlap chunk IO and compute (#951)
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>

* [Bugfix] reused metrics to modify the API Server token statistics in Stream Response (#1301)
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>

* Refactor CPU Offloading Backend Pattern (#1223)
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>

* [DOC] Doc for CI test - Details about five level structure and some other files. (#1167)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: yenuo26 <410167048@qq.com>

* [Bugfix] remove Tongyi-MAI/Z-Image-Turbo related test from L2 ci (#1348)
Signed-off-by: dengyunyang <584797741@qq.com>

* [Misc] wechat image update (#1354)
Signed-off-by: David Chen <530634352@qq.com>

* [Misc] Support WorkerWrapperBase and CustomPipeline for Diffusion Worker (#764)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* [Feature][Bugfix] Add CFG feature to Bagel (#1310)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

* [Feature]: Diffusion sleep to use process level memory calculation (#1276)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

* change qwen3-omni open cudagraph by default (#1352)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [XPU] Update Bagel's flash_attn_varlen_func to fa utils (#1295)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* [Test] Add Omni Model Performance Benchmark Test (#1321)
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>

* [BugFix]: Revert utils change (#1369)
Signed-off-by: princepride <wangzhipeng628@gmail.com>

* [Rebase] Rebase to vllm v0.16.0 (#1357)
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

* [Test] Fix expansion and example test case for qwen3-omni (#1358)
Signed-off-by: yenuo26 <410167048@qq.com>

* [v0.16.0][BUG FIX] Fix hunyuan MOE after update to 0.16.0 (#1401)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

* [0.16.0] remove cuda hard-code for Hunyuan Image3 (#1402)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>

* [XPU] Add XPU Dockerfile and related docs (#1162)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>

* [Bugfix] Fix Hardcoded Datatypes in Z-image (#1393)
Signed-off-by: Alex Brooks <albrooks@redhat.com>

* [Feature]: Support disaggregated inference pipeline for Qwen3_TTS (#1161)
Signed-off-by: Sy03 <1370724210@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Feature] Add automated PR reviewer bot with GLM integration (#1424)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [Misc] Add Qwen2.5-Omni-3B model support to Gradio demo (#1382)
Signed-off-by: UsamaKenway <usamakenway@gmail.com>

* [misc] Feature/pr reviewer auto trigger&update model (#1431)
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Hunter Liu <hunter@liu.sh>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "[misc] Feature/pr reviewer auto trigger&update model" (#1432)

* [Doc] Update GPU installation commands (#1434)

* [ROCM] [CI] fix dockerfile.rocm to support nightly build and also fix amd ci v0.16.0rc1 (#1380)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* [Feature][BAGEL] Combine multi-branch cfg into a single batch to accelerate inference. (#1429)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>

* [Feat]: add ASCII art logo for vLLM-Omni (#1430)

* [Bug] [Bagel] Fix kv transfer bug (#1437)
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: Wang Zhipeng: princepride <wangzhipeng628@gmail.com>

* [CI] Set L2 & L3 tests running conditions. (#1344)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>

* [Feature] vLLM-Omni RDMA connector (#1019)
Signed-off-by: natureofnature <wzliu@connect.hku.hk>

* [Minor][Refactor] Pass seq_token_counts explicitly (#1425)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Misc] Extend Diffusion Benchmark script to other backends (#875)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Feature] Support Stage Based Deployment CLI (#939)
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Doc] Optimize vLLM-Omni metrics documentation (#1311)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Bugfix] Forward all vllm-omni serve command parameters to model (#985)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Doc]: Add bagel single/multi node usage with mooncake document (#1450)

* [Qwen3TTS][Feat] Code2Wav batched decoding (#1426)
Signed-off-by: pablo <pablo@agigo.ai>
Co-authored-by: pablo <pablo@agigo.ai>

* [CI] Remove overwhelming debug log (#1463)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

* [Misc] update wechat image (#1464)
Signed-off-by: David Chen <530634352@qq.com>

* [Doc] Refine Diffusion Tutorial Documents (#1305)
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

* [Bugfix] Robust Audio Data Handling in _create_audio_choice (#1222)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>

* [Bugfix]: Fix merging updated additional information to ensure dict type (#1296)
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>

* [Model] Add new nextstep_1 (Diffusion) model (only T2I) (#612)
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: sniper35 <dongw2019@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Bugfix] Add TTS configuration options (#1177)
Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>

* [Debug] Multi-Request for Qwen 3 Omni use_audio_in_video (#1433)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

* [Bugfix] Fix case-sensitive task_type matching in Qwen3TTSModelForGeneration (#1455)
Signed-off-by: Sangchun Ha <seomk9896@gmail.com>

* [BugFix] process request.num_cached_tokens if it equals to the initial value (#1468)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>

* [Bugfix] Fix SDPA attention mask dtype and shape (Fix #857) (#1349)
Signed-off-by: jader <yjader@foxmail.com>

* [Test] Reduce Perf test case and fix modify stage config (#1449)
Signed-off-by: yenuo26 <410167048@qq.com>

* [NPU] Upgrade to v0.16.0 (#1375)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>

* [CI] Update Dockerfile for vllm-omni CI image and remove obsolete dep… (#1491)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

* [Fix][Chore] Qwen3-TTS Modeling Minor Code Sanity Improvements (#1482)
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

* [Bugfix] Fix tuple/list KV cache extraction crash (#1405)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Doc] format lora related docs for the user's end (#1009)
Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk>
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>

* [Feature] Support Wan2.2 output with irregular shapes (#1279)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>

* [Misc] Migrate L1 tests to use pytest-mock (#1315)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

* [Bugfix] Fix LoRA Scaling on Active Adapters (#1421)
Signed-off-by: Alex Brooks <albrooks@redhat.com>

* [Bugfix] fix record audio generated frame in offline infer (#1312)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>

* [Model] Support OmniGen2 (#513)
Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>

* [Bugfix][Qwen3TTS] (#1289)
Signed-off-by: pablo <juanz9312@gmail.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* Use pull through cache image for H100 pool (#1518)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [ROCm] [CI] [Docker] Point to use the latest vLLM v0.16.0 stable version (#1500)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* [Bugfix] fix offline text_to_image error from #1009 (#1515)
Signed-off-by: David Chen <530634352@qq.com>

* [XPU] Enable FLASH_ATTN on XPU (#1332)
Signed-off-by: Yan Ma <yan.ma@intel.com>

* Revert gpu_1 job to use regular image (#1521)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [Chore] remove unused logger in omni_diffusion (#531) (#1509)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>

* [Qwen3TTS][Feat] Streaming output (#1438)
Signed-off-by: pablo <pablo@agigo.ai>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: pablo <pablo@agigo.ai>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Bugfix] Race condition in MultiprocExecutor when concurrent access to Scheduler (#1448)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Doc][Test][Misc] ComfyUI test, more screenshot, and code cleaning (#1435)
Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>

* [Performance] Qwen3-Omni performance optimization (#1378)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

* [Feature] Support HSDP for diffusion models (#1339)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [CI] fixed CI timeout (#1460)
Signed-off-by: zhumingjue <zhumingjue@huawei.com>
Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>

* [Bugfix] Use uds for zmq address if not set --stage-id (#1522)
Signed-off-by: wuhang <wuhang6@huawei.com>

* [BugFix] Restore talker's config (#1524)
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Canlin Guo <961750412@qq.com>

* [XPU] fix qwen_omni after rebase to v0.16.0 (#1416)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Platform] Enable layerwise offload on all hardware (#1492)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>

* diffusion: enable VAE patch parallel for SD3.5 (#1428)
Signed-off-by: dongbo910220 <1275604947@qq.com>

* [Perf] GLM Image (#920)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [skip ci][Doc] add design docs for async chunk in qwen3-omni (#962)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>

* feat(qwen3-tts): Add CUDA Graph support for speech tokenizer decoder (#1205)
Signed-off-by: xulusjb <fdukeshik@gmail.com>
Co-authored-by: xulusjb <fdukeshik@gmail.com>

* [New Model]: XiaomiMiMo/MiMo-Audio-7B-Instruct support (#750)
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: GG-li <3226868735@qq.com>
Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: Baoyuan Qi <qibaoyuan@126.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: baoyuan qi <qibaoyuan@126.com>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: 丁宁 <nndding@gmail.com>
Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: dingning<dingning7@xiaomi.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning@xiaomi.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: shijin zhang <zsj1364226740@gmail.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com>
Co-authored-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: dingning <dingning7@xiaomi.com>
Co-authored-by: ning ding <nndding@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Feature]: Native GGUF Quantization Support for DiT (#1285)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* Add benchmark for `v1/audio/speech` non-streaming (#1408)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

* [Version] Auto generate version using `setuptools_scm` (#1224)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* [Feat]: Support Async chunk cleanup (#1087)
Signed-off-by: Sy03 <1370724210@qq.com>

* [Profiler] Support online profiling (#1136)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>

* [Bugfix] Fix redundant finished req status updating on OmniGenerationScheduler (#1510)
Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: 齐保元 <qibaoyuan@xiaomi.com>

* [XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda (#1488)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>

* [Chore] Cleanup dead code in GGUF DiT code path (#1533)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Doc] Update installation instructions for vllm 0.16.0 (#1505)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

* [Doc] [skip ci] Sync. (#1363)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>

* [CI][skip ci] Update H100 image link based on #1518 (#1538)
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>

* Fix no embed text spk tokens (#1540)
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>

* [Debug] Merge vllm pull 35368 (#1534)
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>

* [Docs] update async chunk docs diagram [skip ci] (#1530)
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>

* fix(qwen3-tts): fix Base ICL voice clone producing corrupted audio (#1554)
Signed-off-by: linyueqian <linyueqian@outlook.com>

* [NPU][Bugfix] Align GPU side and recover qwen3-tts (#1564)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>

* [BugFix] Fix unexpected crash when init OmniDiffusion (#1562)
Signed-off-by: Semmer2 <semmer@live.cn>

* [CI] Modify some CI test cases to run on L4 environment to reduce H100 resource usage. (#1543)
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>

* [BugFix]: fix a lot of bugs (#1565)
Signed-off-by: princepride <wangzhipeng628@gmail.com>

* feat: add HyperCLOVAX-SEED-Omni-8B support

Model files:
- vllm_omni/diffusion/models/hyperclovax_vision/: vision decoder pipeline (HyperCLOVAXVisionPipeline) using flow matching diffusion + VisionTransformer
- vllm_omni/diffusion/models/hyperclovax_audio/: audio decoder pipeline (HyperCLOVAXAudioPipeline) using Unit-BigVGAN codec
- vllm_omni/model_executor/stage_input_processors/hyperclovax_seed_omni.py: thinker2vision_decoder and thinker2audio_decoder, which extract discrete tokens from LLM output; vision codes are truncated/padded to 729 (27x27) for the decoder

Registry:
- vllm_omni/diffusion/registry.py: register HyperCLOVAXVisionPipeline and HyperCLOVAXAudioPipeline with post-process functions

Stage config:
- vllm_omni/model_executor/stage_configs/hcx_omni.yaml: 3-stage config
  Stage 0: LLM thinker (TP=4, GPUs 0-3), Stage 1: vision decoder (GPU 4), Stage 2: audio decoder (GPU 5)

Bug fixes for HyperCLOVAX compatibility:
- diffusion/request.py: add extra dict field to OmniDiffusionRequest so vision_tokens/audio_tokens from stage input processors reach the pipeline
- entrypoints/async_omni_diffusion.py: extract OmniTokensPrompt.additional_information into OmniDiffusionRequest.extra before creating request
- entrypoints/omni_stage.py: skip empty engine inputs (text-only requests where thinker2vision_decoder/thinker2audio_decoder return [])
- entrypoints/async_omni.py: handle skipped sentinel in _process_single_result so text-only requests complete without crashing on Stage 1/2

* fix: correct decoder params and HCX porting fixes
- hcx_omni.yaml: guidance_scale 3.5→0.75, num_inference_steps 30→50 (matches OmniServe production defaults; 3.5 caused over-amplified autoguidance → shrunken/degraded output images)
- omni_stage.py: skip empty engine inputs for text-only requests
- async_omni_diffusion.py: extract OmniTokensPrompt.additional_information into OmniDiffusionRequest.extra (audio_tokens/vision_tokens)
- registry.py: HCX Omni diffusion model registration fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: HyperCLOVAX-SEED-Omni-8B stage pipeline and entrypoint fixes

* fix: change guidance_scale from 9.0 to 0.75 (autoguidance scale, OmniServe default)

* feat: add audio decoder Stage 2 to hcx_omni pipeline
- Wire HyperCLOVAXAudioPipeline as Stage 2 in hcx_omni.yaml
- GPU 5 assigned for audio decoder (Unit-BigVGAN / NCCosybigvganDecoder)
- Add runtime edge 0->2 (thinker -> audio decoder)
- Implement post-generation PCM chunk streaming for audio output (4800 samples / 200ms per SSE event @ 24kHz, int16 base64-encoded)
Refs: github.com/vllm-project/vllm-omni/pull/869 (already incorporated)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: vllm version compatibility for HyperCLOVAX audio decoder startup
- config/model.py: try/except fallback for AttentionBackendEnum import (vllm.v1.attention.backends.registry absent in older vllm builds)
- pipeline_hyperclovax_audio.py: return actual named_parameters() from load_weights() when using MAR checkpoint so diffusers_loader strict check passes (weights loaded eagerly in __init__ via MAR extraction)
- qwen3_omni_moe_thinker.py, qwen2_5_omni_thinker.py: try/except stubs for check_interleaved_audio_video and merge_interleaved_embeddings, which are absent in older vllm qwen2_5_omni_thinker; these symbols are only exercised by Qwen models, not HyperCLOVAX
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add edge 1→2 and correct model key in hcx_omni.yaml Stage 2
- Add runtime edge from:1 to:2 (required for Stage-2 connector init; without it AsyncOrchestrator cannot route to audio decoder at runtime)
- Change model_subdir to model for Stage-2 engine_args to match total-poc working reference config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
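
The post-generation PCM chunk streaming from the audio decoder Stage 2 commit above (4800 samples, i.e. 200 ms at 24 kHz, per SSE event; int16; base64-encoded) can be illustrated with a minimal sketch. It is not the vllm-omni implementation: the helper names (`float_to_int16_pcm`, `wav_bytes_to_pcm16`, `iter_pcm_chunks_b64`, `make_wav`) are hypothetical, the audio is assumed to be mono, SSE framing is omitted, and the standard-library `wave` module stands in for soundfile so the sketch has no third-party dependencies. Parsing the WAV container instead of slicing raw bytes is what keeps the 44-byte header out of the first chunk, the problem addressed by the soundfile fix that follows.

```python
# Sketch only: helper names are hypothetical; stdlib `wave` stands in for soundfile.
import base64
import io
import struct
import wave
from typing import Iterator, Sequence

SAMPLE_RATE = 24_000  # 24 kHz output rate, as stated in the commit message
CHUNK_SAMPLES = 4_800  # 200 ms per SSE event at 24 kHz


def float_to_int16_pcm(samples: Sequence[float]) -> bytes:
    """Clip float samples to [-1.0, 1.0] and pack them as little-endian int16."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def wav_bytes_to_pcm16(wav_bytes: bytes) -> bytes:
    """Return only the PCM frames of a mono 16-bit WAV, dropping the header.

    Slicing the raw bytes at fixed offsets would leak the RIFF header into
    the first chunk; parsing the container avoids that.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV")
        return wf.readframes(wf.getnframes())


def iter_pcm_chunks_b64(pcm16: bytes, chunk_samples: int = CHUNK_SAMPLES) -> Iterator[str]:
    """Yield base64 strings holding at most `chunk_samples` int16 samples each."""
    step = chunk_samples * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm16), step):
        yield base64.b64encode(pcm16[start:start + step]).decode("ascii")


def make_wav(samples: Sequence[float], rate: int = SAMPLE_RATE) -> bytes:
    """Build an in-memory mono 16-bit WAV, standing in for the decoder's output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(float_to_int16_pcm(samples))
    return buf.getvalue()
```

For 27.04 s of audio (648,960 samples at 24 kHz) this produces 136 chunks: 135 full 200 ms chunks plus a final 960-sample (40 ms) chunk. Fixed-size chunking keeps every SSE event the same length except the last, which is the simplest contract for a client that plays chunks back to back.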

* fix: audio S2S output - handle diffusion outputs in _create_audio_choice
HyperCLOVAXAudioPipeline (diffusion) stores audio in multimodal_output directly (OmniRequestOutput.from_diffusion), not in outputs[0].multimodal_output like LLM pipelines. Fixes:
1. _create_audio_choice (non-streaming): use omni_outputs.multimodal_output when final_res.outputs is empty (diffusion path).
2. Streaming audio path: same fix for _final_res.outputs[0].
3. Both loops (for output in final_res.outputs): fall back to a single synthetic choice at index 0 when the outputs list is empty.
4. Handle bytes audio output from HyperCLOVAXAudioPipeline post-process (returns WAV bytes, not tensors like Qwen3-Omni).
Also fixes audio input (A2T) regression: skip diffusion prompt extraction when mm_data has audio content (added in previous session).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: parse WAV bytes with soundfile for uniform PCM chunk streaming
HyperCLOVAXAudioPipeline returns WAV bytes including a 44-byte header. The previous byte-offset splitting included the header in the first chunk, corrupting it.
Fix: parse with soundfile to get float32 PCM, then convert to int16 chunks uniformly regardless of source type (bytes or tensor).
Verified: 27.04 s of audio streamed correctly as 136 chunks of up to 200 ms each (the final chunk is partial).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: zero-shot TTS with speaker embedding from input audio
- serving_chat.py: extract last input_audio base64 from request messages and inject as ref_audio_b64 into engine_prompt dict
- thinker2audio_decoder: read ref_audio_b64 from prompt and pass as ref_audio_tokens to Stage 2 (HyperCLOVAXAudioPipeline)
- hcx_omni.yaml: switch Stage 2 to NCZSCosybigvganDecoder.mar (zero-shot), which uses the ECAPA-TDNN speaker encoder instead of finetuned ID lookup
Pipeline: input audio -> ECAPA-TDNN -> speaker embedding -> BigVGAN synthesis matching the voice characteristics of the original speaker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: wire audio decoder Stage 2 to hcx_omni pipeline and fix S2S flow
- Add Stage 2 (HyperCLOVAXAudioPipeline / NCZSCosybigvganDecoder) to hcx_omni.yaml with GPU 5, gpu_memory_utilization 0.4, edge 0->2 from thinker
- Fix thinker2audio_decoder: correct audio token range (128606-135167), remap to [0, 6561) for BigVGAN input, handle empty token case gracefully
- Fix pipeline_hyperclovax_audio.py post_process_func signature and incorporate PR#869 BUG FIX patches for stable audio generation

* fix: use finetuned audio decoder and fix transformers_modules deserialization
- hcx_omni.yaml: switch Stage 2 from NCZSCosybigvganDecoder (zero-shot, ECAPA-TDNN) to NCCosybigvganDecoder (finetuned, nn.Embedding speaker id). Zero-shot decoder required ref_audio (mel spectrogram), which is unavailable for text-only requests and incompatible with finetuned decoder path.
- pipeline_hyperclovax_audio.py: guard ref_audio processing with 'not self.bigvgan.finetune'; the finetuned decoder has no ECAPA-TDNN encoder, so passing ref_audio bytes would crash with 'expected 100 channels'.
- omni_stage.py: add HuggingFace modules cache (~/.cache/huggingface/modules) to sys.path before queue.get_nowait() in try_collect(). Stage-0 pickles outputs containing custom classes from transformers_modules (trust_remote_code), but the API server process doesn't have this path, causing deserialization failures that silently drop Stage-0 outputs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: restore zero-shot speaker cloning with fallback for text-only requests
- hcx_omni.yaml: revert to NCZSCosybigvganDecoder.mar (zero-shot ECAPA-TDNN) for voice-preserving S2S synthesis. NCCosybigvganDecoder used a fixed integer speaker_id and lost the input speaker's voice.
- pipeline_hyperclovax_audio.py: add zero-mel fallback branch for finetune=False + ref_audio=None case.
  When a text-only request arrives (no input audio → no ref_audio), ECAPA-TDNN receives a zero mel tensor [1, num_mels, 64] instead of crashing with 'expected 100 channels'. S2S requests always have ref_audio so the zero-shot cloning path is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add stage config yaml for HCX audio decoder
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>

* feat: add HyperCLOVAX-SEED-Omni 8B model as vllm-omni executor
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>

* feat: add HCX audio decoder pipeline
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>

* fix: modify exception for HCX audio decoder (GAN)
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>

* fix: default temperature set to 0, and pipeline model evaluation mode
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>

---------

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Kyle Huang <yellowsea@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: natureofnature <wzliu@connect.hku.hk>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
Signed-off-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: Pierre Le Guen <26087574+PierreLeGuen@users.noreply.github.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: ram16g <anlianfengjie@163.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: pablo <juanz9312@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: anna <lee.anna@navercorp.com>
Signed-off-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Signed-off-by: xiedeyantu <czjourney@163.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: weichen <calvin_zhu0210@outlook.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: ApsarasX <apsarax@outlook.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: Semmer2 <semmer@live.cn>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Signed-off-by: Gao Han <hgaoaf@connect.ust.hk>
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Ziming Huang <1520787127@qq.com>
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: GG-li <3226868735@qq.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Sy03 <1370724210@qq.com>
Signed-off-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UsamaKenway <usamakenway@gmail.com>
Signed-off-by: Hunter Liu <hunter@liu.sh>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: wuhang <whlbx@hotmail.com>
Signed-off-by: pablo <pablo@agigo.ai>
Signed-off-by: Shijin Zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: sniper35 <dongw2019@gmail.com>
Signed-off-by: Yanick Schraner <yanick.schraner@bs.ch>
Signed-off-by: Sangchun Ha <seomk9896@gmail.com>
Signed-off-by: jader <yjader@foxmail.com>
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Signed-off-by: AndyZhou952 <jzhoubc@connect.ust.hk>
Signed-off-by: Yupu <feng.yu.pu0330@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: zhumingjue <zhumingjue@huawei.com>
Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Jared Wen <w13431838023@gmail.com>
Signed-off-by: xulusjb <fdukeshik@gmail.com>
Signed-off-by: 齐保元 <qibaoyuan@xiaomi.com>
Signed-off-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Signed-off-by: Baoyuan Qi <qibaoyuan@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: baoyuan qi <qibaoyuan@126.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: 丁宁 <nndding@gmail.com>
Signed-off-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: dingning<dingning7@xiaomi.com>
Signed-off-by: dingning <dingning7@xiaomi.com>
Signed-off-by: dingning <dingning@xiaomi.com>
Signed-off-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Signed-off-by: Canlin Guo <961750412@qq.com>
Signed-off-by: shijin zhang <75300765+Dovis01@users.noreply.github.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <with1015@unist.ac.kr>
Co-authored-by: Zeyu Huang | 黃澤宇 <11222265+fhfuih@users.noreply.github.com>
Co-authored-by: JohnJan <wuzhongjian_yewu@cmss.chinamobile.com>
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: kYLe <yellowsea@gmail.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: NATURE <wzliu@connect.hku.hk>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: Zhou Taichang <tzhouam@connect.ust.hk>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Rein Yang <ruiruyang2@gmail.com>
Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
Co-authored-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Fanli Lin <fanli.lin@intel.com>
Co-authored-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Ding Zuhao <e1583181@u.nus.edu>
Co-authored-by: Andy Zhou <46011930+AndyZhou952@users.noreply.github.com>
Co-authored-by: Pierre LE GUEN <26087574+PierreLeGuen@users.noreply.github.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: ram16g <anlianfengjie@163.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Co-authored-by: Juan Pablo Zuluaga <46724788+JuanPZuluaga@users.noreply.github.com>
Co-authored-by: muziyuhui666 <111362884+muziyuhui666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: ceanna93 <fairyanna@naver.com>
Co-authored-by: anna <lee.anna@navercorp.com>
Co-authored-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: hsliu_ustc <hsliu_ustc@noreply.gitcode.com>
Co-authored-by: liuzhenwei <zhenweiliu@habana.ai>
Co-authored-by: erfgss <97771661+erfgss@users.noreply.github.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: weichen <calvin_zhu0210@outlook.com>
Co-authored-by: PopSoda2002 <zhouhp.me@gmail.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: ApsarasX <apsarax@outlook.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Jiaping Wu <53215702+ElleElleWu@users.noreply.github.com>
Co-authored-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Co-authored-by: Gao Han <gaohan19@huawei.com>
Co-authored-by: rein yang <73573651+R2-Y@users.noreply.github.com>
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Sihao Li <111170255+GG-li@users.noreply.github.com>
Co-authored-by: ChenWenjing <54166744+Shirley125@users.noreply.github.com>
Co-authored-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Co-authored-by: yenuo26 <410167048@qq.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: liuzhenwei <zhenwei.liu@intel.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: ZJY0516 <zhu.jiangyun@foxmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Sy03 <1370724210@qq.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: UsamaKenway <56207634+UsamaKenway@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
Co-authored-by: pablo <pablo@agigo.ai>
Co-authored-by: SHIJIN ZHANG <75300765+Dovis01@users.noreply.github.com>
Co-authored-by: Dong W <89223086+sniper35@users.noreply.github.com>
Co-authored-by: Yanick Schraner <yanick.schraner@gmail.com>
Co-authored-by: Sangchun Ha <seomk9896@naver.com>
Co-authored-by: 亦瑾 <76905040+yJader@users.noreply.github.com>
Co-authored-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
Co-authored-by: Yupu <feng.yu.pu0330@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: zhumingjue138 <zhumingjue@huawei.com>
Co-authored-by: Canlin Guo <961750412@qq.com>
Co-authored-by: Jared Wen <w13431838023@gmail.com>
Co-authored-by: Xu Lu <572605156@qq.com>
Co-authored-by: xulusjb <fdukeshik@gmail.com>
Co-authored-by: Baoyuan Qi <qibaoyuan@xiaomi.com>
Co-authored-by: Zhang Shijin <zhangshijin@xiaomi.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: shijin zhang <zsj1364226740@gmail.com>
Co-authored-by: Prajwal A <34590600+LawJarp-A@users.noreply.github.com>
Co-authored-by: dingning <dingning7@xiaomi.com>
Co-authored-by: ning ding <nndding@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Ting FU <futing10@huawei.com>
Co-authored-by: developer-account <irteam@vllm-omni-dev-0.vllm-omni-dev.p-nb13557.svc.cluster.local>
Co-authored-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: Daniel Huang <daniel1.huang@intel.com>
Purpose
This PR adds an XPU Dockerfile and updates the related docs.
Since no vLLM XPU image is published, we build the vLLM-Omni image directly on top of vLLM's XPU Dockerfile.
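To make the layering concrete, here is a minimal Dockerfile sketch of building vLLM-Omni on top of a locally built vLLM XPU image. It is not the Dockerfile in this PR. The `vllm-xpu:local` tag, the `VLLM_XPU_IMAGE` build argument, the `/workspace/vllm-omni` path, and the `pip install` step are all illustrative assumptions. It assumes the base image was built from vLLM's upstream `docker/Dockerfile.xpu`. Two details are deliberate. `WORKDIR` is used instead of `RUN cd`, because a `cd` inside `RUN` does not carry over to later instructions. The `vllm-openai` stage also states its empty `CMD` explicitly, so the default command is obvious to readers.

```dockerfile
# Sketch only: image tag, build arg, paths, and install command are assumptions,
# not the contents of this PR's Dockerfile.
# Assumes an image built from vLLM's upstream docker/Dockerfile.xpu, tagged vllm-xpu:local.
ARG VLLM_XPU_IMAGE=vllm-xpu:local

FROM ${VLLM_XPU_IMAGE} AS final
# WORKDIR persists for later instructions; `RUN cd ...` would not.
WORKDIR /workspace/vllm-omni
COPY . .
# Install vLLM-Omni on top of the vLLM XPU build already in the base image.
# Assumption: this step must not pull in a different vLLM wheel that replaces the XPU build.
RUN pip install --no-cache-dir -e .
# Drop any entrypoint inherited from the vLLM image so the default shell starts cleanly.
ENTRYPOINT []
CMD ["/bin/bash"]

FROM final AS vllm-openai
ENTRYPOINT ["vllm", "serve", "--omni"]
# Setting ENTRYPOINT already resets the CMD inherited from `final`;
# the empty CMD only makes that explicit. Pass the model as the container command.
CMD []
```

With such a file, a build might first produce `vllm-xpu:local` from a vLLM checkout on the release branch that the vLLM-Omni release targets. It would then run `docker build -f <path-to-this-Dockerfile> --target vllm-openai -t vllm-omni-xpu .` from the vLLM-Omni checkout.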
Test Plan
Test Result