[Model][Rebase] Add GLM-Image Model and Partial Rebase to v0.14.0 (Support AR Offline) #763
Signed-off-by: JaredforReal <w13431838023@gmail.com>
…e_prior_tokens Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: root <root@hk01dgx039.cm.cluster>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0810dae881
  sampling_params_list = [
      thinker_sampling_params,
-     talker_sampling_params,  # code predictor is integrated into talker for Qwen3 Omni
-     code2wav_sampling_params,
+     # talker_sampling_params,  # code predictor is integrated into talker for Qwen3 Omni
+     # code2wav_sampling_params,
Provide per-stage sampling params to match 3-stage pipeline
With only thinker_sampling_params in sampling_params_list, the default Qwen3-Omni Instruct pipeline (three stages in vllm_omni/model_executor/stage_configs/qwen3_omni_moe.yaml) will raise a ValueError because Omni._run_generation requires len(sampling_params_list) == len(self.stage_list) (vllm_omni/entrypoints/omni.py). This means running the example with the default stage config now fails before any generation occurs; it only works if users manually supply a single-stage config (e.g., thinking-only), which isn’t the default for this model.
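The constraint described in this review comment can be sketched as follows. This is a minimal illustration, not the code in vllm_omni/entrypoints/omni.py: the helper name `check_sampling_params`, the error wording, and the dictionary stand-ins for the sampling parameters are all assumptions made for the example. Only the rule itself (one sampling-params entry per stage, otherwise a ValueError) comes from the comment above.

```python
def check_sampling_params(sampling_params_list, stage_list):
    """Raise ValueError unless there is exactly one sampling-params entry per stage.

    Hypothetical helper mirroring the length check the review describes.
    """
    if len(sampling_params_list) != len(stage_list):
        raise ValueError(
            f"Expected {len(stage_list)} sampling params (one per stage), "
            f"got {len(sampling_params_list)}"
        )


# Stand-ins for the three default Qwen3-Omni stages (names are illustrative).
stage_list = ["thinker", "talker", "code2wav"]

# Plain dicts used in place of real sampling-params objects.
thinker_sampling_params = {"stage": "thinker"}
talker_sampling_params = {"stage": "talker"}
code2wav_sampling_params = {"stage": "code2wav"}

# One entry per stage: the check passes silently.
check_sampling_params(
    [thinker_sampling_params, talker_sampling_params, code2wav_sampling_params],
    stage_list,
)

# Only the thinker entry against a three-stage pipeline: the check fails.
error_message = None
try:
    check_sampling_params([thinker_sampling_params], stage_list)
except ValueError as exc:
    error_message = str(exc)
```

Under these assumptions, the example script would need either all three entries restored or a single-stage config passed explicitly.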
… Qwen3 Omni Thinker is not finished) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…RequestState is not finished) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…rmat Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Gaohan123 left a comment
LGTM, such a huge work. Thanks!
I tried to set up vllm-omni with GLM-Image support, but encountered a lot of errors using vLLM v0.14.0 and the latest vLLM-omni (commit 5e7035e and dev/rebase_0.14.0).
* init and registry Signed-off-by: JaredforReal <w13431838023@gmail.com>
* implement glm_image_transformer.py Signed-off-by: JaredforReal <w13431838023@gmail.com>
* update transformer Signed-off-by: JaredforReal <w13431838023@gmail.com>
* init pipeline_glm_image.py Signed-off-by: JaredforReal <w13431838023@gmail.com>
* init pipeline_glm_image.py Signed-off-by: JaredforReal <w13431838023@gmail.com>
* remove pre process Signed-off-by: JaredforReal <w13431838023@gmail.com>
* add check_input(), implement CFG parallel in diffuse(), align generate_prior_tokens Signed-off-by: JaredforReal <w13431838023@gmail.com>
* fix check_input(prompt_embed), add KVCache for Image Edit Signed-off-by: JaredforReal <w13431838023@gmail.com>
* print out vllm version Signed-off-by: root <root@hk01dgx039.cm.cluster>
* update model config Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update worker Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update one import in AsyncOmniLLM (not finish all, but can run) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update Qwen3 Omni ViT init based on updated interface (the update for Qwen3 Omni Thinker is not finished) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* Remove unnecessary override for OmniRequestState (the update for OmniRequestState is not finished) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update model runner dummy run Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update ar scheduler Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update _preprocess, execute model and sample_tokens for AR Model Runner Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* debug AR Scheduler Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update OmniGPUModelRunner._update_states Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update the offline LLM request sorting due to changed requested id format Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update Qwen3 Omni to fit with the engine core logic Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update generation model runner Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* debug GLM-Image Model Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* remove deleted args from doc string Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Model][Rebase] Add GLM-Image Model and Partial Rebase to v0.14.0 (Support AR Offiline) (vllm-project#763)
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
  Signed-off-by: root <root@hk01dgx039.cm.cluster>
  Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
  Co-authored-by: JaredforReal <w13431838023@gmail.com>
  Co-authored-by: root <root@hk01dgx039.cm.cluster>
* disable async scheduling for generation models, avoiding inconsistency from race condition Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* Update Qwen 3 Omni Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* [Fix] GLM Image (vllm-project#799) Signed-off-by: JaredforReal <w13431838023@gmail.com>
* support online serving for Qwen3 Omni Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* fix pre-commit Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* inherit engine outputs Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* supporting audio in video(not finished) Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* Update Qwen2.5 Omni model to version 0.14, adding support for image and video input processing, and refining position handling for MRoPE. Adjustments made to the YAML configuration to disable async scheduling for consistency. Code cleanup and formatting improvements included. Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
* debug qwen 2.5 Omni Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update doc Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* rebase to vllm 0.14.0 Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* unify query type Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* fix build doc Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* Dev/rebase 0.14.0 (vllm-project#813)
  Signed-off-by: JaredforReal <w13431838023@gmail.com>
  Signed-off-by: root <root@hk01dgx039.cm.cluster>
  Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
  Signed-off-by: TangPeng <85704592@qq.com>
  Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
  Signed-off-by: mxuax <mxuax@connect.ust.hk>
  Signed-off-by: Sihyeon Jang <sihyeon.jang@navercorp.com>
  Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
  Signed-off-by: iwzbi <wzbi@zju.edu.cn>
  Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
  Signed-off-by: Yuhan Liu <30294295+liuyuhanalex@users.noreply.github.com>
  Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
  Signed-off-by: samithuang <285365963@qq.com>
  Signed-off-by: linyueqian <linyueqian@outlook.com>
  Signed-off-by: David Chen <530634352@qq.com>
  Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>
  Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
  Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
  Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
  Signed-off-by: princepride <wangzhipeng628@gmail.com>
  Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
  Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
  Signed-off-by: Dinesh G <G.Dinesh@ibm.com>
  Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
  Co-authored-by: JaredforReal <w13431838023@gmail.com>
  Co-authored-by: root <root@hk01dgx039.cm.cluster>
  Co-authored-by: JustQJ <37905360+JustQJ@users.noreply.github.com>
  Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
  Co-authored-by: Sihyeon Jang <uneedsihyeon@gmail.com>
  Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
  Co-authored-by: catcat <108673086+iwzbi@users.noreply.github.com>
  Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
  Co-authored-by: Yuhan Liu <30294295+liuyuhanalex@users.noreply.github.com>
  Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
  Co-authored-by: Samit <285365963@qq.com>
  Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
  Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
  Co-authored-by: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com>
  Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
  Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
  Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
  Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
  Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
  Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
  Co-authored-by: D!NE$H <67671800+gDINESH13@users.noreply.github.com>
* update test import Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update version from 0.14.0rc2 to 0.14.0 Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* set vllm config for all CI Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* update CI Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
* Fix CPU offload OOM and performance issues in GLM-Image pipeline
* Fix CPU offload OOM and performance issues in GLM-Image pipeline
  - Conditionally load vision_language_encoder, text_encoder, and vae to GPU only when CPU offload is disabled
  - Propagate cpu_offload_gb argument to enable_cpu_offload flag
  - Include vision_language_encoder in CPU offload hooks for proper AR model offloading
  - Fix device mismatch in generate_prior_tokens during CPU offload mode
* Fix shared memory broadcast hang in GLM-Image pipeline
  - Add manual encoder activation support to SequentialOffloader
  - Explicitly trigger vision_language_encoder onload before get_image_features in pipeline
  - Prevents CPU-bound stalling during AR generation when offload is active
* Fix device mismatch in generate() by triggering offload hook
* Clean up temporary patch files

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: root <root@hk01dgx039.cm.cluster>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: TangPeng <85704592@qq.com>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: Sihyeon Jang <sihyeon.jang@navercorp.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: iwzbi <wzbi@zju.edu.cn>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Yuhan Liu <30294295+liuyuhanalex@users.noreply.github.com>
Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: Dinesh G <G.Dinesh@ibm.com>
Signed-off-by: gDINESH13 <dinesh13g@gmail.com>
Co-authored-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: root <root@hk01dgx039.cm.cluster>
Co-authored-by: tzhouam <tzhouam@connect.ust.hk>
Co-authored-by: JustQJ <37905360+JustQJ@users.noreply.github.com>
Co-authored-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Sihyeon Jang <uneedsihyeon@gmail.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: catcat <108673086+iwzbi@users.noreply.github.com>
Co-authored-by: Ziming Huang <hzm414167@alibaba-inc.com>
Co-authored-by: Yuhan Liu <30294295+liuyuhanalex@users.noreply.github.com>
Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
Co-authored-by: Peiqi Yin <60515999+yinpeiqi@users.noreply.github.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Co-authored-by: D!NE$H <67671800+gDINESH13@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
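For readers skimming the CPU-offload fixes in the commit message above, the placement rule they describe might look roughly like the sketch below. Everything here is a hypothetical stand-in written in plain Python: `OffloadConfig`, `initial_device`, and `place_components` are invented for illustration, and treating any positive `cpu_offload_gb` as "offload enabled" is an assumption, not something the commit message states. Only the component names and the rule "load to GPU only when CPU offload is disabled" come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class OffloadConfig:
    """Hypothetical config holder; not a vLLM-Omni class."""

    cpu_offload_gb: float = 0.0

    @property
    def enable_cpu_offload(self) -> bool:
        # Assumption: a positive offload budget turns the flag on.
        return self.cpu_offload_gb > 0


def initial_device(config: OffloadConfig) -> str:
    """Pick where a component starts: CPU when offloading, GPU otherwise."""
    return "cpu" if config.enable_cpu_offload else "cuda"


def place_components(config: OffloadConfig) -> dict:
    """Map each component named in the commit message to its starting device."""
    names = ("vision_language_encoder", "text_encoder", "vae")
    return {name: initial_device(config) for name in names}


no_offload = place_components(OffloadConfig(cpu_offload_gb=0.0))
with_offload = place_components(OffloadConfig(cpu_offload_gb=4.0))
```

With offload enabled, the components stay on the CPU at load time and would be moved onto the GPU on demand by the offload hooks the commit message mentions.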
I tried it with 0.14.0rc1, but it is not starting:
Purpose
This PR adds support for the GLM-Image model and partially rebases onto vLLM v0.14.0, with support for AR offline inference.
Installation
GLM-Image
Test Plan
Tested on GLM-Image with commands:
cd examples/offline_inference/text_to_image
python3 text_to_image.py --model zai-org/GLM-Image
Test Result
Image:

log:
Qwen 3 Omni
Test Plan
Tested on the Qwen 3 Omni Thinker with CUDA Graph enabled, using the "use_image" query.
Test Result
text:
log: