[Fix] GLM Image by JaredforReal · Pull Request #799 · vllm-project/vllm-omni

JaredforReal · 2026-01-15T09:10:22Z

Purpose

Fix the error in Image2Image (image edit) mode
Add a pre-processing function
Refine how the post-processing function is used

Test Plan

python image_edit.py --model GLM-Image --image qwen_image_output.png --prompt "make it cartoon style"

Test Result

Original: (image not preserved)

Edited: (image not preserved)

Detailed Logs:

python image_edit.py --model /workspace/GLM-Image-Final/ --image qwen_image_output.png --prompt "make it cartoon style"
WARNING 01-15 09:06:36 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
INFO 01-15 09:06:39 [omni.py:122] Initializing stages for model: /workspace/GLM-Image-Final/
INFO 01-15 09:06:39 [initialization.py:35] No OmniTransferConfig provided
INFO 01-15 09:06:39 [omni_stage.py:108] [OmniStage] stage_config: {'stage_id': 0, 'stage_type': 'diffusion', 'runtime': {'process': True, 'devices': '0', 'max_batch_size': 1}, 'engine_args': {'model': '/workspace/GLM-Image-Final/', 'vae_use_slicing': False, 'vae_use_tiling': False, 'cache_backend': None, 'cache_config': None, 'parallel_config': {'pipeline_parallel_size': 1, 'data_parallel_size': 1, 'tensor_parallel_size': 1, 'sequence_parallel_size': 1, 'ulysses_degree': 1, 'ring_degree': 1, 'cfg_parallel_size': 1}, 'model_stage': 'diffusion'}, 'final_output': True, 'final_output_type': 'image'}
INFO 01-15 09:06:39 [omni.py:302] [Orchestrator] Waiting for 1 stages to initialize (timeout: 300s)
[Stage-0] WARNING 01-15 09:07:02 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] INFO 01-15 09:07:04 [omni_stage.py:435] Starting stage worker with model: /workspace/GLM-Image-Final/
[Stage-0] WARNING 01-15 09:07:06 [envs.py:194] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-15 09:07:06 [diffusion_engine.py:231] Starting server...
[Stage-0] WARNING 01-15 09:07:21 [mooncake_connector.py:18] Mooncake not available, MooncakeOmniConnector will not work
[Stage-0] WARNING 01-15 09:07:22 [envs.py:194] Flash Attention library "flash_attn" not found, using pytorch attention implementation
[Stage-0] INFO 01-15 09:07:23 [gpu_worker.py:273] Worker 0 created result MessageQueue
[Stage-0] INFO 01-15 09:07:24 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=2048.
[Stage-0] INFO 01-15 09:07:24 [vllm.py:632] Asynchronous scheduling is enabled.
[Stage-0] INFO 01-15 09:07:24 [vllm.py:639] Disabling NCCL for DP synchronization when using async scheduling.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-15 09:07:24 [gpu_worker.py:77] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Stage-0] INFO 01-15 09:07:24 [pipeline_glm_image.py:267] Loading GlmImageForConditionalGeneration (AR model)...
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████| 1011/1011 [00:03<00:00, 284.76it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[Stage-0] INFO 01-15 09:07:31 [pipeline_glm_image.py:280] Loading T5EncoderModel (glyph encoder)...
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 724.11it/s, Materializing param=shared.weight]
[Stage-0] INFO 01-15 09:07:32 [pipeline_glm_image.py:293] Loading AutoencoderKL (VAE)...
[Stage-0] INFO 01-15 09:07:32 [pipeline_glm_image.py:300] Loading GlmImageTransformer2DModel (DiT)...
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:00<00:01,  1.12it/s]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:01<00:00,  1.06it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.10it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.09it/s]

[Stage-0] INFO 01-15 09:07:35 [diffusers_loader.py:214] Loading weights took 2.85 seconds
[Stage-0] INFO 01-15 09:07:36 [gpu_worker.py:100] Model loading took 33.0291 GiB and 12.011215 seconds
[Stage-0] INFO 01-15 09:07:36 [gpu_worker.py:105] Worker 0: Model loaded successfully.
[Stage-0] WARNING 01-15 09:07:36 [compile.py:27] Regional compilation skipped because the model does not define `_repeated_blocks`.
[Stage-0] INFO 01-15 09:07:36 [gpu_worker.py:126] Worker 0: Model compiled with torch.compile.
[Stage-0] INFO 01-15 09:07:36 [gpu_worker.py:409] Worker 0: Scheduler loop started.
[Stage-0] INFO 01-15 09:07:36 [gpu_worker.py:332] Worker 0 ready to receive requests via shared memory
[Stage-0] INFO 01-15 09:07:36 [scheduler.py:46] SyncScheduler initialized result MessageQueue
[Stage-0] INFO 01-15 09:07:36 [diffusion_engine.py:313] dummy run to warm up the model
[Stage-0] INFO 01-15 09:07:36 [pipeline_glm_image.py:880] Generating prior tokens with AR model...
[Stage-0] INFO 01-15 09:08:11 [pipeline_glm_image.py:889] Encoding prompt...
[Stage-0] INFO 01-15 09:08:11 [pipeline_glm_image.py:945] Starting denoising loop with 1 steps...
[Stage-0] INFO 01-15 09:08:12 [pipeline_glm_image.py:960] Decoding latents with VAE...
[Stage-0] INFO 01-15 09:08:13 [omni_stage.py:664] Max batch size: 1
INFO 01-15 09:08:13 [omni.py:295] [Orchestrator] Stage-0 reported ready
INFO 01-15 09:08:13 [omni.py:321] [Orchestrator] All stages initialized successfully
Pipeline loaded

============================================================
Generation Configuration:
  Model: /workspace/GLM-Image-Final/
  Inference steps: 50
  Cache backend: None (no acceleration)
  Input image size: (1024, 1024)
  Parallel configuration: ulysses_degree=1, ring_degree=1, cfg_parallel_size=1
============================================================

Adding requests:   0%|          | 0/1 [00:00<?, ?it/s]
[Stage-0] INFO 01-15 09:08:13 [omni_diffusion.py:115] Prepared 1 requests for generation.
[Stage-0] INFO 01-15 09:08:14 [diffusion_engine.py:99] Pre-processing completed in 0.1563 seconds
[Stage-0] INFO 01-15 09:08:14 [pipeline_glm_image.py:880] Generating prior tokens with AR model...
[Stage-0] INFO 01-15 09:08:41 [pipeline_glm_image.py:889] Encoding prompt...
[Stage-0] INFO 01-15 09:08:41 [pipeline_glm_image.py:902] Preparing KV cache for Image Edit mode...
[Stage-0] INFO 01-15 09:08:41 [pipeline_glm_image.py:945] Starting denoising loop with 50 steps...
[Stage-0] INFO 01-15 09:08:50 [pipeline_glm_image.py:960] Decoding latents with VAE...
[Stage-0] INFO 01-15 09:08:51 [diffusion_engine.py:104] Generation completed successfully.
[Stage-0] INFO 01-15 09:08:51 [diffusion_engine.py:127] Post-processing completed in 0.1097 seconds
INFO 01-15 09:08:51 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-15 09:08:51 [log_utils.py:550]  'request_id': '0_58f63e24-674b-4667-a85a-740c0c33f51e',
INFO 01-15 09:08:51 [log_utils.py:550]  'e2e_time_ms': 37459.372997283936,
INFO 01-15 09:08:51 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-15 09:08:51 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-15 09:08:51 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-15 09:08:51 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-15 09:08:51 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 37376.41453742981,
INFO 01-15 09:08:51 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-15 09:08:51 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:37<00:00, 37.46s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-15 09:08:51 [omni.py:782] [Summary] {'e2e_requests': 1,
INFO 01-15 09:08:51 [omni.py:782]  'e2e_total_time_ms': 37462.43405342102,
INFO 01-15 09:08:51 [omni.py:782]  'e2e_sum_time_ms': 37459.372997283936,
INFO 01-15 09:08:51 [omni.py:782]  'e2e_total_tokens': 0,
INFO 01-15 09:08:51 [omni.py:782]  'e2e_avg_time_per_request_ms': 37459.372997283936,
INFO 01-15 09:08:51 [omni.py:782]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-15 09:08:51 [omni.py:782]  'wall_time_ms': 37462.43405342102,
INFO 01-15 09:08:51 [omni.py:782]  'final_stage_id': {'0_58f63e24-674b-4667-a85a-740c0c33f51e': 0},
INFO 01-15 09:08:51 [omni.py:782]  'stages': [{'stage_id': 0,
INFO 01-15 09:08:51 [omni.py:782]              'requests': 1,
INFO 01-15 09:08:51 [omni.py:782]              'tokens': 0,
INFO 01-15 09:08:51 [omni.py:782]              'total_time_ms': 37461.493730545044,
INFO 01-15 09:08:51 [omni.py:782]              'avg_time_per_request_ms': 37461.493730545044,
INFO 01-15 09:08:51 [omni.py:782]              'avg_tokens_per_s': 0.0}],
INFO 01-15 09:08:51 [omni.py:782]  'transfers': []}
Adding requests:   0%|                                                                                                                                                        | 0/1 [00:37<?, ?it/s]
[Stage-0] INFO 01-15 09:08:51 [omni_stage.py:673] Received shutdown signal
[Stage-0] INFO 01-15 09:08:51 [gpu_worker.py:364] Worker 0: Received shutdown message
[Stage-0] INFO 01-15 09:08:51 [gpu_worker.py:386] event loop terminated.
[Stage-0] INFO 01-15 09:08:51 [gpu_worker.py:417] Worker 0: Shutdown complete.
Total generation time: 42.4675 seconds (42467.51 ms)
INFO 01-15 09:08:56 [image_edit.py:368] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_58f63e24-674b-4667-a85a-740c0c33f51e', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt='make it cartoon style', latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved edited image to /opensource/guohong/vllm-omni/examples/offline_inference/image_to_image/output_image_edit.png
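The request-level metrics report an end-to-end time of about 37.46 s for the edit request. The [Stage-0] log timestamps allow a rough per-phase breakdown. The sketch below assumes each log line marks the start of a phase and uses one-second timestamps copied by hand from the log, so each duration is only approximate. By this count, about 27 s fall between the AR prior-token line and the prompt-encoding line, about 9 s cover prompt encoding, KV-cache preparation and the 50 denoising steps, and about 1 s covers VAE decoding.

```python
from datetime import datetime

# (phase name, HH:MM:SS) start markers copied from the [Stage-0] lines of the
# edit request above. The log has one-second resolution, so results are approximate.
EVENTS = [
    ("ar_prior_tokens", "09:08:14"),     # "Generating prior tokens with AR model..."
    ("encode_and_denoise", "09:08:41"),  # "Encoding prompt...", KV cache, 50 denoising steps
    ("vae_decode", "09:08:50"),          # "Decoding latents with VAE..."
    ("done", "09:08:51"),                # "Generation completed successfully."
]
DENOISING_STEPS = 50


def phase_durations(events):
    """Return {phase: seconds} computed from consecutive start markers."""
    times = [(name, datetime.strptime(ts, "%H:%M:%S")) for name, ts in events]
    return {
        name: (t_next - t).total_seconds()
        for (name, t), (_, t_next) in zip(times, times[1:])
    }


durations = phase_durations(EVENTS)
for name, secs in durations.items():
    print(f"{name:>20}: {secs:4.0f} s")
print(f"{'total':>20}: {sum(durations.values()):4.0f} s")
# Upper bound per step: this span also includes prompt encoding and KV-cache setup.
print(f"per denoising step (upper bound): {durations['encode_and_denoise'] / DENOISING_STEPS:.2f} s")
```

The 37 s total from these markers is consistent with the logged e2e_time_ms of about 37459 ms.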


Signed-off-by: JaredforReal <w13431838023@gmail.com>

SamitHuang

lgtm

tzhouam · 2026-01-15T09:15:20Z

LGTM

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a109b772ef


Copilot

Pull request overview

This pull request fixes the Image2Image mode for GLM-Image by adding a preprocessing function and refining postprocessing usage. The changes move image preprocessing logic out of the pipeline's forward method into a dedicated pre_process_func that runs before batching, following the pattern used by other pipelines like QwenImageEdit.
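A minimal sketch of the control flow the overview describes, assuming a generic hook-based pipeline: a per-request pre-process function runs before batching, the batched forward returns raw outputs, and a separate post-process function converts them afterwards. Every name here (EditRequest, hypothetical_glm_pre_process, hypothetical_forward, hypothetical_post_process, run) is hypothetical and not the vllm-omni API, and the 32-pixel alignment is a placeholder value, not GLM-Image's actual preprocessing.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class EditRequest:
    """Hypothetical request type; the real vllm-omni request classes are not shown here."""
    prompt: str
    image_size: Tuple[int, int]  # (width, height) of the condition image
    extra: Dict[str, Any] = field(default_factory=dict)


def hypothetical_glm_pre_process(req: EditRequest) -> EditRequest:
    """Per-request work done before batching, e.g. preparing the condition image."""
    w, h = req.image_size
    # Placeholder: snap to a multiple of 32 (assumed value, not taken from GLM-Image).
    req.extra["target_size"] = (w // 32 * 32, h // 32 * 32)
    return req


def hypothetical_forward(batch: List[EditRequest]) -> List[Dict[str, Any]]:
    """Stand-in for the pipeline forward: returns raw outputs, no image conversion."""
    return [{"raw": f"latents-for:{r.prompt}", "size": r.extra["target_size"]} for r in batch]


def hypothetical_post_process(outputs: List[Dict[str, Any]]) -> List[str]:
    """Runs after forward, outside the pipeline, e.g. turning decoded tensors into images."""
    return [f"image{out['size']}" for out in outputs]


def run(
    requests: List[EditRequest],
    pre: Callable[[EditRequest], EditRequest],
    forward: Callable[[List[EditRequest]], List[Dict[str, Any]]],
    post: Callable[[List[Dict[str, Any]]], List[str]],
) -> List[str]:
    prepared = [pre(r) for r in requests]  # 1. pre-process each request individually
    raw = forward(prepared)                # 2. one batched forward pass
    return post(raw)                       # 3. post-process the batch outputs


print(run([EditRequest("make it cartoon style", (1024, 1000))],
          hypothetical_glm_pre_process, hypothetical_forward, hypothetical_post_process))
# ['image(1024, 992)']
```

Keeping forward free of per-request image handling is what lets it operate on an already-prepared batch.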

Changes:

Added get_glm_image_pre_process_func to handle condition image preprocessing before pipeline execution
Refactored AR token generation to use processor-generated image grids directly
Fixed critical parameter name bug (kv_caches → kv_cache) in transformer calls
Moved postprocessing out of pipeline's forward method to external post_process_func
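The review notes do not say how the kv_caches → kv_cache mismatch failed at runtime. One plausible failure mode, sketched here purely as an assumption, is a forward() that accepts **kwargs: the misspelled keyword is absorbed without an error, so the Image Edit KV cache never reaches attention. ToyTransformer is hypothetical and not the GLM-Image transformer.

```python
from typing import Any, Optional


class ToyTransformer:
    """Hypothetical stand-in for a transformer whose forward() tolerates extra kwargs."""

    def forward(self, hidden: str, kv_cache: Optional[dict] = None, **kwargs: Any) -> str:
        # Unknown keyword arguments land in **kwargs and are silently ignored.
        if kv_cache is None:
            return f"{hidden} (no condition-image KV cache)"
        return f"{hidden} (attending to {len(kv_cache)} cached entries)"


model = ToyTransformer()
cache = {"layer0": "k/v", "layer1": "k/v"}

print(model.forward("x", kv_caches=cache))  # misspelled: cache dropped, no error raised
print(model.forward("x", kv_cache=cache))   # correct name: cache is used
```

Without the catch-all **kwargs, Python would raise a TypeError for the unexpected keyword, which is one argument for keeping forward signatures explicit.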

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
vllm_omni/diffusion/registry.py	Registered the new GLM-Image preprocessing function in the registry
vllm_omni/diffusion/models/glm_image/pipeline_glm_image.py	Added preprocessing function, refactored AR generation logic, fixed kv_cache parameter name bug, and delegated postprocessing to external handler
vllm_omni/diffusion/models/glm_image/__init__.py	Exported the new preprocessing function
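The contents of registry.py are not shown on this page. As a rough illustration only, a name-keyed registry of pre-process factories could look like the sketch below. The registry dict, the key ToyGlmImagePipeline and make_toy_glm_pre_process are hypothetical; the real get_glm_image_pre_process_func signature is not given here.

```python
from typing import Callable, Dict, Optional

PreProcessFn = Callable[[dict], dict]


def make_toy_glm_pre_process() -> PreProcessFn:
    """Hypothetical factory returning a per-request pre-process function."""
    def pre_process(request: dict) -> dict:
        # Record whether a condition image was supplied (i.e. edit vs. text-to-image).
        return {**request, "condition_image_ready": request.get("image") is not None}
    return pre_process


# Hypothetical registry: pipeline class name -> factory for its pre-process function.
_PRE_PROCESS_FACTORIES: Dict[str, Callable[[], PreProcessFn]] = {
    "ToyGlmImagePipeline": make_toy_glm_pre_process,
}


def lookup_pre_process(pipeline_name: str) -> Optional[PreProcessFn]:
    """Return the registered pre-process function, or None if the pipeline has none."""
    factory = _PRE_PROCESS_FACTORIES.get(pipeline_name)
    return factory() if factory is not None else None


fn = lookup_pre_process("ToyGlmImagePipeline")
print(fn({"prompt": "make it cartoon style", "image": "qwen_image_output.png"}))
print(lookup_pre_process("SomeOtherPipeline"))  # None: no pre-processing step registered
```

Returning None for unregistered pipelines lets the engine skip the pre-processing step for models that do not need one.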

fix glm image

a109b77

Signed-off-by: JaredforReal <w13431838023@gmail.com>

JaredforReal requested a review from hsliuustc0106 as a code owner January 15, 2026 09:10

Copilot AI review requested due to automatic review settings January 15, 2026 09:10

Copilot started reviewing on behalf of JaredforReal January 15, 2026 09:11

tzhouam self-requested a review January 15, 2026 09:11

tzhouam added and then removed the "ready" label (used to trigger Buildkite CI) Jan 15, 2026

SamitHuang approved these changes Jan 15, 2026

tzhouam approved these changes Jan 15, 2026

chatgpt-codex-connector Bot reviewed Jan 15, 2026

Comment thread vllm_omni/diffusion/models/glm_image/pipeline_glm_image.py

Copilot AI reviewed Jan 15, 2026

JaredforReal added 2 commits January 15, 2026 17:34

accept some reviews

9b20c8e

Signed-off-by: JaredforReal <w13431838023@gmail.com>

remove empty file

fd10877

Signed-off-by: JaredforReal <w13431838023@gmail.com>

Gaohan123 merged commit b938725 into vllm-project:dev/rebase_0.14.0 Jan 15, 2026
2 checks passed

JaredforReal deleted the fix/glm-image branch January 15, 2026 10:16

david6666666 mentioned this pull request Jan 16, 2026 in vLLM-Omni Model Support #808 (Open, 63 tasks)

Gaohan123 merged 3 commits into vllm-project:dev/rebase_0.14.0 from JaredforReal:fix/glm-image


5 participants
