[Enhancement] Add force_refresh support for GLM-Image for cache-dit 1.3.0 upgrade by SamitHuang · Pull Request #1858 · vllm-project/vllm-omni

SamitHuang · 2026-03-12T17:44:23Z

Purpose

Add force_refresh_step_hint and force_refresh_step_policy support from cache-dit 1.3.0 for GLM-Image model, aligning with the cache-dit example usage. Also adds --cache-backend CLI support to the GLM-Image end2end example script.

Why GLM-Image needs special handling

GLM-Image (GlmImagePipeline): In editing mode, the transformer is called once to process the input image before the denoising loop begins. Setting force_refresh_step_hint = 1 ensures the cache is force-refreshed after this preprocessing call, discarding stale hidden states before actual denoising. For text-to-image mode, force_refresh_step_hint = None (no force refresh needed). This can be configured in cache-dit 1.3.0

Changes

vllm_omni/diffusion/data.py: Added force_refresh_step_hint and force_refresh_step_policy fields to DiffusionCacheConfig
vllm_omni/diffusion/cache/cache_dit_backend.py:
- Pass new fields through _build_db_cache_config() to DBCacheConfig
- Added enable_cache_for_glm_image() custom enabler
- Registered in CUSTOM_DIT_ENABLERS
examples/offline_inference/glm_image/end2end.py: Added --cache-backend and --enable-cache-dit-summary CLI arguments
requirements/common.txt: Upgraded cache-dit from 1.2.0 to 1.3.0

Dependency

This PR depends on vllm-project/vllm-omni#1834 (cache-dit 1.3.0 upgrade) being merged first, or includes the upgrade in this PR.

Test Plan

Pre-commit check passes (ruff check, ruff format, typos)
Verify GLM-Image T2I inference with cache-dit acceleration

Test Commands (GLM-Image)

# Baseline (no cache-dit)
CUDA_VISIBLE_DEVICES=1,6 python examples/offline_inference/glm_image/end2end.py \
  --model-path zai-org/GLM-Image \
  --config-path examples/offline_inference/glm_image/glm_image.yaml \
  --prompt "A photo of an astronaut riding a horse on mars" \
  --height 1024 --width 1024 --num-inference-steps 50 --seed 42 \
  --output glm_image_baseline.png --verbose

# With cache-dit
CUDA_VISIBLE_DEVICES=1,6 python examples/offline_inference/glm_image/end2end.py \
  --model-path zai-org/GLM-Image \
  --config-path examples/offline_inference/glm_image/glm_image.yaml \
  --prompt "A photo of an astronaut riding a horse on mars" \
  --height 1024 --width 1024 --num-inference-steps 50 --seed 42 \
  --cache-backend cache_dit --enable-cache-dit-summary \
  --output glm_image_cachedit.png --verbose

Test Result

Benchmark on dual NVIDIA H800 GPUs (AR on GPU 1, Diffusion on GPU 6), GLM-Image T2I, 1024x1024, 50 steps:

Metric	Without Cache-DiT	With Cache-DiT 1.3.0	Speedup
Total generation time (AR+Diffusion)	45.86s	35.74s	1.28x
Diffusion stage time	~15s	~5s	~3x
Cached steps / total steps	0/50	35/50 (70%)	—

Cache-dit DBCache config: F1B0_W4_threshold=0.24_MC3

Note: The total generation time is dominated by the AR stage (~27s), so the diffusion-stage speedup (~3x) translates to a more modest ~1.28x end-to-end speedup. For workloads with longer diffusion steps or batch processing, the speedup would be more pronounced.

Cache-dit summary (from the accelerated run):

[Cache-DiT] ⚡️Cache Steps and Residual Diffs Statistics: GlmImageTransformerBlock
| Cache Steps | Diffs P50 | Diffs P95 | Diffs Max |
|-------------|-----------|-----------|-----------|
| 35          | 0.057     | 0.13      | 0.157     |

w/o cache-dit:

w/ cache-dit:

Signed-off-by: samithuang <285365963@qq.com>

Two optimizations that eliminate ~6.5s of IPC serialization overhead for single-stage diffusion pipelines (e.g. Wan2.2 I2V/T2V) in online serving mode: Phase 1 – Inline diffusion (eliminate Hop3): When there is exactly one diffusion stage in async mode, initialize OmniDiffusion directly in the orchestrator process instead of spawning a stage worker subprocess. This removes the entire Hop3 serialization path (pickle + mp.Queue/SHM) between the stage worker and orchestrator. GPU workers for tensor parallelism are still spawned by DiffusionExecutor. Phase 2 – SHM tensor transfer (optimize Hop1): Replace pickle-based serialization of large tensors through MessageQueue with POSIX shared memory. The worker copies tensor data into a named SHM segment and enqueues only lightweight metadata; the scheduler reconstructs the tensor from SHM. This reduces Hop1 overhead from ~3.4s to ~1.5s. Measured on Wan2.2-I2V-A14B (TP=2, 1280x720, 5s@16fps, 1 step): Before: e2e = 37.5s Phase 1: e2e = 33.1s (−4.4s) Phase 2: e2e = 31.0s (−2.1s) Total: e2e = 31.0s (−6.5s, −17.5%) Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

…17.5%) perf: reduce IPC overhead for single-stage diffusion serving (~6.5s, 17.5%)

Signed-off-by: Samit <285365963@qq.com>

Signed-off-by: samithuang <285365963@qq.com>

Signed-off-by: Samit <285365963@qq.com>

Signed-off-by: samithuang <285365963@qq.com>

Upgrade cache-dit dependency to the latest release (1.3.0). All existing imports and APIs remain compatible. Verified with Qwen-Image offline inference showing ~2x speedup with cache-dit acceleration. Signed-off-by: yx <yx@users.noreply.github.com> Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

…Image Add force_refresh_step_hint and force_refresh_step_policy to DiffusionCacheConfig and wire them through to DBCacheConfig. Register custom cache-dit enablers for HeliosPipeline, HeliosPyramidPipeline, and GlmImagePipeline. - Helios: multi-chunk denoise loop requires cache reset between chunks, so force_refresh_step_hint defaults to num_inference_steps and force_refresh_step_policy defaults to "repeat". - GLM-Image: editing mode preprocesses input image in one extra transformer call; force_refresh_step_hint=1 discards stale state. Signed-off-by: yx <yx@users.noreply.github.com> Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Add --cache-backend and --enable-cache-dit-summary CLI arguments to the GLM-Image offline inference example, enabling cache-dit acceleration for the diffusion stage. Signed-off-by: yx <yx@users.noreply.github.com> Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

Signed-off-by: Samit <285365963@qq.com>

wtomin · 2026-03-13T02:57:34Z

How is this PR different from #1399? Is it the cache-dit version different?

princepride

LGTM

princepride · 2026-03-13T03:08:25Z

How is this PR different from #1399? Is it the cache-dit version different?

Add two args: force_refresh_step_hint and force_refresh_step_policy?

SamitHuang · 2026-03-13T03:44:22Z

How is this PR different from #1399? Is it the cache-dit version different?

Add two args: force_refresh_step_hint and force_refresh_step_policy?

yes, support for cache-dit v1.3.0

princepride · 2026-03-13T05:53:30Z

@SamitHuang Can you cooperate with this pr's author: #1399, I hope we can encourage more wild developer participate in our project😊 !

SamitHuang · 2026-03-13T06:54:09Z

@SamitHuang Can you cooperate with this pr's author: #1399, I hope we can encourage more wild developer participate in our project😊 !

sure, happy to do that. I didn't notice 1399 previously

Gaohan123 · 2026-03-21T11:25:01Z

Do other models need this feature?

wtomin

LGTM.

Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

DefTruth · 2026-03-23T12:53:19Z

Do other models need this feature?

reference docs at: https://cache-dit.readthedocs.io/en/latest/user_guide/CACHE_API/#mcc-multiple-cache-contexts-within-a-single-denoising-loop

….3.0 upgrade (vllm-project#1858) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com> Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

SamitHuang added 22 commits March 6, 2026 10:14

add time cost log for different stages

26a4e8d

Signed-off-by: samithuang <285365963@qq.com>

reduce hop3 overhead

b3b70a8

Signed-off-by: samithuang <285365963@qq.com>

perf: reduce IPC overhead for single-stage diffusion serving (~6.5s, …

bf2ddb0

…17.5%) perf: reduce IPC overhead for single-stage diffusion serving (~6.5s, 17.5%)

Merge branch 'main' into main

735b2ca

Signed-off-by: Samit <285365963@qq.com>

fix conflicts

dd4468c

Signed-off-by: samithuang <285365963@qq.com>

rm redundancy

ff62a1e

Signed-off-by: samithuang <285365963@qq.com>

Merge branch 'main' into main

870963e

Signed-off-by: Samit <285365963@qq.com>

rm logs

5414a42

Signed-off-by: samithuang <285365963@qq.com>

fix inline

e3dec54

Signed-off-by: samithuang <285365963@qq.com>

fix ci

2cd9f9f

Signed-off-by: samithuang <285365963@qq.com>

fix ci

172040a

Signed-off-by: samithuang <285365963@qq.com>

fix log

0a86fc5

Signed-off-by: samithuang <285365963@qq.com>

fix

9b9c597

Signed-off-by: samithuang <285365963@qq.com>

fix log

bda0f2d

Signed-off-by: samithuang <285365963@qq.com>

Merge branch 'main' of https://github.com/samithuang/vllm-omni

3452ad3

Merge remote-tracking branch 'upstream/main'

9ab7c55

Merge remote-tracking branch 'upstream/main'

c32a78a

[Enhancement] Remove cache-dit support for Helios, keep GLM-Image only

c36e4b4

Made-with: Cursor Signed-off-by: samithuang <285365963@qq.com>

SamitHuang requested a review from hsliuustc0106 as a code owner March 12, 2026 17:44

SamitHuang changed the title ~~[Enhancement] Add cache-dit force_refresh support for GLM-Image and upgrade to 1.3.0~~ [Enhancement] Add cache-dit force_refresh support for GLM-Image based on cache-dit 1.3.0 Mar 12, 2026

Merge branch 'main' into feat/cache-dit-helios-glm-image

c9e7f43

Signed-off-by: Samit <285365963@qq.com>

SamitHuang changed the title ~~[Enhancement] Add cache-dit force_refresh support for GLM-Image based on cache-dit 1.3.0~~ [Enhancement] Add cache-dit support for GLM-Image based on force_refresh in cache-dit 1.3.0 Mar 12, 2026

SamitHuang requested review from princepride and wtomin March 13, 2026 01:53

princepride approved these changes Mar 13, 2026

View reviewed changes

SamitHuang mentioned this pull request Mar 13, 2026

[Feat] cache-dit for GLM-Image #1399

Merged

SamitHuang changed the title ~~[Enhancement] Add cache-dit support for GLM-Image based on force_refresh in cache-dit 1.3.0~~ [Enhancement] Add force_refresh support for GLM-Image for cache-dit 1.3.0 upgrade Mar 13, 2026

Gaohan123 added this to the v0.18.0 milestone Mar 21, 2026

wtomin approved these changes Mar 23, 2026

View reviewed changes

Merge branch 'main' into feat/cache-dit-helios-glm-image

833ce12

Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

wtomin added the ready label to trigger buildkite CI label Mar 23, 2026

wtomin merged commit 9ead0d8 into vllm-project:main Mar 24, 2026
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Add force_refresh support for GLM-Image for cache-dit 1.3.0 upgrade#1858

[Enhancement] Add force_refresh support for GLM-Image for cache-dit 1.3.0 upgrade#1858
wtomin merged 24 commits intovllm-project:mainfrom
SamitHuang:feat/cache-dit-helios-glm-image

SamitHuang commented Mar 12, 2026 •

edited

Loading

Uh oh!

wtomin commented Mar 13, 2026

Uh oh!

princepride left a comment

Uh oh!

princepride commented Mar 13, 2026

Uh oh!

SamitHuang commented Mar 13, 2026

Uh oh!

princepride commented Mar 13, 2026

Uh oh!

SamitHuang commented Mar 13, 2026

Uh oh!

Gaohan123 commented Mar 21, 2026

Uh oh!

wtomin left a comment

Uh oh!

DefTruth commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

SamitHuang commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Why GLM-Image needs special handling

Changes

Dependency

Test Plan

Test Commands (GLM-Image)

Test Result

Uh oh!

wtomin commented Mar 13, 2026

Uh oh!

princepride left a comment

Choose a reason for hiding this comment

Uh oh!

princepride commented Mar 13, 2026

Uh oh!

SamitHuang commented Mar 13, 2026

Uh oh!

princepride commented Mar 13, 2026

Uh oh!

SamitHuang commented Mar 13, 2026

Uh oh!

Gaohan123 commented Mar 21, 2026

Uh oh!

wtomin left a comment

Choose a reason for hiding this comment

Uh oh!

DefTruth commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

SamitHuang commented Mar 12, 2026 •

edited

Loading