[Enhancement] Upgrade cache-dit from 1.2.0 to 1.3.0 #1834

Merged
SamitHuang merged 19 commits into vllm-project:main from SamitHuang:upgrade/cache-dit-1.3.0 on Mar 12, 2026

Conversation

SamitHuang (Collaborator) commented Mar 12, 2026

Purpose

Upgrade the cache-dit dependency from 1.2.0 to 1.3.0 (latest release). This is a version bump with full API backward compatibility — all existing imports, enable_cache(), refresh_context(), BlockAdapter, DBCacheConfig, etc., remain unchanged.
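The backward-compatibility claim can be smoke-checked mechanically. Below is a minimal, hedged sketch of such a check: the symbol list mirrors the names the PR mentions (`enable_cache`, `refresh_context`, `BlockAdapter`, `DBCacheConfig`), but the helper itself (`check_symbols`) is illustrative, not part of vllm-omni or cache-dit, and it degrades gracefully when the package is not installed.

```python
"""Smoke-check that the cache_dit symbols used by the backend still
resolve after the upgrade. The symbol list below comes from this PR's
description; the helper itself is an illustrative sketch."""
import importlib

REQUIRED = [
    ("cache_dit", ["enable_cache", "refresh_context", "BlockAdapter", "DBCacheConfig"]),
]

def check_symbols(module_name, symbols):
    """Return the symbols missing from module_name, or None if the
    module itself is not importable."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return [s for s in symbols if not hasattr(mod, s)]

if __name__ == "__main__":
    for name, syms in REQUIRED:
        missing = check_symbols(name, syms)
        if missing is None:
            print(f"{name}: not installed")
        elif missing:
            print(f"{name}: missing {missing}")
        else:
            print(f"{name}: OK")
```

Running this under both 1.2.0 and 1.3.0 and diffing the output is a cheap way to confirm nothing in the public surface moved.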

Test Plan

  • Verified all cache_dit imports used in vllm_omni/diffusion/cache/cache_dit_backend.py pass with 1.3.0
  • Ran offline inference benchmark on Qwen/Qwen-Image (text-to-image, 1024x1024, 50 steps) comparing with and without cache-dit acceleration
  • Pre-commit passes on the changed file
```shell
# Without cache-dit (baseline)
CUDA_VISIBLE_DEVICES=1 python examples/offline_inference/text_to_image/text_to_image.py \
  --model Qwen/Qwen-Image --prompt "a cup of coffee on the table" \
  --seed 142 --num-inference-steps 50 --height 1024 --width 1024

# With cache-dit 1.3.0
CUDA_VISIBLE_DEVICES=1 python examples/offline_inference/text_to_image/text_to_image.py \
  --model Qwen/Qwen-Image --prompt "a cup of coffee on the table" \
  --seed 142 --num-inference-steps 50 --height 1024 --width 1024 \
  --cache-backend cache_dit --enable-cache-dit-summary
```

Test Result

Benchmark on a single NVIDIA H800 GPU:

| Metric | Without Cache-DiT | With Cache-DiT 1.3.0 | Speedup |
| --- | --- | --- | --- |
| Total generation time | 7.551 s | 3.761 s | 2.01x |
| Diffusion engine exec time | 7,436 ms | 3,644 ms | 2.04x |

Cache-dit 1.3.0 delivers ~2x acceleration on Qwen-Image with default DBCache config (Fn=1, Bn=0, W=4, threshold=0.24), consistent with 1.2.0 behavior.
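The speedup figures in the table follow directly from the raw timings; a one-liner sanity check (values copied from the table above):

```python
# Recompute the speedup column from the raw benchmark timings.
baseline_s, cached_s = 7.551, 3.761          # total generation time
engine_baseline_ms, engine_cached_ms = 7436, 3644  # engine exec time

total_speedup = round(baseline_s / cached_s, 2)              # 2.01
engine_speedup = round(engine_baseline_ms / engine_cached_ms, 2)  # 2.04
print(total_speedup, engine_speedup)
```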


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update — N/A, no doc changes needed.
  • (Optional) Release notes update — N/A, minor dependency bump.

Signed-off-by: samithuang <285365963@qq.com>
Two optimizations that eliminate ~6.5s of IPC serialization overhead
for single-stage diffusion pipelines (e.g. Wan2.2 I2V/T2V) in online
serving mode:

Phase 1 – Inline diffusion (eliminate Hop3):
When there is exactly one diffusion stage in async mode, initialize
OmniDiffusion directly in the orchestrator process instead of spawning
a stage worker subprocess. This removes the entire Hop3 serialization
path (pickle + mp.Queue/SHM) between the stage worker and orchestrator.
GPU workers for tensor parallelism are still spawned by DiffusionExecutor.
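The Phase 1 decision can be sketched as a small dispatch function. This is a hedged illustration only: `OmniDiffusion` is stubbed, and `init_diffusion`/`run_stage` are invented names standing in for the orchestrator's actual initialization path.

```python
"""Sketch of the Phase 1 dispatch: with exactly one diffusion stage in
async mode, build the engine in the orchestrator process instead of
spawning a stage-worker subprocess. All names here are illustrative."""
import multiprocessing as mp

class OmniDiffusion:
    """Stand-in for the real diffusion engine class in vllm-omni."""
    def __init__(self, stage):
        self.stage = stage

def run_stage(stage):
    # Stand-in for the stage-worker subprocess entry point.
    OmniDiffusion(stage)

def init_diffusion(stages, async_mode):
    """Return (inline_engine_or_None, worker_processes)."""
    if async_mode and len(stages) == 1:
        # Inline path: the Hop3 serialization layer (pickle +
        # mp.Queue/SHM between worker and orchestrator) disappears.
        return OmniDiffusion(stages[0]), []
    # Multi-stage path: keep one worker subprocess per stage.
    procs = [mp.Process(target=run_stage, args=(s,)) for s in stages]
    return None, procs
```

Tensor-parallel GPU workers are unaffected by this choice; only the stage-worker hop is folded into the orchestrator.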

Phase 2 – SHM tensor transfer (optimize Hop1):
Replace pickle-based serialization of large tensors through MessageQueue
with POSIX shared memory. The worker copies tensor data into a named SHM
segment and enqueues only lightweight metadata; the scheduler reconstructs
the tensor from SHM. This reduces Hop1 overhead from ~3.4s to ~1.5s.
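The Phase 2 protocol (bulk bytes through named SHM, only metadata through the queue) can be demonstrated with the stdlib alone. This is a single-process sketch using `array` in place of a torch tensor; the function names and metadata dict are illustrative, not the PR's actual wire format.

```python
"""Sketch of the SHM tensor-transfer protocol: the worker copies the
tensor's bytes into a named POSIX shared-memory segment and enqueues
only lightweight metadata; the scheduler reattaches by name and
rebuilds the tensor. Names and metadata fields are illustrative."""
from array import array
from multiprocessing import shared_memory

def send_via_shm(tensor: array) -> dict:
    """Worker side: copy the buffer into SHM, return queue-sized metadata."""
    raw = tensor.tobytes()
    shm = shared_memory.SharedMemory(create=True, size=len(raw))
    shm.buf[:len(raw)] = raw
    meta = {"shm_name": shm.name, "typecode": tensor.typecode, "nbytes": len(raw)}
    shm.close()  # drop our handle; the named segment persists until unlinked
    return meta

def recv_via_shm(meta: dict) -> array:
    """Scheduler side: reattach by name, rebuild the tensor, free the segment."""
    shm = shared_memory.SharedMemory(name=meta["shm_name"])
    out = array(meta["typecode"])
    out.frombytes(bytes(shm.buf[:meta["nbytes"]]))  # slice: SHM may be page-rounded
    shm.close()
    shm.unlink()
    return out

if __name__ == "__main__":
    latents = array("f", [0.5, 0.25, 1.0])
    rebuilt = recv_via_shm(send_via_shm(latents))
    print(rebuilt.tolist())
```

The win comes from `meta` being a few dozen bytes regardless of tensor size, so the MessageQueue never pickles the multi-hundred-MB latent itself.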

Measured on Wan2.2-I2V-A14B (TP=2, 1280x720, 5s@16fps, 1 step):
  Before:  e2e = 37.5s
  Phase 1: e2e = 33.1s  (−4.4s)
  Phase 2: e2e = 31.0s  (−2.1s)
  Total:   e2e = 31.0s  (−6.5s, −17.5%)

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>

perf: reduce IPC overhead for single-stage diffusion serving (~6.5s, 17.5%)
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
@DefTruth DefTruth mentioned this pull request Mar 12, 2026
5 tasks
@SamitHuang SamitHuang force-pushed the upgrade/cache-dit-1.3.0 branch from 997ba86 to 30a6201 Compare March 12, 2026 06:20
Upgrade cache-dit dependency to the latest release (1.3.0). All existing
imports and APIs remain compatible. Verified with Qwen-Image offline
inference showing ~2x speedup with cache-dit acceleration.

Signed-off-by: yx <yx@users.noreply.github.com>
Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
@SamitHuang SamitHuang force-pushed the upgrade/cache-dit-1.3.0 branch from 30a6201 to 5fcf302 Compare March 12, 2026 06:49
@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Mar 12, 2026
@SamitHuang SamitHuang merged commit 4dbaa74 into vllm-project:main Mar 12, 2026
7 checks passed
yiliu30 pushed a commit to yiliu30/vllm-omni-fork that referenced this pull request Mar 20, 2026
Signed-off-by: samithuang <285365963@qq.com>

Signed-off-by: yiliu30 <yi4.liu@intel.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026