[Enhancement] Upgrade cache-dit from 1.2.0 to 1.3.0 #1834
Merged
SamitHuang merged 19 commits on Mar 12, 2026
Signed-off-by: samithuang <285365963@qq.com>
Two optimizations that eliminate ~6.5s of IPC serialization overhead for single-stage diffusion pipelines (e.g. Wan2.2 I2V/T2V) in online serving mode:

Phase 1 - Inline diffusion (eliminate Hop3): When there is exactly one diffusion stage in async mode, initialize OmniDiffusion directly in the orchestrator process instead of spawning a stage worker subprocess. This removes the entire Hop3 serialization path (pickle + mp.Queue/SHM) between the stage worker and the orchestrator. GPU workers for tensor parallelism are still spawned by DiffusionExecutor.

Phase 2 - SHM tensor transfer (optimize Hop1): Replace pickle-based serialization of large tensors through MessageQueue with POSIX shared memory. The worker copies tensor data into a named SHM segment and enqueues only lightweight metadata; the scheduler reconstructs the tensor from SHM. This reduces Hop1 overhead from ~3.4s to ~1.5s.

Measured on Wan2.2-I2V-A14B (TP=2, 1280x720, 5s@16fps, 1 step):
  Before:  e2e = 37.5s
  Phase 1: e2e = 33.1s (−4.4s)
  Phase 2: e2e = 31.0s (−2.1s)
  Total:   e2e = 31.0s (−6.5s, −17.5%)

Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
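The Phase 2 handoff described above can be sketched with the Python standard library alone. This is a minimal illustration and not the code from this PR: `put_tensor` and `get_tensor` are hypothetical names, an `array.array` stands in for a tensor, and a plain `queue.Queue` stands in for MessageQueue. It assumes a POSIX platform, where a named segment persists until it is unlinked.

```python
import queue
from array import array
from multiprocessing import shared_memory


def put_tensor(q, data, shape):
    """Worker side (hypothetical helper): copy the payload into a named
    SHM segment and enqueue only small metadata describing it."""
    raw = data.tobytes()
    shm = shared_memory.SharedMemory(create=True, size=len(raw))
    shm.buf[:len(raw)] = raw
    meta = {
        "name": shm.name,           # lets the reader attach to the segment
        "nbytes": len(raw),         # the segment may be rounded up in size
        "typecode": data.typecode,  # stand-in for a tensor dtype
        "shape": shape,
    }
    # Drop this side's mapping. On POSIX the segment itself stays alive
    # until unlink() is called by the reader.
    shm.close()
    q.put(meta)


def get_tensor(q):
    """Scheduler side (hypothetical helper): rebuild the payload from SHM,
    then release the segment."""
    meta = q.get()
    shm = shared_memory.SharedMemory(name=meta["name"])
    try:
        out = array(meta["typecode"])
        out.frombytes(bytes(shm.buf[:meta["nbytes"]]))
    finally:
        shm.close()
        shm.unlink()
    return out, meta["shape"]


if __name__ == "__main__":
    q = queue.Queue()
    put_tensor(q, array("f", [0.5, 1.5, 2.5, 3.5]), (2, 2))
    values, shape = get_tensor(q)
    print(list(values), shape)
```

Only the metadata dictionary travels through the queue, so the cost of pickling grows with the number of fields rather than with the tensor size. The reader owns cleanup here; a real implementation would also need a policy for segments whose metadata is never consumed.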
perf: reduce IPC overhead for single-stage diffusion serving (~6.5s, 17.5%)
Signed-off-by: Samit <285365963@qq.com>
gcanlin approved these changes Mar 12, 2026
997ba86 to 30a6201
Upgrade cache-dit dependency to the latest release (1.3.0). All existing imports and APIs remain compatible. Verified with Qwen-Image offline inference showing ~2x speedup with cache-dit acceleration.

Signed-off-by: yx <yx@users.noreply.github.com>
Made-with: Cursor
Signed-off-by: samithuang <285365963@qq.com>
30a6201 to 5fcf302
yiliu30 pushed a commit to yiliu30/vllm-omni-fork that referenced this pull request Mar 20, 2026
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: samithuang <285365963@qq.com>
Purpose
Upgrade the `cache-dit` dependency from 1.2.0 to 1.3.0 (latest release). This is a version bump with full API backward compatibility: all existing imports, `enable_cache()`, `refresh_context()`, `BlockAdapter`, `DBCacheConfig`, etc., remain unchanged.

Test Plan
`cache_dit` imports used in `vllm_omni/diffusion/cache/cache_dit_backend.py` pass with 1.3.0

Test Result
Benchmark on a single NVIDIA H800 GPU:
Cache-dit 1.3.0 delivers ~2x acceleration on Qwen-Image with the default DBCache config (`Fn=1, Bn=0, W=4, threshold=0.24`), consistent with 1.2.0 behavior.
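The import check in the test plan could be scripted roughly as follows. This is a sketch and not the test used in this PR: `missing_attributes` is a hypothetical helper, and it assumes the four names listed under Purpose are exposed at the top level of the `cache_dit` package, which should be confirmed against `cache_dit_backend.py`.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the subset of `names` that `module_name` does not expose.

    If the module cannot be imported at all, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Names taken from the PR description; their top-level location in
    # cache_dit is an assumption of this sketch.
    expected = ["enable_cache", "refresh_context", "BlockAdapter", "DBCacheConfig"]
    missing = missing_attributes("cache_dit", expected)
    print("missing:", missing)
```

An empty list after upgrading to 1.3.0 would indicate that the names are still importable; it does not check that their signatures or behavior are unchanged.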