
[Feat] cache-dit for GLM-Image #1399

Merged
hsliuustc0106 merged 7 commits into vllm-project:main from RuixiangMa:glmforcachedit
Apr 18, 2026

Conversation

@RuixiangMa
Contributor

@RuixiangMa RuixiangMa commented Feb 18, 2026

Purpose

support cache-dit for GLM-Image

Test Plan

Test Result

```shell
curl -X POST http://localhost:8004/v1/images/generations   -H "Content-Type: application/json"   -d '{
    "prompt": "A beautifully designed modern food magazine style dessert recipe illustration, themed around a raspberry mousse cake. The overall layout is clean and bright, divided into four main areas: the top left features a bold black title '\''Raspberry Mousse Cake Recipe Guide'\'', with a soft-lit close-up photo of the finished cake on the right, showcasing a light pink cake adorned with fresh raspberries and mint leaves; the bottom left contains an ingredient list section, titled '\''Ingredients'\'' in a simple font, listing '\''Flour 150g'\'', '\''Eggs 3'\'', '\''Sugar 120g'\'', '\''Raspberry puree 200g'\'', '\''Gelatin sheets 10g'\'', '\''Whipping cream 300ml'\'', and '\''Fresh raspberries'\'', each accompanied by minimalist line icons (like a flour bag, eggs, sugar jar, etc.); the bottom right displays four equally sized step boxes, each containing high-definition macro photos and corresponding instructions, arranged from top to bottom as follows: Step 1 shows a whisk whipping white foam (with the instruction '\''Whip egg whites to stiff peaks'\''), Step 2 shows a red-and-white mixture being folded with a spatula (with the instruction '\''Gently fold in the puree and batter'\''), Step 3 shows pink liquid being poured into a round mold (with the instruction '\''Pour into mold and chill for 4 hours'\''), Step 4 shows the finished cake decorated with raspberries and mint leaves (with the instruction '\''Decorate with raspberries and mint'\''); a light brown information bar runs along the bottom edge, with icons on the left representing '\''Preparation time: 30 minutes'\'', '\''Cooking time: 20 minutes'\'', and '\''Servings: 8'\''. The overall color scheme is dominated by creamy white and light pink, with a subtle paper texture in the background, featuring compact and orderly text and image layout with clear information hierarchy.",
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 1.5,
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > recipe.png
```
| Metric | No cache-dit | cache-dit |
| --- | --- | --- |
| Image | recipe | recipecachedit |
| Time | 90 s/img | 54 s/img |

@RuixiangMa RuixiangMa marked this pull request as draft February 18, 2026 12:26
Collaborator

@lishunyang12 lishunyang12 left a comment

Nice work — the ~40% speedup is solid and output quality looks well preserved.

@RuixiangMa RuixiangMa marked this pull request as ready for review February 24, 2026 07:53
@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review

1. Overview

This PR introduces support for cache-dit (Caching Diffusion Transformers) for the GlmImagePipeline. It adds a new function enable_cache_for_glm_image which configures the caching mechanism specifically for the GLM-Image model architecture and registers it in the custom enablers dictionary.

The changes are focused and follow the established patterns in the codebase. The test results demonstrate a significant performance improvement (approximately 40% speedup: 90s/img -> 54s/img) with visually consistent output quality.
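The quoted numbers can be sanity-checked directly; the timings are taken from the test-result table above:

```python
# Verify the reported ~40% speedup from the PR's test results.
baseline = 90.0  # seconds per image, no cache-dit
cached = 54.0    # seconds per image, with cache-dit

speedup_fraction = (baseline - cached) / baseline
assert abs(speedup_fraction - 0.40) < 1e-12  # a 40% reduction in latency
```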

Assessment: Positive. The implementation is clean, well-documented, and consistent with existing code patterns.

2. Code Quality

  • Code Style: The code adheres to the existing style guide (docstrings, logging, variable naming conventions).
  • Consistency: The implementation mirrors the structure of other enabler functions (e.g., enable_cache_for_sd3, enable_cache_for_bagel), ensuring consistency across the codebase.
  • Comments: The inline comment regarding patch_functor and ForwardPattern is excellent. It explains why a specific configuration was chosen relative to the standard diffusers implementation, which is crucial for future maintenance.
  • Potential Bugs: No obvious bugs detected. The logic handles configuration building, TaylorSeer calibration (optional), and context refreshing correctly.

3. Architecture & Design

  • Integration: The PR correctly utilizes the plugin-style architecture by updating the CUSTOM_DIT_ENABLERS dictionary. This decouples the specific model logic from the core pipeline execution logic.
  • Design Patterns: Uses the Factory pattern (via the dictionary registration) and closures (returning refresh_cache_context).
  • Adapter Configuration:
    • ForwardPattern.Pattern_0: Correctly identified as necessary because the vLLM-Omni implementation returns (hidden_states, encoder_hidden_states).
    • has_separate_cfg=True: This suggests the pipeline handles Classifier-Free Guidance in a specific manner. This flag is critical for cache correctness; assuming this matches the GlmImagePipeline implementation in vLLM-Omni, this is correct.
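The enabler-plus-closure shape described above can be sketched as follows. This is a hedged illustration only: the real function calls into cache-dit with `ForwardPattern.Pattern_0`, `has_separate_cfg=True`, and a `DBCacheConfig`, all of which are stubbed out here so the registration/closure structure is visible in isolation.

```python
# Hypothetical sketch of the enabler pattern the review describes.
# The cache setup is a stand-in, not the actual cache-dit API.

def enable_cache_for_glm_image(pipeline, cache_config):
    """Configure block caching for GLM-Image and return a refresh closure."""
    # Stand-in for building the cache adapter (real code would pass
    # ForwardPattern.Pattern_0 and has_separate_cfg=True to cache-dit).
    context = {"step": 0}

    def refresh_cache_context():
        # Reset per-request cache state between generations, so one
        # request's cached residuals never leak into the next.
        context["step"] = 0

    return refresh_cache_context
```

Returning a closure (rather than mutating the pipeline) keeps the refresh logic decoupled from the pipeline class, which matches the Factory/closure design the review points out.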

4. Security & Safety

  • Resource Management: Caching mechanisms inherently trade memory for computation speed. This implementation relies on DBCacheConfig to manage these resources. There are no obvious memory leaks introduced in the Python logic.
  • Input Validation: The function relies on cache_config being a valid DiffusionCacheConfig object, which is consistent with other enablers.

5. Testing & Documentation

  • Test Plan: The PR provides a clear curl command for reproducibility.
  • Results: The comparison table effectively demonstrates the value of the feature (speed increase) and validates that image quality is preserved.
  • Documentation: The docstrings are clear and explain the arguments and the returned closure.

6. Specific Suggestions

vllm_omni/diffusion/cache/cache_dit_backend.py:899 (ForwardPattern)
The comment states: "We use ForwardPattern.Pattern_0 because our block returns (hidden_states, encoder_hidden_states)".

  • Suggestion: Ensure this pattern strictly matches the signature of pipeline.transformer.transformer_blocks[i].forward. If the vLLM-Omni implementation changes in the future to match standard diffusers (returning only hidden_states), this would need updating. The current implementation seems correct based on your comment.

vllm_omni/diffusion/cache/cache_dit_backend.py:901 (has_separate_cfg)

  • Suggestion: Double-check that GlmImagePipeline in vLLM-Omni actually runs Conditional and Unconditional passes separately (or if this flag handles the specific CFG caching logic required). If the pipeline fuses CFG (batching cond/uncond), this flag might need to be False. Given the performance gains, it appears configured correctly, but this is a high-risk parameter if wrong.

vllm_omni/diffusion/cache/cache_dit_backend.py:936 (Registration)

  • Nitpick: The key "GlmImagePipeline" must exactly match the class name of the pipeline instance passed to the backend. This looks correct.
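To make the nitpick concrete: if dispatch is keyed on the pipeline's class name, a typo in the registry key silently disables caching rather than raising an error. The registry and lookup below are simplified stand-ins, not the actual vLLM-Omni code:

```python
# Simplified stand-in for the CUSTOM_DIT_ENABLERS registry in
# cache_dit_backend.py; the enabler here is a dummy closure factory.
CUSTOM_DIT_ENABLERS = {
    "GlmImagePipeline": lambda pipeline, cfg: (lambda: None),
}

class GlmImagePipeline:
    """Stand-in for the real pipeline class."""

def get_enabler(pipeline):
    # Lookup keyed on the exact class name: a mismatched key makes
    # .get() return None and caching is silently skipped.
    return CUSTOM_DIT_ENABLERS.get(type(pipeline).__name__)
```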

7. Approval Status

LGTM with suggestions

The implementation is solid, follows project conventions, and provides significant performance benefits. The suggestions above are primarily verification points regarding the specific ForwardPattern and CFG handling, which rely on the internal details of the GlmImage implementation in vLLM-Omni.

Action: If the internal pipeline signature matches the assumptions in the code (Pattern_0 and separate CFG), this is ready to merge.


This review was generated automatically by the VLLM-Omni PR Reviewer Bot using glm-5.

@hsliuustc0106
Collaborator

Hi @RuixiangMa 👋

This PR hasn't been updated for 16 days. We're tracking stale PRs for the next release. Could you share the current status? Is there anything blocking progress?

Thanks!

@RuixiangMa
Contributor Author

> Hi @RuixiangMa 👋
>
> This PR hasn't been updated for 16 days. We're tracking stale PRs for the next release. Could you share the current status? Is there anything blocking progress?

No, it works well for me; awaiting review and merge.

@SamitHuang
Collaborator

@RuixiangMa can you fix the conflicts at first? we can merge it and then upgrade to cache-dit 1.3.0 in #1858

Signed-off-by: Lancer <maruixiang6688@gmail.com>
@RuixiangMa
Contributor Author

> @RuixiangMa can you fix the conflicts at first? we can merge it and then upgrade to cache-dit 1.3.0 in #1858

fixed

Collaborator

@SamitHuang SamitHuang left a comment

LGTM

@SamitHuang SamitHuang enabled auto-merge (squash) March 13, 2026 10:22
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 17, 2026
@Gaohan123 Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026
@hsliuustc0106
Collaborator

fix precommits please, we expect this to be merged asap

Signed-off-by: Lancer <maruixiang6688@gmail.com>
auto-merge was automatically disabled April 17, 2026 04:34

Head branch was pushed to by a user without write access

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Apr 17, 2026
@hsliuustc0106 hsliuustc0106 merged commit 9cf1fe7 into vllm-project:main Apr 18, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Samit <285365963@qq.com>

Labels

ready label to trigger buildkite CI

5 participants