[Feat] Support MagCache #1287
Signed-off-by: Lancer <maruixiang6688@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55e3509660
Thanks for your contribution! 😁 May I ask how this differs from Cache-DiT?
Force-pushed from 55e3509 to 39d12e2.
MagCache decides before computation whether to skip blocks, by comparing residual magnitude ratios (`||r_t|| / ||r_{t-1}||`). It is faster but less accurate. Cache-DiT decides after computation, by comparing actual residual differences (`max(|r_t - r_cached|)`). It is more accurate but has to compute the residual first. On top of that, MagCache only needs a one-time calibration per schedule: run it once, copy the magnitude ratios, and reuse them everywhere. Super convenient. Honestly, the main reason for this PR is that I found MagCache pretty interesting and the tests show it works pretty well 😄 Seeing that Diffusers had already added support made it even more fun to contribute.
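The two decision rules described above can be contrasted in a toy sketch. Residuals are plain lists of floats and both function names are hypothetical. The Cache-DiT side follows the description in this comment rather than Cache-DiT's actual code, and the MagCache side follows our reading of the reference MagCache loop (accumulated ratio, accumulated error).

```python
def magcache_should_skip(mag_ratio, accumulated_ratio, accumulated_err, threshold):
    """Hypothetical MagCache-style check, evaluated BEFORE running any block.

    Only the pre-calibrated ratio ||r_t|| / ||r_{t-1}|| for this step is needed,
    so no transformer computation is required to make the decision.
    Returns (skip, new_accumulated_ratio, new_accumulated_err).
    """
    accumulated_ratio *= mag_ratio
    accumulated_err += abs(1.0 - accumulated_ratio)
    return accumulated_err <= threshold, accumulated_ratio, accumulated_err


def cache_dit_should_reuse(residual_t, residual_cached, threshold):
    """Hypothetical Cache-DiT-style check, evaluated AFTER residual_t exists.

    Compares the freshly computed residual element-wise with the cached one.
    """
    max_diff = max(abs(a - b) for a, b in zip(residual_t, residual_cached))
    return max_diff <= threshold
```

The asymmetry is the point of the comment: the first function can be called before any block executes, while the second needs `residual_t`, i.e. at least part of the forward pass.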
Thanks for clarifying!
@RuixiangMa Can we create some custom functions like:
gcanlin left a comment
It looks like the quality of the generated image regressed slightly. Is that expected?
Yes, all caching algorithms cause some accuracy regression.
Yeah, three key parameters control the quality-speed trade-off:
@RuixiangMa Hi, how is the progress? Can you also update the docs with the supported model list? And you can click the
Thanks for the suggestion! I've done some refactoring to clean up the code and make it clearer and easier to plug in other models.
Force-pushed from c831f62 to 1e222bf.
lishunyang12 left a comment
Solid feature addition -- the hook architecture is clean and the strategy pattern makes it easy to extend. A few issues around correctness and dead code worth addressing before merge.
@vllm-omni-reviewer
lishunyang12 left a comment
Most of the issues from my last review have been addressed -- thanks for the fixes. Two remaining things:
- The single-block path in `apply_mag_cache_hook` registers two hooks on the same block (`MagCacheHeadHook` + `MagCacheBlockHook`). But `HookRegistry.dispatch` uses the `pre_forward`/`post_forward` chain for multi-hook blocks, which bypasses `new_forward` entirely. The caching logic won't run for single-block transformers.
- The dead code branch I flagged earlier is still present (see inline thread).
```python
logger.info(f"MagCache: Applying Head+Tail Hooks to single block '{name}'")
_apply_mag_cache_block_hook(block, state_manager, config, is_tail=True, strategy=strategy)
_apply_mag_cache_head_hook(block, state_manager, config, strategy)
return
```
When len(remaining_blocks) == 1, this registers both a head hook and a block hook on the same module. HookRegistry.dispatch with 2+ hooks uses the pre_forward/post_forward chain, not new_forward. Since both hooks only implement new_forward, the caching logic is silently skipped and you get a plain forward pass.
Either make single-block use a combined hook, or update dispatch to handle multi-hook new_forward chaining.
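The second option (multi-hook `new_forward` chaining) can be illustrated with a toy registry. `ToyHook` and `ToyHookRegistry` are hypothetical stand-ins, not the project's `HookRegistry`; the sketch only shows how each hook can receive the next callable in the chain so that two hooks on one module both run.

```python
import functools


class ToyHook:
    """Hypothetical hook whose new_forward wraps the next callable in the chain."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def new_forward(self, call_next, hidden_states):
        self.log.append(f"{self.name}:enter")
        out = call_next(hidden_states)  # a skipping hook could return early instead
        self.log.append(f"{self.name}:exit")
        return out


class ToyHookRegistry:
    """Hypothetical registry that chains new_forward across every registered hook."""

    def __init__(self, original_forward):
        self.original_forward = original_forward
        self.hooks = []

    def register(self, hook):
        self.hooks.append(hook)

    def dispatch(self, hidden_states):
        # Build the chain inside-out: the last-registered hook wraps the original
        # forward, and the first-registered hook ends up outermost.
        call = self.original_forward
        for hook in reversed(self.hooks):
            call = functools.partial(hook.new_forward, call)
        return call(hidden_states)
```

With this ordering convention, a MagCache-like head hook would have to be registered first so that it is outermost and can skip the forward pass by not calling `call_next`.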
@RuixiangMa Please resolve the conflicts.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Force-pushed from 4f7a4a2 to 359828c.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Fixed.
Signed-off-by: Lancer <maruixiang6688@gmail.com>
CI failure is unrelated to this PR.
MagCache Integration
1. Overview
MagCache (Magnitude-based Cache) accelerates diffusion model inference by reusing transformer block computations. It decides whether to skip computation based on the residual magnitude ratio between consecutive timesteps.
Reference: https://github.com/Zehong-Ma/MagCache and https://github.com/huggingface/diffusers
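The skip rule can be written compactly. This is our reading of the reference MagCache loop, not a transcription of this PR: $r_t$ is the residual of the whole transformer at step $t$, $\gamma_t$ is taken from the calibrated `mag_ratios` (averaged offline), $s$ is the last fully computed step, and $\delta$, $K$, $\rho$, $T$ stand for `mag_threshold`, `mag_max_skip_steps`, `mag_retention_ratio` and `num_inference_steps`.

```math
\begin{aligned}
\gamma_t &= \frac{\lVert r_t \rVert_2}{\lVert r_{t-1} \rVert_2}, \qquad r_t = \mathrm{output}_t - \mathrm{input}_t \\
\Gamma_t &= \prod_{i=s+1}^{t} \gamma_i, \qquad E_t = \sum_{j=s+1}^{t} \left\lvert 1 - \Gamma_j \right\rvert \\
\text{skip step } t &\iff t \ge \lfloor \rho T \rfloor \;\wedge\; E_t \le \delta \;\wedge\; t - s \le K
\end{aligned}
```

Whenever a step is fully computed, $s$ moves to that step, so $\Gamma$ and $E$ restart from $1$ and $0$.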
2. Architecture
```mermaid
graph TB
    subgraph "ConfigSection"
        ConfigData[MagCacheConfig]
    end
    subgraph "StateLayer"
        StateData[MagCacheState]
    end
    subgraph "StrategyLayer"
        StrategyDef[MagCacheStrategy]
    end
    subgraph "Hooks"
        Head[MagCacheHeadHook]
        Block[MagCacheBlockHook]
    end
    ConfigData --> Head
    ConfigData --> Block
    StateData --> Head
    StateData --> Block
    StrategyDef --> Head
    StrategyDef --> Block
```

Component Responsibilities:

```mermaid
graph LR
    Input[hidden_states] --> Head
    Head -->|compute| Blocks
    Head -->|skip| Tail
    Blocks --> Tail
    Tail --> Output[output]
```

3. Usage
3.1 Quick Start
3.2 Calibration Mode
Run calibration when using a new model or changing the scheduler, to obtain optimal `mag_ratios`:
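A rough sketch of how the three calibration statistics could be computed for one denoising step from residuals recorded at two consecutive steps. The function name and the list-of-per-token-vectors layout are assumptions, and the definitions (mean and spread of per-token norm ratios, mean cosine distance) follow our reading of the reference MagCache calibration code, not this PR's implementation.

```python
import math
import statistics


def calibration_stats(prev_residual, cur_residual):
    """Hypothetical helper returning (norm_ratio, norm_std, cos_dis) for one step.

    prev_residual / cur_residual: lists of per-token vectors (lists of floats)
    holding r_{t-1} and r_t = output - input of the whole transformer.
    Assumes no token residual has zero norm.
    """
    ratios, cos_dists = [], []
    for prev_tok, cur_tok in zip(prev_residual, cur_residual):
        prev_norm = math.sqrt(sum(x * x for x in prev_tok))
        cur_norm = math.sqrt(sum(x * x for x in cur_tok))
        ratios.append(cur_norm / prev_norm)
        dot = sum(a * b for a, b in zip(prev_tok, cur_tok))
        cos_dists.append(1.0 - dot / (prev_norm * cur_norm))
    norm_ratio = statistics.fmean(ratios)  # the value that would go into mag_ratios
    norm_std = statistics.pstdev(ratios)   # population std; reference code may differ
    cos_dis = statistics.fmean(cos_dists)  # direction change, for reference only
    return norm_ratio, norm_std, cos_dis
```

Running this once per step over a full sampling run yields one `norm_ratio` per step, i.e. a candidate `mag_ratios` list.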
Calibration output:
- `norm_ratios`: use directly as `mag_ratios`
- `norm_stds`: residual fluctuation per step (reference)
- `cos_dises`: residual direction change per step (reference)

4. Adapting New Models
4.1 Overview
To add MagCache support for a new model, implement a `MagCacheStrategy` subclass and register it. The strategy handles the following model-specific logic:

- `mag_ratios`: pre-computed magnitude ratios, one per denoising step
- `compute_residual`: how to calculate the residual (override only if the model has a special output format)
- `apply_residual_tuple`: how to apply the residual for tuple outputs (override for dual-stream models)

4.2 Minimal Implementation
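To illustrate the split between shared logic and per-model overrides, here is a framework-free sketch. `ToyMagCacheStrategy`, `ToyDualStreamStrategy`, the helper `apply_residual` and the placeholder ratios are all hypothetical; this is not the PR's `MagCacheStrategy`, and floats stand in for tensors so it runs without PyTorch.

```python
class ToyMagCacheStrategy:
    """Hypothetical base: standard output format, output = hidden_states + residual."""

    transformer_cls_name = "ToyTransformer"  # would have to match the transformer class name
    mag_ratios = [1.0, 0.98, 0.99]           # placeholder values, one per denoising step

    def compute_residual(self, output, hidden_states):
        return output - hidden_states

    def apply_residual(self, hidden_states, residual):  # hypothetical helper
        return hidden_states + residual


class ToyDualStreamStrategy(ToyMagCacheStrategy):
    """Hypothetical dual-stream variant whose output is (encoder_hidden_states, hidden_states)."""

    transformer_cls_name = "ToyDualStreamTransformer"

    def compute_residual(self, output, hidden_states):
        # In this toy only the second (image) stream is cached; which tuple index
        # holds it depends on the model's block.forward() return format.
        _, image_out = output
        return image_out - hidden_states

    def apply_residual_tuple(self, inputs, residual):
        encoder_hidden_states, hidden_states = inputs
        return encoder_hidden_states, hidden_states + residual
```

A real dual-stream model may need to cache both streams; the toy keeps one to show where the override points sit.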
For models with the standard diffusion output format (output = hidden_states + residual):

Note: `transformer_cls_name` must exactly match `pipeline.transformer.__class__.__name__`.

4.3 Models with Special Output Format
If your model returns a tuple (e.g., dual-stream architectures like Flux), override `compute_residual` and `apply_residual_tuple`:

4.4 Block Registration and Metadata
MagCache uses `TransformerBlockRegistry` to get metadata about transformer blocks. Each strategy can provide custom metadata for different block types.

Auto-Registration with Strategy
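A minimal sketch of the idea, assuming metadata means the positions of the output tensors in the tuple returned by `block.forward()`. `ToyBlockMetadata`, `ToyBlockRegistry`, `register_toy_strategy` and the index-0 fallback are hypothetical and do not reproduce `TransformerBlockRegistry` or its default indices.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyBlockMetadata:
    # Position of each tensor in the tuple returned by block.forward();
    # None means the block does not return that tensor.
    return_hidden_states_index: int
    return_encoder_hidden_states_index: Optional[int] = None


class ToyBlockRegistry:
    """Hypothetical block-metadata registry keyed by block class name."""

    _entries: Dict[str, ToyBlockMetadata] = {}

    @classmethod
    def register(cls, block_cls_name, metadata):
        cls._entries[block_cls_name] = metadata

    @classmethod
    def get(cls, block_cls_name):
        # Hypothetical fallback for unregistered, single-output blocks.
        return cls._entries.get(block_cls_name, ToyBlockMetadata(return_hidden_states_index=0))


class ToyStrategy:
    def register_block_metadata(self):
        # A dual-stream block returning (encoder_hidden_states, hidden_states).
        ToyBlockRegistry.register(
            "ToyDualStreamBlock",
            ToyBlockMetadata(return_hidden_states_index=1, return_encoder_hidden_states_index=0),
        )


def register_toy_strategy(strategy):
    # "Auto-registration": registering a strategy also registers its block metadata.
    strategy.register_block_metadata()
    return strategy
```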
Base Strategy Method
Override `register_block_metadata` in your strategy to handle model-specific block types:

Default Indices
To determine the indices, check the return format of `block.forward()`.

5. Core Parameters
This default configuration is consistent with the official MagCache implementation.
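As a hedged sketch, the parameters listed in this section could be grouped and sanity-checked as below. `ToyMagCacheConfig` is hypothetical and deliberately has no default values, since the PR's defaults are not reproduced here; requiring exactly one ratio per step is a simplifying assumption (an implementation could interpolate instead).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyMagCacheConfig:
    mag_threshold: float        # upper bound on the accumulated error before a recompute
    mag_max_skip_steps: int     # max consecutive steps that may reuse the cached residual
    mag_retention_ratio: float  # fraction of initial steps that are always computed
    num_inference_steps: int
    mag_ratios: List[float]     # calibrated ratios, one per step in this sketch

    def __post_init__(self):
        if self.mag_threshold < 0:
            raise ValueError("mag_threshold must be non-negative")
        if self.mag_max_skip_steps < 0:
            raise ValueError("mag_max_skip_steps must be non-negative")
        if not 0.0 <= self.mag_retention_ratio <= 1.0:
            raise ValueError("mag_retention_ratio must lie in [0, 1]")
        if len(self.mag_ratios) != self.num_inference_steps:
            raise ValueError("expected one mag_ratio per inference step")

    @property
    def first_cacheable_step(self):
        # Steps before this index are always fully computed.
        return int(self.num_inference_steps * self.mag_retention_ratio)
```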
- `mag_threshold`
- `mag_max_skip_steps`
- `mag_retention_ratio`
- `num_inference_steps`
- `mag_ratios`

6. Workflow
```mermaid
sequenceDiagram
    participant Head as Head Hook
    participant Blocks as Transformer
    participant Block as Tail Hook
    Note over Head: each step start
    Head->>Head: check accumulated_err
    alt accumulated_err <= threshold
        Head->>Head: skip computation, apply cached residual
        Head-->>Block: return output
    else need compute
        Head->>Blocks: execute transformer
        Blocks-->>Block: return output
        Block->>Block: compute residual = output - input
        Block->>Block: store residual
    end
```

7. Hook Responsibilities
MagCacheHeadHook vs MagCacheBlockHook
`should_compute` based on accumulated error

```mermaid
graph LR
    subgraph "Inside Transformer"
        Head[MagCacheHeadHook<br/>entry]
        Blocks[All Transformer Blocks]
        Tail[MagCacheBlockHook<br/>exit]
    end
    Input[hidden_states] --> Head
    Head -->|need compute| Blocks
    Head -->|skip| Tail
    Blocks --> Tail
    Tail --> Output[output]
```

Head Hook responsibilities:
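The head hook's per-step decision can be sketched as a small state machine. `ToyHeadDecision` is hypothetical; it follows our reading of the reference MagCache loop (accumulate the ratio product and error, skip while both the error and the skip count stay within bounds, otherwise compute and reset) rather than this PR's `MagCacheHeadHook`.

```python
class ToyHeadDecision:
    """Hypothetical per-step skip decision modeled on the reference MagCache loop."""

    def __init__(self, mag_ratios, threshold, max_skip_steps, retention_ratio):
        self.mag_ratios = mag_ratios
        self.threshold = threshold
        self.max_skip_steps = max_skip_steps
        # Safeguard added in this sketch: step 0 has no cached residual yet.
        self.first_cacheable = max(int(len(mag_ratios) * retention_ratio), 1)
        self.reset()

    def reset(self):
        self.accumulated_ratio = 1.0
        self.accumulated_err = 0.0
        self.accumulated_steps = 0

    def should_compute(self, step):
        if step < self.first_cacheable:
            return True  # retention period: always run the full transformer
        self.accumulated_ratio *= self.mag_ratios[step]
        self.accumulated_err += abs(1.0 - self.accumulated_ratio)
        self.accumulated_steps += 1
        if self.accumulated_err <= self.threshold and self.accumulated_steps <= self.max_skip_steps:
            return False  # reuse the cached residual
        self.reset()  # full pass: accumulation restarts from this step
        return True
```

With ratios close to 1 the error grows slowly and steps are skipped until the skip budget runs out; a sharp ratio change (e.g. 0.9) pushes the error past the threshold and forces a full pass.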
Block Hook responsibilities:
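Following the workflow diagram (the tail hook stores `residual = output - input` after a full pass, and skipped steps add the cached residual back to the input), the shared residual state can be sketched as below. `ToyResidualCache` and `run_step` are hypothetical, and lists of floats stand in for tensors.

```python
class ToyResidualCache:
    """Hypothetical state shared by the head and tail hooks."""

    def __init__(self):
        self.residual = None

    def store(self, block_input, block_output):
        # Tail (block) hook side: after a full pass, remember output - input.
        self.residual = [o - i for o, i in zip(block_output, block_input)]

    def apply(self, block_input):
        # Head hook side: on a skipped step, approximate output = input + residual.
        if self.residual is None:
            raise RuntimeError("no cached residual yet; this step must be computed")
        return [i + r for i, r in zip(block_input, self.residual)]


def run_step(blocks, hidden_states, cache, should_compute):
    if not should_compute:
        return cache.apply(hidden_states)
    out = hidden_states
    for block in blocks:
        out = block(out)
    cache.store(hidden_states, out)
    return out
```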
8. File Structure
9. Supported Models
MagCache currently supports the following models:
- black-forest-labs/FLUX.1-dev
- black-forest-labs/FLUX.2-klein-4B

10. Test
2 × NVIDIA RTX 4090 (24 GB)