
[Diffusion] Implement MagCache #18498

Open
eitanturok wants to merge 54 commits into sgl-project:main from eitanturok:magcache2

Conversation

@eitanturok
Contributor

@eitanturok eitanturok commented Feb 9, 2026

Motivation

This PR implements MagCache, a timestep caching method for diffusion models.

On Wan2.1-T2V-1.3B, MagCache generates videos 1.88x faster. It outperforms TeaCache (1.74x) with superior visual fidelity.

Beyond this implementation, I propose standardizing a TimeStep Diffusion Caching API for sglang diffusion models.

Background

Diffusion caching generally falls into two categories:

  1. Block Cache (Intra-step): Skipping specific transformer blocks (e.g., TaylorSeer, DBCache, SCM).
  2. TimeStep Cache (Inter-step): Skipping entire denoising timesteps based on feature similarity (e.g., TeaCache, MagCache).
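
The inter-step idea can be sketched as a denoising loop that reuses the previous step's residual whenever a skip predicate says the outputs will be similar. This is a minimal illustration with hypothetical names, not the sglang API:

```python
def denoise_with_timestep_cache(model, latents, timesteps, should_skip):
    """Minimal inter-step caching loop: when `should_skip` predicts the
    current step's output is close to the last computed one, reuse the
    cached residual instead of running the transformer forward pass."""
    cached_residual = None
    for i, t in enumerate(timesteps):
        if cached_residual is not None and should_skip(i, t):
            residual = cached_residual              # skip the full forward pass
        else:
            residual = model(latents, t) - latents  # full forward pass
            cached_residual = residual
        latents = latents + residual
    return latents
```

The savings come entirely from how often `should_skip` fires; TeaCache and MagCache differ only in how they make that prediction.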

Sglang supports numerous block-caching methods through its integration with the cache-dit library, but it supports only a single timestep-caching strategy, TeaCache, and lacks a general-purpose interface for timestep-level caching.

A unified interface for step-level caching would allow sglang to support other step-level caching strategies such as MagCache, TaoCache, EasyCache, Chipmunk, etc., all of which outperform TeaCache in their papers. As an MVP, I implement MagCache in this PR.

To do:

[ ] Implement a unified TimestepCache base class that abstracts the skip logic shared by TeaCache, MagCache, etc.
[ ] Add a calibration function to compute the features that determine which timesteps to skip. In my benchmarking, MagCache (https://github.com/Zehong-Ma/MagCache/blob/df81cb181776c2c61477c08e1d21f87fda1cd938/MagCache4Wan2.1/magcache_generate.py#L912) and TeaCache use the calibrated features from their original papers; these features may differ in sglang due to different inference stacks, kernels, and attention backends.
[ ] Validate performance on additional models (e.g., HunyuanVideo).
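
For the first to-do item, the shared base class could look roughly like this. All names are illustrative, not the proposed sglang API; the point is that subclasses only supply the skip decision while state bookkeeping lives in one place:

```python
from abc import ABC, abstractmethod

class TimestepCacheMixin(ABC):
    """Hypothetical shared interface for inter-step caches (TeaCache,
    MagCache, ...). Subclasses implement only `should_skip_step`; the
    residual cache and skip accounting are managed here."""

    def init_cache_state(self):
        self.cached_residual = None
        self.accumulated_error = 0.0
        self.consecutive_skips = 0

    @abstractmethod
    def should_skip_step(self, step_index: int, num_steps: int) -> bool:
        """Return True when the cached residual may be reused at this step."""

    def on_step_computed(self, residual):
        # A full forward pass ran: refresh the cache and reset counters.
        self.cached_residual = residual
        self.accumulated_error = 0.0
        self.consecutive_skips = 0

    def on_step_skipped(self, error_estimate: float):
        # A step was skipped: track drift so the predicate can bail out.
        self.accumulated_error += error_estimate
        self.consecutive_skips += 1
```

A TeaCacheMixin would then implement `should_skip_step` from relative L1 distances of modulated inputs, and a MagCacheMixin from pre-calibrated magnitude ratios.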

Discussion Points

  1. Diffusers Parity: MagCache was recently implemented in diffusers. Since sglang supports a diffusers backend, is this feature still desired?
  2. Calibration Storage: Should sglang host pre-computed calibrated features for popular models to provide a "zero-setup" user experience?

Modifications

  1. Introduced MagCacheMixin: Implements the magnitude-based thresholding logic.
  2. Refactored ModelConfig to include enable_magcache and associated threshold parameters.
  3. Note: Current implementation is a functional draft; refactoring into a standardized Mixin/API is planned for the next iteration.
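
The magnitude-based thresholding in item 1 follows the MagCache paper's rule: compound the calibrated magnitude ratio across candidate steps, accumulate the deviation from 1 as an error proxy, and skip while that error stays under a threshold. A rough sketch, with hypothetical names and default values rather than the PR's exact code:

```python
def magcache_should_skip(step, mag_ratios, state, threshold=0.12,
                         max_skip_steps=2, retention_steps=5):
    """Decide whether to reuse the cached residual at `step`, following
    the accumulated-error rule from the MagCache paper (sketch only).

    `mag_ratios[step]` is the pre-calibrated magnitude ratio between
    residuals of consecutive timesteps; `state` is a dict tracking the
    compounded ratio, running error, and consecutive skip count."""
    if step < retention_steps:                 # always compute early steps
        state.update(err=0.0, skips=0, ratio=1.0)
        return False
    state["ratio"] *= mag_ratios[step]         # compound ratio drift
    state["err"] += abs(1.0 - state["ratio"])  # accumulated error proxy
    if state["err"] < threshold and state["skips"] < max_skip_steps:
        state["skips"] += 1
        return True                            # safe to reuse the cache
    state.update(err=0.0, skips=0, ratio=1.0)  # reset; compute this step
    return False
```

`max_skip_steps` bounds consecutive skips so error cannot compound unchecked, and `retention_steps` protects the early, high-signal denoising steps.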

Accuracy Tests

Benchmarking and Profiling

On an A100 with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, MagCache has a 1.88x speedup over the baseline while TeaCache has a 1.74x speedup. Both show minor drops in visual quality, with MagCache retaining better quality.

| Method | Generation Time | Speedup |
|---|---|---|
| None (Base) | 152.39s | 1.00x |
| TeaCache | 87.79s | 1.74x |
| MagCache | 80.89s | 1.88x |

1. Base (No Cache)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory
fight-base.mp4

2. TeaCache

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-teacache.mp4 --seed 42 --pin-cpu-memory --enable-teacache
fight-teacache.mp4

3. MagCache

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-magcache.mp4 --seed 42 --pin-cpu-memory --enable-magcache
fight-magcache.mp4

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 9, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @eitanturok, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates MagCache, a novel technique designed to significantly enhance the speed of video generation in diffusion models by intelligently skipping redundant denoising steps. The implementation includes new classes for MagCache logic and parameters, alongside updates to the core sampling and API components to enable its use. This addition provides a measurable performance boost, making the generation process more efficient while maintaining comparable visual quality.

Highlights

  • MagCache Implementation: Introduced MagCache, a new optimization technique for diffusion models that accelerates video generation by skipping denoising steps based on the magnitude of residuals. This is implemented via a new MagCacheMixin class.
  • Performance Improvement: MagCache achieves a 1.88x speedup over the baseline for video generation on an A100 with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, outperforming the existing TeaCache's 1.74x speedup.
  • Configuration and CLI Integration: Added enable_magcache as a new parameter to SamplingParams and integrated it into the CLI arguments and OpenAI API endpoints, allowing users to easily enable this optimization.
  • Model-Specific Parameters: Defined MagCacheParams for general configuration and WanMagCacheParams for WAN-specific settings, including pre-calibrated magnitude ratios for the Wan-AI/Wan2.1-T2V-1.3B-Diffusers model.
  • Unified Caching Logic: The CachableDiT base class now inherits from both TeaCacheMixin and MagCacheMixin, providing a unified mechanism to manage and route between different caching strategies based on enabled flags.
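
The routing described in the last highlight might look roughly like the following. This is a hypothetical sketch of the described behavior (prefer TeaCache when both flags are set), not the PR's actual method body:

```python
def should_skip_forward_for_cached_states(self, step, num_steps):
    """Route to whichever inter-step cache is enabled on this DiT,
    preferring TeaCache when both flags are set (sketch of the
    unified dispatch the highlight describes)."""
    if getattr(self, "enable_teacache", False):
        return self._teacache_should_skip(step, num_steps)
    if getattr(self, "enable_magcache", False):
        return self._magcache_should_skip(step, num_steps)
    return False  # no cache enabled: always run the full forward pass
```

Keeping the dispatch in the `CachableDiT` base class means model code like `wanvideo.py` only asks one question per step instead of branching on every cache flag.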


Changelog
  • python/sglang/cli/__init__.py
    • Added icecream import and install() call for debugging purposes.
  • python/sglang/cli/generate.py
    • Added ic(is_diffusion_model) for debugging.
  • python/sglang/multimodal_gen/apps/ComfyUI_SGLDiffusion/core/server_api.py
    • Added ic(common_params) for debugging.
  • python/sglang/multimodal_gen/configs/sample/diffusers_generic.py
    • Added ic(self) for debugging in __post_init__.
  • python/sglang/multimodal_gen/configs/sample/flux.py
    • Imported field from dataclasses and TeaCacheParams.
    • Added teacache_params field with default values to FluxSamplingParams.
  • python/sglang/multimodal_gen/configs/sample/magcache.py
    • Added new file for MagCache configuration.
    • Defined T2V_13B_MAG_RATIOS (pre-calibrated magnitude ratios for a specific model).
    • Implemented nearest_interp and get_interpolated_mag_ratios for dynamic ratio adjustment.
    • Defined MagCacheParams dataclass for general MagCache settings (threshold, max_skip_steps, retention_ratio).
    • Defined WanMagCacheParams dataclass for WAN-specific MagCache parameters, including ret_steps and get_cutoff_steps properties.
  • python/sglang/multimodal_gen/configs/sample/sampling_params.py
    • Added enable_magcache: bool = False to SamplingParams.
    • Included ic debugging calls related to num_frames adjustment.
    • Added --enable-magcache CLI argument for enabling the feature.
  • python/sglang/multimodal_gen/configs/sample/wan.py
    • Imported WanMagCacheParams.
    • Added magcache_params field with default WanMagCacheParams to various WanT2V and WanI2V sampling parameter classes.
  • python/sglang/multimodal_gen/runtime/cache/__init__.py
    • Updated docstring to include MagCache as a supported optimization.
    • Imported MagCacheContext and MagCacheMixin.
    • Added MagCacheContext and MagCacheMixin to the module's __all__ export list.
  • python/sglang/multimodal_gen/runtime/cache/magcache.py
    • Added new file implementing MagCacheContext and MagCacheMixin.
    • Implemented state management (_init_magcache_state, reset_magcache_state, _update_magcache_state).
    • Defined the core decision logic for skipping steps (_compute_magcache_decision).
    • Included a placeholder for calibration (_calibrate_magcache) and context retrieval (_get_magcache_context).
    • Incorporated logic for CFG-aware caching, accumulated error, and consecutive skip tracking.
  • python/sglang/multimodal_gen/runtime/cache/teacache.py
    • Modified _get_teacache_context to include debugging calls and a fallback mechanism for retrieving teacache_params.
  • python/sglang/multimodal_gen/runtime/entrypoints/cli/generate.py
    • Added ic debugging calls for server_args, sampling_params_kwargs, and generator.
  • python/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py
    • Added ic(requests) for debugging.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py
    • Added enable_magcache parameter to _build_sampling_params_from_request, generations, and edits functions for OpenAI image API.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/protocol.py
    • Added enable_magcache: Optional[bool] = False to ImageGenerationsRequest and VideoGenerationsRequest protocols.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/video_api.py
    • Added enable_magcache parameter to _build_sampling_params_from_request and create_video function for OpenAI video API.
  • python/sglang/multimodal_gen/runtime/entrypoints/utils.py
    • Added ic debugging calls related to post_process_sample and video saving.
  • python/sglang/multimodal_gen/runtime/models/dits/base.py
    • Imported MagCacheMixin.
    • Modified CachableDiT to inherit from MagCacheMixin in addition to TeaCacheMixin and BaseDiT.
    • Updated __init__ to call _init_magcache_state() alongside _init_teacache_state().
    • Added reset_cache_state, maybe_cache_states, retrieve_cached_states, and should_skip_forward_for_cached_states methods to CachableDiT to provide a unified caching interface for both TeaCache and MagCache.
  • python/sglang/multimodal_gen/runtime/models/dits/wanvideo.py
    • Imported WanMagCacheParams and MagCacheContext.
    • Enabled self.enable_magcache based on forward_batch.
    • Modified the forward method to cache original_hidden_states if either TeaCache or MagCache is enabled.
    • Updated should_skip_forward_for_cached_states to route between TeaCache and MagCache logic, prioritizing TeaCache if enabled, then MagCache, and includes extensive debugging calls.
  • python/sglang/multimodal_gen/runtime/pipelines_core/schedule_batch.py
    • Imported MagCacheParams and WanMagCacheParams.
    • Added magcache_params field to the Req dataclass.
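
The `nearest_interp`/`get_interpolated_mag_ratios` pair listed for magcache.py presumably resamples the pre-calibrated per-step ratios (measured at one step count, e.g. 50) to whatever `--num-inference-steps` the user requests. A plausible sketch of nearest-neighbour resampling, illustrative only:

```python
def nearest_interp(src, target_len):
    """Resample `src` to `target_len` entries by nearest-neighbour
    indexing, so calibrated per-step magnitude ratios can be reused
    at a different inference step count (sketch, not the PR's code)."""
    if target_len == 1:
        return [float(src[0])]
    step = (len(src) - 1) / (target_len - 1)  # map target index -> source index
    return [float(src[round(i * step)]) for i in range(target_len)]
```

Nearest-neighbour (rather than linear) interpolation keeps each resampled entry equal to an actually measured ratio, at the cost of some staircase error when step counts diverge widely.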
Activity
  • The pull request introduces a new feature, MagCache, for diffusion models, aiming to accelerate video generation.
  • The author has provided initial benchmarking results showing a significant speedup (1.88x) compared to the baseline and TeaCache.
  • The author acknowledges that the code is an initial implementation and requires cleanup.
  • Several 'To do' items are listed, including adding a calibration script, re-benchmarking, and testing on more models.
  • The author has posed questions to maintainers regarding the necessity of this feature given existing Diffusers backend support and interest in supporting calibration for denoising step skipping methods.
  • Numerous ic() (icecream) debugging calls have been added throughout the codebase, indicating active development and debugging efforts.

@eitanturok eitanturok closed this Feb 9, 2026
@eitanturok eitanturok reopened this Feb 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces MagCache, which is a great feature for accelerating diffusion models. The implementation looks mostly good, but there are several debugging statements (ic(...)) left in the code that should be removed before merging. There is also a critical issue with how magnitude ratios are calculated, as they are not dynamically adjusted based on the number of inference steps. I've provided a series of comments to address this. Additionally, there's a redundant method override that can be cleaned up.

@eitanturok eitanturok marked this pull request as ready for review March 5, 2026 12:44
@eitanturok eitanturok mentioned this pull request Mar 5, 2026
@hubertlu-tw
Collaborator

/rerun-failed-ci

@hubertlu-tw
Collaborator

hubertlu-tw commented Mar 5, 2026

@eitanturok can you resolve linter errors by
pre-commit run --all-files

@mickqian
Collaborator

mickqian commented Mar 8, 2026

also cc @DefTruth, do we need this?

@eitanturok
Contributor Author

eitanturok commented Mar 8, 2026

cache-dit only supports block-level caching and does not support MagCache or TeaCache, which use timestep-level caching. That's why I thought this might be helpful.

Other frameworks like HF diffusers support MagCache.

@DefTruth
Contributor

DefTruth commented Mar 9, 2026

@eitanturok Hi~ can you fix the conflicts? After you fix the conflicts, we might be able to quickly test this PR.

@eitanturok
Contributor Author

@DefTruth conflicts fixed
