
[Diffusion] Implement MagCache #18498

Open
eitanturok wants to merge 54 commits into sgl-project:main from eitanturok:magcache2

Conversation

@eitanturok
Contributor

@eitanturok eitanturok commented Feb 9, 2026

Motivation

This PR implements MagCache, a timestep caching method for diffusion models.

On Wan2.1-T2V-1.3B, MagCache generates videos 1.88x faster. It outperforms TeaCache (1.74x) with superior visual fidelity.

Beyond this implementation, I propose standardizing a TimeStep Diffusion Caching API for sglang diffusion models.

Background

Diffusion caching generally falls into two categories:

  1. Block Cache (Intra-step): Skipping specific transformer blocks (e.g., TaylorSeer, DBCache, SCM).
  2. TimeStep Cache (Inter-step): Skipping entire denoising timesteps based on feature similarity (e.g., TeaCache, MagCache).
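
The inter-step idea can be sketched as a denoising loop that reuses the previous step's residual whenever a skip predicate says the outputs will be similar. This is a minimal illustration with hypothetical names, not the sglang API:

```python
def denoise_with_timestep_cache(model, latents, timesteps, should_skip):
    """Minimal inter-step caching loop: when `should_skip` predicts the
    current step's output is close to the last computed one, reuse the
    cached residual instead of running the transformer forward pass."""
    cached_residual = None
    for i, t in enumerate(timesteps):
        if cached_residual is not None and should_skip(i, t):
            residual = cached_residual              # skip the full forward pass
        else:
            residual = model(latents, t) - latents  # full forward pass
            cached_residual = residual
        latents = latents + residual
    return latents
```

The savings come entirely from how often `should_skip` fires; TeaCache and MagCache differ only in how they make that prediction.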

Sglang supports numerous block-caching methods through its integration with the cache-dit library, but it supports only a single timestep-caching strategy, TeaCache, and lacks a general-purpose interface for timestep-level caching.

A unified interface for step-level caching would allow sglang to support other step-level caching strategies such as MagCache, TaoCache, EasyCache, Chipmunk, etc., all of which outperform TeaCache in their papers. As an MVP, I implement MagCache in this PR.

To do:

[ ] Implement a unified TimestepCache base class that abstracts the skip logic shared by TeaCache, MagCache, etc.
[ ] Add a calibration function to compute the features that determine which timesteps to skip. In my benchmarking, MagCache (https://github.com/Zehong-Ma/MagCache/blob/df81cb181776c2c61477c08e1d21f87fda1cd938/MagCache4Wan2.1/magcache_generate.py#L912) and TeaCache use the calibrated features from their original papers; these features may differ in sglang due to different inference stacks, kernels, and attention backends.
[ ] Validate performance on additional models (e.g., HunyuanVideo).
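
For the first to-do item, the shared base class could look roughly like this. All names are illustrative, not the proposed sglang API; the point is that subclasses only supply the skip decision while state bookkeeping lives in one place:

```python
from abc import ABC, abstractmethod

class TimestepCacheMixin(ABC):
    """Hypothetical shared interface for inter-step caches (TeaCache,
    MagCache, ...). Subclasses implement only `should_skip_step`; the
    residual cache and skip accounting are managed here."""

    def init_cache_state(self):
        self.cached_residual = None
        self.accumulated_error = 0.0
        self.consecutive_skips = 0

    @abstractmethod
    def should_skip_step(self, step_index: int, num_steps: int) -> bool:
        """Return True when the cached residual may be reused at this step."""

    def on_step_computed(self, residual):
        # A full forward pass ran: refresh the cache and reset counters.
        self.cached_residual = residual
        self.accumulated_error = 0.0
        self.consecutive_skips = 0

    def on_step_skipped(self, error_estimate: float):
        # A step was skipped: track drift so the predicate can bail out.
        self.accumulated_error += error_estimate
        self.consecutive_skips += 1
```

A TeaCacheMixin would then implement `should_skip_step` from relative L1 distances of modulated inputs, and a MagCacheMixin from pre-calibrated magnitude ratios.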

Discussion Points

  1. Diffusers Parity: MagCache was recently implemented in diffusers. Since sglang supports a diffusers backend, is this feature still desired?
  2. Calibration Storage: Should sglang host pre-computed calibrated features for popular models to provide a "zero-setup" user experience?

Modifications

  1. Introduced MagCacheMixin: Implements the magnitude-based thresholding logic.
  2. Refactored ModelConfig to include enable_magcache and associated threshold parameters.
  3. Note: Current implementation is a functional draft; refactoring into a standardized Mixin/API is planned for the next iteration.
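
The magnitude-based thresholding in item 1 follows the MagCache paper's rule: compound the calibrated magnitude ratio across candidate steps, accumulate the deviation from 1 as an error proxy, and skip while that error stays under a threshold. A rough sketch, with hypothetical names and default values rather than the PR's exact code:

```python
def magcache_should_skip(step, mag_ratios, state, threshold=0.12,
                         max_skip_steps=2, retention_steps=5):
    """Decide whether to reuse the cached residual at `step`, following
    the accumulated-error rule from the MagCache paper (sketch only).

    `mag_ratios[step]` is the pre-calibrated magnitude ratio between
    residuals of consecutive timesteps; `state` is a dict tracking the
    compounded ratio, running error, and consecutive skip count."""
    if step < retention_steps:                 # always compute early steps
        state.update(err=0.0, skips=0, ratio=1.0)
        return False
    state["ratio"] *= mag_ratios[step]         # compound ratio drift
    state["err"] += abs(1.0 - state["ratio"])  # accumulated error proxy
    if state["err"] < threshold and state["skips"] < max_skip_steps:
        state["skips"] += 1
        return True                            # safe to reuse the cache
    state.update(err=0.0, skips=0, ratio=1.0)  # reset; compute this step
    return False
```

`max_skip_steps` bounds consecutive skips so error cannot compound unchecked, and `retention_steps` protects the early, high-signal denoising steps.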

Accuracy Tests

Benchmarking and Profiling

On an A100 with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, MagCache has a 1.88x speedup over the baseline while TeaCache has a 1.74x speedup. Both show minor drops in visual quality, with MagCache retaining better quality.

| Method | Generation Time | Speedup |
|---|---|---|
| None (Base) | 152.39s | 1.00x |
| TeaCache | 87.79s | 1.74x |
| MagCache | 80.89s | 1.88x |

1. Base (No Cache)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory
fight-base.mp4

2. TeaCache

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-teacache.mp4 --seed 42 --pin-cpu-memory --enable-teacache
fight-teacache.mp4

3. MagCache

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-magcache.mp4 --seed 42 --pin-cpu-memory --enable-magcache
fight-magcache.mp4

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 9, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @eitanturok, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates MagCache, a novel technique designed to significantly enhance the speed of video generation in diffusion models by intelligently skipping redundant denoising steps. The implementation includes new classes for MagCache logic and parameters, alongside updates to the core sampling and API components to enable its use. This addition provides a measurable performance boost, making the generation process more efficient while maintaining comparable visual quality.

Highlights

  • MagCache Implementation: Introduced MagCache, a new optimization technique for diffusion models that accelerates video generation by skipping denoising steps based on the magnitude of residuals. This is implemented via a new MagCacheMixin class.
  • Performance Improvement: MagCache achieves a 1.88x speedup over the baseline for video generation on an A100 with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, outperforming the existing TeaCache's 1.74x speedup.
  • Configuration and CLI Integration: Added enable_magcache as a new parameter to SamplingParams and integrated it into the CLI arguments and OpenAI API endpoints, allowing users to easily enable this optimization.
  • Model-Specific Parameters: Defined MagCacheParams for general configuration and WanMagCacheParams for WAN-specific settings, including pre-calibrated magnitude ratios for the Wan-AI/Wan2.1-T2V-1.3B-Diffusers model.
  • Unified Caching Logic: The CachableDiT base class now inherits from both TeaCacheMixin and MagCacheMixin, providing a unified mechanism to manage and route between different caching strategies based on enabled flags.
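
The routing described in the last highlight might look roughly like the following. This is a hypothetical sketch of the described behavior (prefer TeaCache when both flags are set), not the PR's actual method body:

```python
def should_skip_forward_for_cached_states(self, step, num_steps):
    """Route to whichever inter-step cache is enabled on this DiT,
    preferring TeaCache when both flags are set (sketch of the
    unified dispatch the highlight describes)."""
    if getattr(self, "enable_teacache", False):
        return self._teacache_should_skip(step, num_steps)
    if getattr(self, "enable_magcache", False):
        return self._magcache_should_skip(step, num_steps)
    return False  # no cache enabled: always run the full forward pass
```

Keeping the dispatch in the `CachableDiT` base class means model code like `wanvideo.py` only asks one question per step instead of branching on every cache flag.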


Changelog
  • python/sglang/cli/__init__.py
    • Added icecream import and install() call for debugging purposes.
  • python/sglang/cli/generate.py
    • Added ic(is_diffusion_model) for debugging.
  • python/sglang/multimodal_gen/apps/ComfyUI_SGLDiffusion/core/server_api.py
    • Added ic(common_params) for debugging.
  • python/sglang/multimodal_gen/configs/sample/diffusers_generic.py
    • Added ic(self) for debugging in __post_init__.
  • python/sglang/multimodal_gen/configs/sample/flux.py
    • Imported field from dataclasses and TeaCacheParams.
    • Added teacache_params field with default values to FluxSamplingParams.
  • python/sglang/multimodal_gen/configs/sample/magcache.py
    • Added new file for MagCache configuration.
    • Defined T2V_13B_MAG_RATIOS (pre-calibrated magnitude ratios for a specific model).
    • Implemented nearest_interp and get_interpolated_mag_ratios for dynamic ratio adjustment.
    • Defined MagCacheParams dataclass for general MagCache settings (threshold, max_skip_steps, retention_ratio).
    • Defined WanMagCacheParams dataclass for WAN-specific MagCache parameters, including ret_steps and get_cutoff_steps properties.
  • python/sglang/multimodal_gen/configs/sample/sampling_params.py
    • Added enable_magcache: bool = False to SamplingParams.
    • Included ic debugging calls related to num_frames adjustment.
    • Added --enable-magcache CLI argument for enabling the feature.
  • python/sglang/multimodal_gen/configs/sample/wan.py
    • Imported WanMagCacheParams.
    • Added magcache_params field with default WanMagCacheParams to various WanT2V and WanI2V sampling parameter classes.
  • python/sglang/multimodal_gen/runtime/cache/__init__.py
    • Updated docstring to include MagCache as a supported optimization.
    • Imported MagCacheContext and MagCacheMixin.
    • Added MagCacheContext and MagCacheMixin to the module's __all__ export list.
  • python/sglang/multimodal_gen/runtime/cache/magcache.py
    • Added new file implementing MagCacheContext and MagCacheMixin.
    • Implemented state management (_init_magcache_state, reset_magcache_state, _update_magcache_state).
    • Defined the core decision logic for skipping steps (_compute_magcache_decision).
    • Included a placeholder for calibration (_calibrate_magcache) and context retrieval (_get_magcache_context).
    • Incorporated logic for CFG-aware caching, accumulated error, and consecutive skip tracking.
  • python/sglang/multimodal_gen/runtime/cache/teacache.py
    • Modified _get_teacache_context to include debugging calls and a fallback mechanism for retrieving teacache_params.
  • python/sglang/multimodal_gen/runtime/entrypoints/cli/generate.py
    • Added ic debugging calls for server_args, sampling_params_kwargs, and generator.
  • python/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py
    • Added ic(requests) for debugging.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py
    • Added enable_magcache parameter to _build_sampling_params_from_request, generations, and edits functions for OpenAI image API.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/protocol.py
    • Added enable_magcache: Optional[bool] = False to ImageGenerationsRequest and VideoGenerationsRequest protocols.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/video_api.py
    • Added enable_magcache parameter to _build_sampling_params_from_request and create_video function for OpenAI video API.
  • python/sglang/multimodal_gen/runtime/entrypoints/utils.py
    • Added ic debugging calls related to post_process_sample and video saving.
  • python/sglang/multimodal_gen/runtime/models/dits/base.py
    • Imported MagCacheMixin.
    • Modified CachableDiT to inherit from MagCacheMixin in addition to TeaCacheMixin and BaseDiT.
    • Updated __init__ to call _init_magcache_state() alongside _init_teacache_state().
    • Added reset_cache_state, maybe_cache_states, retrieve_cached_states, and should_skip_forward_for_cached_states methods to CachableDiT to provide a unified caching interface for both TeaCache and MagCache.
  • python/sglang/multimodal_gen/runtime/models/dits/wanvideo.py
    • Imported WanMagCacheParams and MagCacheContext.
    • Enabled self.enable_magcache based on forward_batch.
    • Modified the forward method to cache original_hidden_states if either TeaCache or MagCache is enabled.
    • Updated should_skip_forward_for_cached_states to route between TeaCache and MagCache logic, prioritizing TeaCache if enabled, then MagCache, and includes extensive debugging calls.
  • python/sglang/multimodal_gen/runtime/pipelines_core/schedule_batch.py
    • Imported MagCacheParams and WanMagCacheParams.
    • Added magcache_params field to the Req dataclass.
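
The `nearest_interp`/`get_interpolated_mag_ratios` pair listed for magcache.py presumably resamples the pre-calibrated per-step ratios (measured at one step count, e.g. 50) to whatever `--num-inference-steps` the user requests. A plausible sketch of nearest-neighbour resampling, illustrative only:

```python
def nearest_interp(src, target_len):
    """Resample `src` to `target_len` entries by nearest-neighbour
    indexing, so calibrated per-step magnitude ratios can be reused
    at a different inference step count (sketch, not the PR's code)."""
    if target_len == 1:
        return [float(src[0])]
    step = (len(src) - 1) / (target_len - 1)  # map target index -> source index
    return [float(src[round(i * step)]) for i in range(target_len)]
```

Nearest-neighbour (rather than linear) interpolation keeps each resampled entry equal to an actually measured ratio, at the cost of some staircase error when step counts diverge widely.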
Activity
  • The pull request introduces a new feature, MagCache, for diffusion models, aiming to accelerate video generation.
  • The author has provided initial benchmarking results showing a significant speedup (1.88x) compared to the baseline and TeaCache.
  • The author acknowledges that the code is an initial implementation and requires cleanup.
  • Several 'To do' items are listed, including adding a calibration script, re-benchmarking, and testing on more models.
  • The author has posed questions to maintainers regarding the necessity of this feature given existing Diffusers backend support and interest in supporting calibration for denoising step skipping methods.
  • Numerous ic() (icecream) debugging calls have been added throughout the codebase, indicating active development and debugging efforts.

@eitanturok eitanturok closed this Feb 9, 2026
@eitanturok eitanturok reopened this Feb 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces MagCache, which is a great feature for accelerating diffusion models. The implementation looks mostly good, but there are several debugging statements (ic(...)) left in the code that should be removed before merging. There is also a critical issue with how magnitude ratios are calculated, as they are not dynamically adjusted based on the number of inference steps. I've provided a series of comments to address this. Additionally, there's a redundant method override that can be cleaned up.

@eitanturok eitanturok marked this pull request as ready for review March 5, 2026 12:44
@eitanturok eitanturok mentioned this pull request Mar 5, 2026
@hubertlu-tw
Collaborator

/rerun-failed-ci

@hubertlu-tw
Collaborator

hubertlu-tw commented Mar 5, 2026

@eitanturok can you resolve linter errors by
pre-commit run --all-files

@mickqian
Collaborator

mickqian commented Mar 8, 2026

also cc @DefTruth, do we need this?

@eitanturok
Contributor Author

eitanturok commented Mar 8, 2026

cache-dit only supports block-level caching and does not support MagCache or TeaCache, which use timestep-level caching. That's why I thought this might be helpful.

Other frameworks like HF diffusers support MagCache.

@DefTruth
Contributor

DefTruth commented Mar 9, 2026

@eitanturok Hi~ can you fix the conflicts? After you fix the conflicts, we might be able to quickly test this PR.

@eitanturok
Contributor Author

@DefTruth conflicts fixed
