[Diffusion] Implement MagCache #18498
Conversation
Summary of Changes (Gemini Code Assist): This pull request integrates MagCache, a technique that speeds up video generation in diffusion models by skipping redundant denoising steps. The implementation adds new classes for the MagCache logic and parameters, along with updates to the core sampling and API components to enable its use. The change provides a measurable performance boost while maintaining comparable visual quality.
Activity
Code Review
This pull request introduces MagCache, which is a great feature for accelerating diffusion models. The implementation looks mostly good, but there are several debugging statements (ic(...)) left in the code that should be removed before merging. There is also a critical issue with how magnitude ratios are calculated, as they are not dynamically adjusted based on the number of inference steps. I've provided a series of comments to address this. Additionally, there's a redundant method override that can be cleaned up.
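The review point about dynamically adjusting magnitude ratios could be addressed by resampling the calibrated table onto the requested step count. Below is a minimal sketch, not sglang's actual code: the function name and toy ratios are illustrative, and linear interpolation is an assumption (the official MagCache repo uses a similar resampling step).

```python
import numpy as np

def adjust_mag_ratios(calibrated_ratios: list[float], num_inference_steps: int) -> np.ndarray:
    """Resample calibrated magnitude ratios onto a new timestep grid.

    Hypothetical helper: MagCache's ratios are calibrated at a fixed step
    count, so running with a different number of inference steps requires
    interpolating them onto the new grid rather than indexing directly.
    """
    calibrated = np.asarray(calibrated_ratios, dtype=np.float64)
    src = np.linspace(0.0, 1.0, len(calibrated))   # normalized source grid
    dst = np.linspace(0.0, 1.0, num_inference_steps)  # normalized target grid
    return np.interp(dst, src, calibrated)

# Resample toy ratios calibrated at 50 steps down to a 20-step run.
ratios_50 = [1.0 - 0.002 * i for i in range(50)]
ratios_20 = adjust_mag_ratios(ratios_50, 20)
```

With this in place, the skip logic can index `ratios_20[step]` regardless of the step count the user requests.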
/rerun-failed-ci
@eitanturok can you resolve the linter errors by
also cc @DefTruth, do we need this?
Other frameworks like HF Diffusers support MagCache.
@eitanturok Hi~ can you fix the conflicts? Once they are resolved, we can quickly test this PR.
@DefTruth conflicts fixed |
Motivation
This PR implements MagCache, a timestep caching method for diffusion models.
On Wan2.1-T2V-1.3B, MagCache generates videos 1.88x faster than the uncached baseline, outperforming TeaCache (1.74x) with superior visual fidelity.
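For context, MagCache's core skip rule can be sketched as follows. This is a minimal sketch based on the MagCache paper, not sglang's implementation; the state layout, threshold, and skip cap are illustrative. The idea: track the running product of precomputed magnitude ratios, accumulate its deviation from 1 as an error estimate, and reuse the cached residual while the error stays small and the consecutive-skip budget is not exhausted.

```python
def should_skip(step: int, mag_ratios: list[float], state: dict,
                threshold: float = 0.12, max_consecutive_skips: int = 2) -> bool:
    """Decide whether this denoising step can reuse the cached residual.

    Sketch of MagCache's skip rule (assumed from the paper): accumulate
    the deviation of the running product of magnitude ratios from 1, and
    skip while the accumulated error is under `threshold` and fewer than
    `max_consecutive_skips` steps have been skipped in a row.
    """
    state["ratio"] *= mag_ratios[step]
    state["skips"] += 1
    state["err"] += abs(1.0 - state["ratio"])
    if state["err"] <= threshold and state["skips"] <= max_consecutive_skips:
        return True  # reuse cached residual instead of a full forward pass
    # run the full transformer forward and reset the accumulators
    state.update(ratio=1.0, skips=0, err=0.0)
    return False

# Toy usage: ratios near 1 allow skips; a sharp drop forces a full step.
state = {"ratio": 1.0, "skips": 0, "err": 0.0}
decisions = [should_skip(i, [1.0, 0.99, 0.98, 0.5], state) for i in range(4)]
```

Because the ratios are precomputed offline, this decision costs almost nothing at inference time, which is where the speedup comes from.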
Beyond this implementation, I propose standardizing a TimeStep Diffusion Caching API for sglang diffusion models.
Background
Diffusion caching generally falls into two categories:
Sglang supports numerous block-level caching methods via its integration with the cache-dit library, but it supports only a single timestep caching strategy, TeaCache, and lacks a general-purpose interface for step-level caching.
A unified interface for step-level caching would allow sglang to support other step-level caching strategies such as MagCache, TaoCache, EasyCache, Chipmunk, etc., all of which outperform TeaCache in their papers. As an MVP, I implement MagCache in this PR.
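The proposed unified interface could look roughly like this. All names here are illustrative, not sglang's actual API: each strategy implements its own skip decision, while the sampler drives a common `should_skip` / `update` / `apply` cycle.

```python
from abc import ABC, abstractmethod

class TimestepCache(ABC):
    """Hypothetical base class for step-level diffusion caches.

    Sketch of the proposed interface: the sampler calls `should_skip`
    each denoising step; on a full forward pass it calls `update` to
    store the residual, and on a skipped step it calls `apply` to
    reuse it.
    """

    def __init__(self):
        self.cached_residual = None

    @abstractmethod
    def should_skip(self, step: int, hidden_states) -> bool:
        """Return True if this step can reuse the cached residual."""

    def update(self, residual) -> None:
        """Store the residual from a full forward pass."""
        self.cached_residual = residual

    def apply(self, hidden_states):
        """Reuse the cached residual in place of a forward pass."""
        return hidden_states + self.cached_residual

class SkipOddSteps(TimestepCache):
    """Trivial demo strategy: skip every odd-numbered step."""

    def should_skip(self, step: int, hidden_states) -> bool:
        return step % 2 == 1
```

TeaCache and MagCache would then differ only in their `should_skip` implementations, keeping the sampler loop strategy-agnostic.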
To do:
- [ ] Implement a unified `TimestepCache` base class to abstract the skip logic for TeaCache, MagCache, etc.
- [ ] Add a calibration function to compute the features that determine which timesteps to skip in the timestep cache. In my benchmarking, MagCache (https://github.com/Zehong-Ma/MagCache/blob/df81cb181776c2c61477c08e1d21f87fda1cd938/MagCache4Wan2.1/magcache_generate.py#L912) and TeaCache use the calibrated features from their original papers. These calibrated features may differ in sglang due to different inference stacks, kernels, attention backends, etc.
- [ ] Validate performance across additional models (e.g., HunyuanVideo).
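The calibration step above could be sketched as follows. This is an assumption based on the MagCache paper, not sglang's code: run the model once without caching, record the average per-token L2-norm ratio of consecutive residuals at every timestep, and use the resulting table as the magnitude ratios at inference time. The function name and use of NumPy are illustrative.

```python
import numpy as np

def magnitude_ratio(prev_residual: np.ndarray, cur_residual: np.ndarray) -> float:
    """Average per-token L2-norm ratio of consecutive residuals.

    Calibration statistic sketched from the MagCache paper: collecting
    this value at every timestep of a full uncached run yields the
    mag_ratios table consumed by the skip logic.
    """
    prev_norm = np.linalg.norm(prev_residual, axis=-1)
    cur_norm = np.linalg.norm(cur_residual, axis=-1)
    # Small epsilon guards against division by zero on degenerate tokens.
    return float((cur_norm / (prev_norm + 1e-8)).mean())
```

Recomputing these ratios inside sglang, rather than reusing the papers' published tables, would account for differences in kernels and attention backends.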
Discussion Points
Modifications
Accuracy Tests
Benchmarking and Profiling
On an A100 with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, MagCache achieves a 1.88x speedup over the baseline, while TeaCache achieves 1.74x. Both show minor drops in visual quality, but MagCache preserves quality better.
1. Base (No Cache)
fight-base.mp4
2. TeaCache
fight-teacache.mp4
3. MagCache
fight-magcache.mp4
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci