[diffusion] Cleanup TeaCache#19957

Open
eitanturok wants to merge 42 commits into sgl-project:main from eitanturok:teacache-refactor

Conversation

@eitanturok
Contributor

@eitanturok eitanturok commented Mar 5, 2026

Make TeaCache use a generic DiffusionCache class so it is easier to add other timestep-caching diffusion techniques such as MagCache. This is a prerequisite for #18498.

The Problem

Previously, TeaCache was implemented using mixins that tightly coupled the caching logic to individual model classes. This led to several issues:

  1. Code Duplication: Handling the positive and negative CFG branches required duplicated code within the caching class.
  2. Fragmented State: To compute the TeaCache decision, some parameters were attached directly to the model, others to TeaCacheMixin, and others to TeaCacheContext. This made state difficult to manage across different generation requests.
  3. Inconsistent APIs: Different models (like Wan vs. Hunyuan) required different parameter classes (WanTeaCacheParams vs. TeaCacheParams), making the user-facing API confusing.
  4. Integration Overhead: Adding TeaCache to a new model required roughly 100 lines of boilerplate code.
  5. Broken Functionality: HunyuanVideo support was incomplete/broken due to these architectural limitations.
  6. No Support for Multiple Cache Types: The mixin design hard-wired TeaCache into each model class, leaving no way to select among caching techniques.

Key Improvements

We split the logic into three distinct components:

  1. TeaCacheParams: User-provided settings (thresholds, offsets) defined in a request at the start of a generation.
  2. TeaCacheState: Internal state (cached tensors, accumulated L1 distances) that is updated after every forward pass. Crucially, the step counter is now attached here rather than the model.
  3. TeaCacheStrategy: The actual implementation that takes in TeaCacheParams and TeaCacheState to decide when to skip a computation.
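
The split can be sketched roughly as follows. Only the three class names come from this PR; the specific fields and method bodies shown here are illustrative assumptions, not the PR's exact API:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class TeaCacheParams:
    """User-provided settings, fixed for the whole generation request."""
    rel_l1_threshold: float = 0.2
    skip_start_step: int = 0
    skip_end_step: int = 50

@dataclass
class TeaCacheState:
    """Internal state, updated after every forward pass."""
    step: int = 0                    # the step counter now lives here, not on the model
    accumulated_rel_l1: float = 0.0  # running L1 distance since the last real forward
    cached_residual: Any = None      # output reused when a step is skipped

class TeaCacheStrategy:
    """Consumes params and state to decide whether to skip a forward pass."""

    def __init__(self, params: TeaCacheParams):
        self.params = params
        self.state = TeaCacheState()

    def reset(self) -> None:
        # A fresh request gets a fresh state; params stay as the user set them.
        self.state = TeaCacheState()
```

Because the step counter and cached tensors live on the state object rather than the model, resetting between requests is a single object swap.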

Previously, CachableDiT inherited from TeaCacheMixin. To support multiple caching strategies, CachableDiT cannot inherit from every caching mixin, because we only know which cache we want during the forward pass, not when CachableDiT is initialized. So we move the cache to a CachableDiT.cache attribute and initialize it lazily via init_cache().

  1. Simplified CFG Handling: Instead of duplicating code to handle the positive and negative CFG branches, we now simply maintain two independent TeaCacheState objects within the strategy.
  2. HunyuanVideo Fixed: Since TeaCacheStrategy is better abstracted, supporting HunyuanVideo required only 5 lines of code, compared to ~100 lines before.
  3. Calibration Mode: Added a calibrate_cache flag. This lets users calibrate the cache for a specific model and learn the values the cache needs.
  4. Unified Parameters: Merged WanTeaCacheParams into TeaCacheParams.
  5. Improved Docs: Better documentation and parameter names.
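
A minimal sketch of the composition-plus-lazy-init pattern: the model holds a cache attribute that starts empty and is filled on first use. CachableDiT here is a stand-in (the real class lives in python/sglang/multimodal_gen/runtime/models/dits/base.py), and the method bodies are assumptions:

```python
from typing import Optional

class DiffusionCache:
    """Abstract base for timestep caching strategies."""
    def should_skip_forward(self) -> bool:
        raise NotImplementedError
    def reset(self) -> None:
        raise NotImplementedError

class TeaCacheStrategy(DiffusionCache):
    def __init__(self) -> None:
        self.step = 0
    def should_skip_forward(self) -> bool:
        return False  # placeholder decision logic
    def reset(self) -> None:
        self.step = 0

CACHE_REGISTRY = {"teacache": TeaCacheStrategy}

class CachableDiT:
    def __init__(self) -> None:
        # No cache at construction time: which strategy is wanted is only
        # known once a request arrives, so the attribute starts empty.
        self.cache: Optional[DiffusionCache] = None

    def init_cache(self, cache_type: str) -> None:
        # Lazily construct the requested strategy on first use.
        if self.cache is None:
            self.cache = CACHE_REGISTRY[cache_type]()

    def forward(self, x):
        self.init_cache("teacache")
        if self.cache.should_skip_forward():
            return x  # would reuse the cached residual here
        return x      # real forward pass would go here
```

Composing the cache this way also lets weights_updater reset caching state with a single CachableDiT.cache.reset() call, as the changelog below notes.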

Old PR description

Motivation

Make TeaCache use a generic DiffusionCache class so it is easier to add other timestep-caching diffusion techniques such as MagCache. This is a prerequisite for #18498.

Modifications

The DiffusionCache class maintains a state that tracks persistent information, plus a context for the non-persistent information needed to decide whether or not to cache.

TeaCache previously had a lot of copy-pasted code to handle the positive and negative CFG branches. We abstract this away with a separate state for each branch.
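
The per-branch idea can be sketched like this (the class and field names here are hypothetical, not the PR's exact API):

```python
# Hypothetical sketch: one independent state object per CFG branch replaces
# the duplicated positive/negative code paths.

class BranchState:
    def __init__(self) -> None:
        self.accumulated = 0.0       # running rel-L1 distance for this branch
        self.cached_residual = None  # residual reused when the branch skips

class CfgAwareCache:
    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.states = {"positive": BranchState(), "negative": BranchState()}

    def should_skip_forward(self, branch: str, rel_l1: float) -> bool:
        # Identical decision code for both branches; only the state differs.
        state = self.states[branch]
        state.accumulated += rel_l1
        if state.accumulated < self.threshold:
            return True
        state.accumulated = 0.0
        return False
```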

We also support a cache.calibrate method to compute the values needed for caching. MagCache implements this in #18498, and we can implement it for TeaCache later.
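
A rough sketch of what such a calibrate mode could look like. The class and field names are illustrative; only the idea of collecting statistics without skipping comes from this PR:

```python
# Illustrative calibrate mode: never skip a step, just record the distances
# so a threshold or rescaling polynomial can be fitted offline afterwards.

class CalibratingCache:
    def __init__(self, threshold: float = 0.2, calibrate: bool = False) -> None:
        self.threshold = threshold
        self.calibrate = calibrate
        self.observed: list[float] = []  # rel-L1 distances seen while calibrating
        self.accumulated = 0.0

    def should_skip_forward(self, rel_l1: float) -> bool:
        if self.calibrate:
            self.observed.append(rel_l1)
            return False  # always run the real forward pass while calibrating
        self.accumulated += rel_l1
        if self.accumulated < self.threshold:
            return True
        self.accumulated = 0.0
        return False
```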

We also add the TeaCache parameters from the original paper to the config sampling params for Wan-AI/Wan2.1-T2V-1.3B-Diffusers.

The logic for deciding whether to skip the forward pass was previously split between _compute_teacache_decision and _compute_l1_and_decide; it is now unified in should_skip_forward.
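
The unified decision can be sketched as a single method. The accumulate-and-threshold shape follows the TeaCache approach, while the class name and coefficient values below are placeholders:

```python
# Sketch of a unified should_skip_forward: rescale the raw relative-L1
# distance with a fitted polynomial, accumulate it, and skip the forward
# pass while the accumulated total stays under the threshold.

class TeaCacheDecider:
    def __init__(self, coefficients=(1.0, 0.0), threshold=0.2) -> None:
        self.coefficients = coefficients  # highest-degree coefficient first
        self.threshold = threshold
        self.accumulated = 0.0

    def should_skip_forward(self, rel_l1: float) -> bool:
        # Evaluate the rescaling polynomial at the raw distance (Horner's method).
        rescaled = 0.0
        for c in self.coefficients:
            rescaled = rescaled * rel_l1 + c
        self.accumulated += rescaled
        if self.accumulated < self.threshold:
            return True   # inputs barely changed: reuse the cached residual
        self.accumulated = 0.0
        return False      # run the real forward pass and refresh the cache
```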

Accuracy Tests

Benchmarking and Profiling

Baseline (this branch)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory

[03-05 14:14:32] Completed batch processing. Generated 1 outputs in 150.78 seconds

fight-base.mp4

Baseline (main)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory

[03-05 14:25:39] Completed batch processing. Generated 1 outputs in 150.56 seconds

fight-base-main.mp4

Teacache (this branch)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-teacache.mp4 --seed 42 --pin-cpu-memory --enable-teacache

[03-05 12:45:47] Completed batch processing. Generated 1 outputs in 104.04 seconds

fight-teacache.mp4

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@yhyang201
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Mar 5, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational architectural change by refactoring the TeaCache mechanism to leverage a new, generic DiffusionCache base class. This significantly improves the extensibility of the caching system, paving the way for the seamless addition of other timestep-level caching strategies. The update also includes a new calibration mode to gather performance statistics and refines existing TeaCache parameters for better control, ensuring the system is more adaptable and performant for future developments.

Highlights

  • Refactoring TeaCache: The existing TeaCache implementation has been refactored to utilize a new generic DiffusionCache base class, promoting modularity and easier integration of future timestep caching techniques like Magcache.
  • Introduction of DiffusionCache: A new abstract base class, DiffusionCache, has been introduced to standardize timestep-level caching strategies for diffusion models.
  • New Calibration Mode: A calibrate_cache parameter has been added, allowing the system to run in a calibration mode to collect magnitude ratio statistics without skipping steps, which is crucial for optimizing caching thresholds.
  • Updated TeaCache Parameters: The WanTeaCacheParams now includes default coefficients and parameters for skip_start_step and skip_end_step, providing more fine-grained control over when caching is active.
  • API Endpoint Integration: The calibrate_cache parameter has been integrated into the OpenAI-compatible image and video generation API endpoints, allowing users to control this feature via the API.


Changelog
  • python/sglang/multimodal_gen/configs/sample/sampling_params.py
    • Added a calibrate_cache boolean parameter to SamplingParams and its corresponding CLI argument.
  • python/sglang/multimodal_gen/configs/sample/teacache.py
    • Introduced TeaCacheState for per-CFG-branch state.
    • Refactored TeaCacheMixin into TeaCacheStrategy inheriting DiffusionCache.
    • Updated WanTeaCacheParams with default coefficients and skip step parameters.
  • python/sglang/multimodal_gen/runtime/cache/init.py
    • Updated exports to include DiffusionCache, TeaCacheState, and TeaCacheStrategy, and removed TeaCacheMixin.
  • python/sglang/multimodal_gen/runtime/cache/base.py
    • Added a new DiffusionCache abstract base class for diffusion model timestep caching strategies.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py
    • Integrated the new calibrate_cache parameter into image generation and editing API calls.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/protocol.py
    • Added calibrate_cache as an optional boolean field to ImageGenerationsRequest and VideoGenerationsRequest models.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/video_api.py
    • Incorporated the calibrate_cache parameter into video generation API calls.
  • python/sglang/multimodal_gen/runtime/loader/weights_updater.py
    • Modified the cache reset logic to use the new CachableDiT.cache.reset() method.
  • python/sglang/multimodal_gen/runtime/models/dits/base.py
    • Refactored CachableDiT to compose DiffusionCache instead of inheriting TeaCacheMixin.
    • Introduced init_cache for lazy initialization.
    • Updated cache interaction methods to delegate to self.cache.
  • python/sglang/multimodal_gen/runtime/models/dits/wanvideo.py
    • Adapted the forward method to utilize the new DiffusionCache strategy for TeaCache management.
    • Removed deprecated TeaCacheMixin methods.
Activity
  • The pull request is a prerequisite for issue [Diffusion] Implement MagCache #18498, indicating it's part of a larger feature development.
  • A baseline command for benchmarking is provided, suggesting that performance considerations are important for this change.

@eitanturok eitanturok changed the title from "init" to "Cleanup TeaCache" on Mar 5, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a generic DiffusionCache base class, refactoring TeaCache from a mixin to a TeaCacheStrategy. This is a strong architectural improvement that enhances modularity and will make it easier to add other caching mechanisms like MagCache in the future. However, a high-severity Denial of Service (DoS) vulnerability was identified in the TeaCacheStrategy.get_context implementation. This is due to an assertion verifying the presence of teacache_params in sampling_params, which are not currently defined or initialized in the SamplingParams class. This allows an attacker to crash the model runner process by simply enabling TeaCache in a generation request. Please address this by ensuring proper parameter initialization and graceful error handling. Additionally, there is a suggestion to prevent a potential division-by-zero error.

@eitanturok eitanturok changed the title from "Cleanup TeaCache" to "[diffusion] Cleanup TeaCache" on Mar 5, 2026
eitanturok and others added 3 commits March 5, 2026 15:57
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@mickqian
Collaborator

mickqian commented Mar 5, 2026

/tag-and-rerun-ci

@mickqian
Collaborator

mickqian commented Mar 5, 2026

also cc @DefTruth for this one

@yhyang201
Collaborator

@mickqian Nvidia CI passed and PR is approved, ready for merge

— SGLDHelper bot

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 7, 2026
@eitanturok
Contributor Author

To do:

  1. Better docs
  2. Remove ret_use
  3. Compute polynomial on GPU, not CPU
  4. Make sure it works with torch.compile
  5. Remove cache_type

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments

@yhyang201
Collaborator

/rerun-failed-ci

2 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

6 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments


Labels

diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), run-ci


3 participants