[diffusion] Cleanup TeaCache#19957

Open
eitanturok wants to merge 42 commits into sgl-project:main from eitanturok:teacache-refactor

Conversation

@eitanturok
Contributor

@eitanturok eitanturok commented Mar 5, 2026

Make TeaCache use a generic DiffusionCache class so it is easier to add other timestep-caching diffusion techniques such as MagCache. This is a prerequisite for #18498.

The Problem

Previously, TeaCache was implemented using mixins that tightly coupled the caching logic to individual model classes. This led to several issues:

  1. Code Duplication: Handling the positive and negative CFG branches required duplicated code within the caching class.
  2. Fragmented State: To compute the TeaCache decision, some parameters were attached directly to the model, others to TeaCacheMixin, and others to TeaCacheContext. This made state difficult to manage across different generation requests.
  3. Inconsistent APIs: Different models (like Wan vs. Hunyuan) required different parameter classes (WanTeaCacheParams vs. TeaCacheParams), making the user-facing API confusing.
  4. Integration Overhead: Adding TeaCache to a new model required roughly 100 lines of boilerplate code.
  5. Broken Functionality: HunyuanVideo support was incomplete/broken due to these architectural limitations.
  6. No Support for Multiple Cache Types: The mixin design hard-wired TeaCache into each model class, leaving no way to select among caching techniques.

Key Improvements

We split the logic into three distinct components:

  1. TeaCacheParams: User-provided settings (thresholds, offsets) defined in a request at the start of a generation.
  2. TeaCacheState: Internal state (cached tensors, accumulated L1 distances) that is updated after every forward pass. Crucially, the step counter is now attached here rather than the model.
  3. TeaCacheStrategy: The actual implementation that takes in TeaCacheParams and TeaCacheState to decide when to skip a computation.
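
The split can be sketched roughly as follows. Only the three class names come from this PR; the specific fields and method bodies shown here are illustrative assumptions, not the PR's exact API:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class TeaCacheParams:
    """User-provided settings, fixed for the whole generation request."""
    rel_l1_threshold: float = 0.2
    skip_start_step: int = 0
    skip_end_step: int = 50

@dataclass
class TeaCacheState:
    """Internal state, updated after every forward pass."""
    step: int = 0                    # the step counter now lives here, not on the model
    accumulated_rel_l1: float = 0.0  # running L1 distance since the last real forward
    cached_residual: Any = None      # output reused when a step is skipped

class TeaCacheStrategy:
    """Consumes params and state to decide whether to skip a forward pass."""

    def __init__(self, params: TeaCacheParams):
        self.params = params
        self.state = TeaCacheState()

    def reset(self) -> None:
        # A fresh request gets a fresh state; params stay as the user set them.
        self.state = TeaCacheState()
```

Because the step counter and cached tensors live on the state object rather than the model, resetting between requests is a single object swap.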

Previously, CachableDiT inherited from TeaCacheMixin. To support multiple caching strategies, CachableDiT cannot inherit from every caching mixin, because we only know which cache we want during the forward pass, not when CachableDiT is initialized. So we move the cache to a CachableDiT.cache attribute and initialize it lazily via init_cache().

  1. Simplified CFG Handling: Instead of duplicating code to handle the positive and negative CFG branches, we now simply maintain two independent TeaCacheState objects within the strategy.
  2. HunyuanVideo Fixed: Since TeaCacheStrategy is better abstracted, supporting HunyuanVideo required only 5 lines of code, compared to ~100 lines before.
  3. Calibration Mode: Added a calibrate_cache flag. This lets users calibrate the cache for a specific model and learn the values the cache needs.
  4. Unified Parameters: Merged WanTeaCacheParams into TeaCacheParams.
  5. Improved Docs: Better documentation and parameter names.
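
A minimal sketch of the composition-plus-lazy-init pattern: the model holds a cache attribute that starts empty and is filled on first use. CachableDiT here is a stand-in (the real class lives in python/sglang/multimodal_gen/runtime/models/dits/base.py), and the method bodies are assumptions:

```python
from typing import Optional

class DiffusionCache:
    """Abstract base for timestep caching strategies."""
    def should_skip_forward(self) -> bool:
        raise NotImplementedError
    def reset(self) -> None:
        raise NotImplementedError

class TeaCacheStrategy(DiffusionCache):
    def __init__(self) -> None:
        self.step = 0
    def should_skip_forward(self) -> bool:
        return False  # placeholder decision logic
    def reset(self) -> None:
        self.step = 0

CACHE_REGISTRY = {"teacache": TeaCacheStrategy}

class CachableDiT:
    def __init__(self) -> None:
        # No cache at construction time: which strategy is wanted is only
        # known once a request arrives, so the attribute starts empty.
        self.cache: Optional[DiffusionCache] = None

    def init_cache(self, cache_type: str) -> None:
        # Lazily construct the requested strategy on first use.
        if self.cache is None:
            self.cache = CACHE_REGISTRY[cache_type]()

    def forward(self, x):
        self.init_cache("teacache")
        if self.cache.should_skip_forward():
            return x  # would reuse the cached residual here
        return x      # real forward pass would go here
```

Composing the cache this way also lets weights_updater reset caching state with a single CachableDiT.cache.reset() call, as the changelog below notes.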

Old PR description

Motivation

Make TeaCache use a generic DiffusionCache class so it is easier to add other timestep-caching diffusion techniques such as MagCache. This is a prerequisite for #18498.

Modifications

The DiffusionCache class maintains a state that tracks persistent information, plus a context for the non-persistent information needed to decide whether or not to cache.

TeaCache previously had a lot of copy-pasted code to handle the positive and negative CFG branches. We abstract this away with a separate state for each branch.
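
The per-branch idea can be sketched like this (the class and field names here are hypothetical, not the PR's exact API):

```python
# Hypothetical sketch: one independent state object per CFG branch replaces
# the duplicated positive/negative code paths.

class BranchState:
    def __init__(self) -> None:
        self.accumulated = 0.0       # running rel-L1 distance for this branch
        self.cached_residual = None  # residual reused when the branch skips

class CfgAwareCache:
    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.states = {"positive": BranchState(), "negative": BranchState()}

    def should_skip_forward(self, branch: str, rel_l1: float) -> bool:
        # Identical decision code for both branches; only the state differs.
        state = self.states[branch]
        state.accumulated += rel_l1
        if state.accumulated < self.threshold:
            return True
        state.accumulated = 0.0
        return False
```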

We also support a cache.calibrate method to compute the values needed for caching. MagCache implements this in #18498, and we can implement it for TeaCache later.
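
A rough sketch of what such a calibrate mode could look like. The class and field names are illustrative; only the idea of collecting statistics without skipping comes from this PR:

```python
# Illustrative calibrate mode: never skip a step, just record the distances
# so a threshold or rescaling polynomial can be fitted offline afterwards.

class CalibratingCache:
    def __init__(self, threshold: float = 0.2, calibrate: bool = False) -> None:
        self.threshold = threshold
        self.calibrate = calibrate
        self.observed: list[float] = []  # rel-L1 distances seen while calibrating
        self.accumulated = 0.0

    def should_skip_forward(self, rel_l1: float) -> bool:
        if self.calibrate:
            self.observed.append(rel_l1)
            return False  # always run the real forward pass while calibrating
        self.accumulated += rel_l1
        if self.accumulated < self.threshold:
            return True
        self.accumulated = 0.0
        return False
```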

We also add the TeaCache parameters from the original paper to the config sampling params for Wan-AI/Wan2.1-T2V-1.3B-Diffusers.

The logic for deciding whether to skip the forward pass was previously split between _compute_teacache_decision and _compute_l1_and_decide; it is now unified in should_skip_forward.
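
The unified decision can be sketched as a single method. The accumulate-and-threshold shape follows the TeaCache approach, while the class name and coefficient values below are placeholders:

```python
# Sketch of a unified should_skip_forward: rescale the raw relative-L1
# distance with a fitted polynomial, accumulate it, and skip the forward
# pass while the accumulated total stays under the threshold.

class TeaCacheDecider:
    def __init__(self, coefficients=(1.0, 0.0), threshold=0.2) -> None:
        self.coefficients = coefficients  # highest-degree coefficient first
        self.threshold = threshold
        self.accumulated = 0.0

    def should_skip_forward(self, rel_l1: float) -> bool:
        # Evaluate the rescaling polynomial at the raw distance (Horner's method).
        rescaled = 0.0
        for c in self.coefficients:
            rescaled = rescaled * rel_l1 + c
        self.accumulated += rescaled
        if self.accumulated < self.threshold:
            return True   # inputs barely changed: reuse the cached residual
        self.accumulated = 0.0
        return False      # run the real forward pass and refresh the cache
```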

Accuracy Tests

Benchmarking and Profiling

Baseline (this branch)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory

[03-05 14:14:32] Completed batch processing. Generated 1 outputs in 150.78 seconds

fight-base.mp4

Baseline (main)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-base.mp4 --seed 42 --pin-cpu-memory

[03-05 14:25:39] Completed batch processing. Generated 1 outputs in 150.56 seconds

fight-base-main.mp4

Teacache (this branch)

sglang generate --model-path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --num-inference-steps 50 --num-frames 81 --width 832 --height 480 --save-output --output-path outputs --output-file-name fight-teacache.mp4 --seed 42 --pin-cpu-memory --enable-teacache

[03-05 12:45:47] Completed batch processing. Generated 1 outputs in 104.04 seconds

fight-teacache.mp4

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@yhyang201
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Mar 5, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational architectural change by refactoring the TeaCache mechanism to leverage a new, generic DiffusionCache base class. This significantly improves the extensibility of the caching system, paving the way for the seamless addition of other timestep-level caching strategies. The update also includes a new calibration mode to gather performance statistics and refines existing TeaCache parameters for better control, ensuring the system is more adaptable and performant for future developments.

Highlights

  • Refactoring TeaCache: The existing TeaCache implementation has been refactored to utilize a new generic DiffusionCache base class, promoting modularity and easier integration of future timestep caching techniques like Magcache.
  • Introduction of DiffusionCache: A new abstract base class, DiffusionCache, has been introduced to standardize timestep-level caching strategies for diffusion models.
  • New Calibration Mode: A calibrate_cache parameter has been added, allowing the system to run in a calibration mode to collect magnitude ratio statistics without skipping steps, which is crucial for optimizing caching thresholds.
  • Updated TeaCache Parameters: The WanTeaCacheParams now includes default coefficients and parameters for skip_start_step and skip_end_step, providing more fine-grained control over when caching is active.
  • API Endpoint Integration: The calibrate_cache parameter has been integrated into the OpenAI-compatible image and video generation API endpoints, allowing users to control this feature via the API.


Changelog
  • python/sglang/multimodal_gen/configs/sample/sampling_params.py
    • Added a calibrate_cache boolean parameter to SamplingParams and its corresponding CLI argument.
  • python/sglang/multimodal_gen/configs/sample/teacache.py
    • Introduced TeaCacheState for per-CFG-branch state.
    • Refactored TeaCacheMixin into TeaCacheStrategy inheriting DiffusionCache.
    • Updated WanTeaCacheParams with default coefficients and skip step parameters.
  • python/sglang/multimodal_gen/runtime/cache/init.py
    • Updated exports to include DiffusionCache, TeaCacheState, and TeaCacheStrategy, and removed TeaCacheMixin.
  • python/sglang/multimodal_gen/runtime/cache/base.py
    • Added a new DiffusionCache abstract base class for diffusion model timestep caching strategies.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py
    • Integrated the new calibrate_cache parameter into image generation and editing API calls.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/protocol.py
    • Added calibrate_cache as an optional boolean field to ImageGenerationsRequest and VideoGenerationsRequest models.
  • python/sglang/multimodal_gen/runtime/entrypoints/openai/video_api.py
    • Incorporated the calibrate_cache parameter into video generation API calls.
  • python/sglang/multimodal_gen/runtime/loader/weights_updater.py
    • Modified the cache reset logic to use the new CachableDiT.cache.reset() method.
  • python/sglang/multimodal_gen/runtime/models/dits/base.py
    • Refactored CachableDiT to compose DiffusionCache instead of inheriting TeaCacheMixin.
    • Introduced init_cache for lazy initialization.
    • Updated cache interaction methods to delegate to self.cache.
  • python/sglang/multimodal_gen/runtime/models/dits/wanvideo.py
    • Adapted the forward method to utilize the new DiffusionCache strategy for TeaCache management.
    • Removed deprecated TeaCacheMixin methods.
Activity
  • The pull request is a prerequisite for issue [Diffusion] Implement MagCache #18498, indicating it's part of a larger feature development.
  • A baseline command for benchmarking is provided, suggesting that performance considerations are important for this change.

@eitanturok eitanturok changed the title from "init" to "Cleanup TeaCache" on Mar 5, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a generic DiffusionCache base class, refactoring TeaCache from a mixin to a TeaCacheStrategy. This is a strong architectural improvement that enhances modularity and will make it easier to add other caching mechanisms like MagCache in the future. However, a high-severity Denial of Service (DoS) vulnerability was identified in the TeaCacheStrategy.get_context implementation. This is due to an assertion verifying the presence of teacache_params in sampling_params, which are not currently defined or initialized in the SamplingParams class. This allows an attacker to crash the model runner process by simply enabling TeaCache in a generation request. Please address this by ensuring proper parameter initialization and graceful error handling. Additionally, there is a suggestion to prevent a potential division-by-zero error.

@eitanturok eitanturok changed the title from "Cleanup TeaCache" to "[diffusion] Cleanup TeaCache" on Mar 5, 2026
eitanturok and others added 3 commits March 5, 2026 15:57
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@mickqian
Collaborator

mickqian commented Mar 5, 2026

/tag-and-rerun-ci

@mickqian
Collaborator

mickqian commented Mar 5, 2026

also cc @DefTruth for this one

@yhyang201
Collaborator

@mickqian Nvidia CI passed and PR is approved, ready for merge

— SGLDHelper bot

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 7, 2026
@eitanturok
Contributor Author

To do:

  1. Better docs
  2. Remove ret_use
  3. Compute polynomial on GPU, not CPU
  4. Make sure it works with torch.compile
  5. Remove cache_type

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments

@yhyang201
Collaborator

/rerun-failed-ci

2 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

6 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments

@yhyang201
Collaborator

/tag-and-rerun-ci

@yhyang201
Collaborator

/rerun-failed-ci

3 similar comments


Labels

diffusion (SGLang Diffusion), documentation (Improvements or additions to documentation), run-ci


3 participants