[diffusion] Refactor TeaCache #21613

Open
eitanturok wants to merge 16 commits into sgl-project:main from eitanturok:refactor-teacache-2

Conversation


@eitanturok eitanturok commented Mar 28, 2026

Motivation

This PR cleans up the TeaCache implementation.

  1. TeaCache parameters were split between 4 different classes, updated at different times, and glued together by a messy get_context function. Now there are 3 clearly separated classes with well-defined API boundaries.
  2. Adding TeaCache to a new model now requires 5 lines instead of ~100.
  3. Removed the duplicated code for the positive and negative CFG branches.
  4. Implemented TeaCache more generically (composition instead of inheritance) so we can support different types of caching in the future.
  5. Removed if statements and numpy calls so the code is more torch.compile-friendly.

This is a prerequisite for supporting different timestep caching methods like MagCache (#18498 #19957).

Modifications

Here are the changes in more detail:

1. Fragmented State

The Problem: Parameters were scattered across the model class, TeaCacheMixin, TeaCacheParams, and TeaCacheContext. The TeaCacheParams class confusingly had methods attached to it. These classes and parameters were all updated at different times, making it difficult to manage TeaCache state across different requests.

The Solution: Split logic into three distinct components:

  • TeaCacheParams: A pure data class for user settings (thresholds, offsets). It has no methods and is updated at the start of every new request by a user.
  • TeaCacheState: Manages internal runtime data (cached tensors, accumulated L1 distances). The step counter is now moved here instead of being attached to the model. This is updated after every forward pass.
  • TeaCacheStrategy: The actual implementation logic that takes in TeaCacheParams and TeaCacheState to decide when to skip a computation.
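The first two components can be sketched as plain data holders. (A hypothetical sketch: the field names below are illustrative defaults, not the PR's exact attributes.)

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TeaCacheParams:
    # User-facing settings only: no methods, set at the start of each request.
    rel_l1_thresh: float = 0.2        # skip threshold (illustrative name)
    coefficients: tuple = (1.0, 0.0)  # polynomial rescaling coefficients
    warmup_steps: int = 1             # always-compute steps at the start

@dataclass
class TeaCacheState:
    # Internal runtime data, updated after every forward pass.
    cnt: int = 0                      # step counter, moved off the model
    accumulated_distance: float = 0.0 # running rescaled L1 distance
    previous_modulated_input: Optional[Any] = None
    previous_residual: Optional[Any] = None
```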

At a high level, this is what TeaCacheStrategy looks like. (In reality, the methods take different arguments.)

class TeaCacheStrategy:
	def __init__(self, params: TeaCacheParams): ...
	def maybe_reset(self, state: TeaCacheState): ...  # maybe reset the cache state
	def should_skip(self, state: TeaCacheState) -> bool: ...  # decide whether to skip computing the forward pass for this timestep
	def read(self, state: TeaCacheState): ...  # read from the cache
	def write(self, state: TeaCacheState): ...  # write to the cache

2. Integration Overhead

The Problem: Adding TeaCache to a new model required ~100 lines of boilerplate code (e.g., the large amount of code deleted in wanvideo.py).

The Solution: Rewrote the TeaCache API so models now only need to pass in the modulated input. All other logic is generic and handled by TeaCacheStrategy.

3. Code Duplication

The Problem: Handling the positive and negative CFG branches required duplicated code blocks in TeaCache to track their caching logic separately.

The Solution: Maintain a separate TeaCacheState for the positive and negative CFG branches.
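
One hypothetical way to keep the branches independent (the class and key names below are illustrative, not the PR's):

```python
class CFGCacheStates:
    """Holds one independent cache state per CFG branch.

    The same strategy code then runs for both branches; only the
    state object differs, so no logic needs to be duplicated.
    """

    def __init__(self, make_state):
        self.states = {"cond": make_state(), "uncond": make_state()}

    def get(self, is_positive: bool):
        # Select the state for the positive (cond) or negative (uncond) branch.
        return self.states["cond" if is_positive else "uncond"]
```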

4. Cannot Support Multiple Types of Caching

The Problem: Cacheable-DiT inherited directly from TeaCacheMixin. Since we only know which cache type we want during the forward pass, not at init, the model couldn't easily support or switch between caching strategies without changing the class it inherits from.

The Solution: Instead of Cacheable-DiT inheriting the cache, assign the cache to Cacheable-DiT.cache (composition). It is now initialized via init_cache(), allowing the model to dynamically support multiple caching types.
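
A hypothetical sketch of the composition pattern (the registry and the stub strategy classes below are simplified stand-ins for illustration):

```python
class TeaCacheStrategyStub:
    """Stand-in for the real TeaCacheStrategy."""
    name = "teacache"

class NoOpCacheStub:
    """Stand-in for a no-caching strategy."""
    name = "none"

class CacheableDiT:
    """Owns a cache object instead of inheriting caching behavior."""

    def __init__(self):
        self.cache = None  # no caching until explicitly initialized

    def init_cache(self, cache_type: str):
        # Strategies can be swapped at runtime; with inheritance this
        # would require changing the model's base class instead.
        registry = {"teacache": TeaCacheStrategyStub, "none": NoOpCacheStub}
        self.cache = registry[cache_type]()
```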

5. Not torch.compile Friendly

The Problem: TeaCache called np.poly1d and used Python if statements to decide when to skip a forward pass, both of which make the code hard to compile.

The Solution: Use torch.where instead of if statements and implement np.poly1d in torch.
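
A sketch of both replacements, assuming coefficients are ordered highest-degree first as in np.poly1d (the function names and threshold logic are illustrative, not the PR's exact code):

```python
import torch

def polyval(coeffs, x: torch.Tensor) -> torch.Tensor:
    # Horner's rule: a torch replacement for np.poly1d(coeffs)(x),
    # traceable by torch.compile since it is a fixed-length loop.
    result = torch.zeros_like(x)
    for c in coeffs:
        result = result * x + c
    return result

def accumulate_or_reset(accum: torch.Tensor, dist: torch.Tensor, thresh: float):
    # torch.where replaces the Python `if`, keeping the graph branch-free:
    # skip while the accumulated distance stays under the threshold, and
    # reset the accumulator to zero whenever we recompute.
    new_accum = accum + dist
    should_skip = new_accum < thresh
    new_accum = torch.where(should_skip, new_accum, torch.zeros_like(new_accum))
    return new_accum, should_skip
```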

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
  • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Mar 28, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request renames teacache_params to cache_params and introduces a calibrate_cache flag with an associated CLI argument. The review identifies that the renaming is incomplete, which will cause runtime errors in other modules like teacache.py, and suggests renaming enable_teacache for consistency. Furthermore, the calibrate_cache parameter appears to be dead code as it is not utilized within the current changes.

@eitanturok eitanturok changed the title Refactor Teacache [diffusion[ Refactor Teacache Mar 28, 2026
@eitanturok eitanturok changed the title [diffusion[ Refactor Teacache [diffusion] Refactor TeaCache Mar 28, 2026
@eitanturok eitanturok marked this pull request as ready for review March 28, 2026 23:18
@eitanturok
Contributor Author

@yhyang201 @mickqian can you please tag this with run-ci?

@eitanturok
Contributor Author

@yhyang201 @mickqian @BBuf @yingluosanqian @ping1jing2 can you please tag this with run-ci?

