
[BugFix] Refresh TeaCache when num_inference_steps=None#2240

Open
alex-jw-brooks wants to merge 6 commits into vllm-project:main from alex-jw-brooks:flux2_tc_fix

Conversation

@alex-jw-brooks
Contributor

Purpose

Related to #2194

The proper fix for the above issue is to merge the sampling params to get the correct num_inference_steps, but this PR adds a short-term workaround for TeaCache, whose refresh does not depend on num_inference_steps. It also adds logging when the cache fails to reset, while I work on the more general fix.

This is needed because the warmup initializes TeaCache, which replaces forward() and can cause bad behavior when running text-to-image on models that also accept image inputs. E.g., for Flux2Klein:

from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

if __name__ == "__main__":
    omni = Omni(
        model="black-forest-labs/FLUX.2-klein-4B",
        cache_backend="tea_cache",
    )

    prompt = "A cat sitting on a windowsill"

    # If you specify num_inference_steps, you will see the second cache refresh
    # (after warmup), but if you don't pass it, you won't, since refresh won't
    # be called.
    sampling_params = OmniDiffusionSamplingParams(
        # Not specifying num_inference_steps will crash forward
    )

    outputs = omni.generate(prompt, sampling_params)
    outputs[0].images[0].save("meow.png")

Not refreshing before entering the forward pass blows up, because the new modulated inputs don't have an image component while the previous (stale) ones do:

ERROR 03-26 18:11:45 [diffusion_worker.py:481]   File "/home/alex-jw-brooks/vllm-omni/vllm_omni/diffusion/cache/teacache/hook.py", line 222, in _should_compute_full_transformer
ERROR 03-26 18:11:45 [diffusion_worker.py:481]     (modulated_inp - state.previous_modulated_input).abs().mean()
ERROR 03-26 18:11:45 [diffusion_worker.py:481]      ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR 03-26 18:11:45 [diffusion_worker.py:481] RuntimeError: The size of tensor a (4096) must match the size of tensor b (8192) at non-singleton dimension 1
...
ERROR 03-26 18:11:45 [stage_diffusion_client.py:78]   File "/home/alex-jw-brooks/vllm-omni/vllm_omni/entrypoints/async_omni_diffusion.py", line 309, in generate
ERROR 03-26 18:11:45 [stage_diffusion_client.py:78]     raise RuntimeError(f"Diffusion generation failed: {e}") from e
ERROR 03-26 18:11:45 [stage_diffusion_client.py:78] RuntimeError: Diffusion generation failed: The size of tensor a (4096) must match the size of tensor b (8192) at non-singleton dimension 1
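The failure mode can be reproduced in miniature; this is a hedged sketch using NumPy in place of torch, with shapes taken from the traceback above (variable names are hypothetical, not the actual hook code). A stale cached modulated input still contains image tokens (8192 positions), while the fresh text-only one has 4096, so the elementwise subtraction in the cache-distance check cannot broadcast:

```python
import numpy as np

# Shapes from the traceback: fresh text-only modulated input vs. a stale
# cached one that still includes image tokens (hypothetical stand-ins).
fresh_modulated = np.zeros((1, 4096, 64))  # current TTI request
stale_cached = np.zeros((1, 8192, 64))     # left over from warmup

try:
    # Mirrors (modulated_inp - state.previous_modulated_input).abs().mean()
    diff = np.abs(fresh_modulated - stale_cached).mean()
except ValueError as e:
    print(f"shape mismatch: {e}")
```

Refreshing the cache between requests clears the stale tensor, so the first timestep of the new request skips this comparison entirely.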

This PR allows TeaCache to refresh in this case, and adds a log if we can't refresh the cache, until the more correct fix lands.
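The workaround can be sketched roughly as follows (all names here are hypothetical stand-ins, not the real vllm-omni runner/backend API): when num_inference_steps is None, we still refresh TeaCache, passing 0 as a falsy placeholder, since TeaCache's refresh ignores the value and only resets hook state.

```python
class FakeTeaCacheBackend:
    """Hypothetical stand-in whose refresh only resets internal state."""
    name = "tea_cache"

    def __init__(self):
        self.state = {"previous_modulated_input": "stale-from-warmup"}

    def refresh(self, num_inference_steps):
        # The int argument is accepted but unused: refresh just clears state.
        self.state = {"previous_modulated_input": None}


def maybe_refresh_cache(backend, num_inference_steps):
    """Sketch of the guard: refresh TeaCache even without a step count."""
    if num_inference_steps is None:
        if backend.name == "tea_cache":
            # 0 is only a placeholder; TeaCache's refresh ignores it.
            backend.refresh(num_inference_steps=0)
            return True
        print("Cache not refreshed: num_inference_steps is None")
        return False
    backend.refresh(num_inference_steps=num_inference_steps)
    return True


backend = FakeTeaCacheBackend()
maybe_refresh_cache(backend, None)  # refreshes despite the missing value
```

Other cache backends (e.g. DiTCache) genuinely need the real step count, which is why they still have to wait for the sampling-params merge fix.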

@Gaohan123 @wtomin @fhfuih could you please take a look?

@fhfuih
Contributor

fhfuih commented Mar 27, 2026

EDIT: Sorry, I actually missed your PR description. My understanding was correct; I jumped right into your code 😂

Thanks for the PR. A quick question: if I understand it correctly, this PR is only a quick fix. It force-sets the number of inference steps to 0: not None, but still falsy. This passes the check during cache refreshing and also yields to pipeline-specific overrides.

And the more complete fix is on your cache_refresh branch?

@alex-jw-brooks
Contributor Author

alex-jw-brooks commented Mar 27, 2026

Hey @fhfuih! No worries 😆 but yes. My understanding of the flow is:

  • The TeaCache hook gets initialized in load_model, which also creates the StateManager etc. for the cache.
  • When we run requests, we run the _WrappedForward, which calls the hook's new forward (here).
  • The new forward for TeaCache (this) runs the extractor, then gets the TeaCache state or creates a new one. After that, it checks the state here to see if it's the first timestep, and compares against the previous modulated state if it isn't.

For TeaCache, the refresh does not depend on the timesteps; it just resets the TeaCache state (i.e., num_inference_steps isn't passed anywhere here). The value of 0 is just a placeholder I chose because the arg is an int; in the TeaCache case it doesn't matter, since all refresh does is clear the state.

Since refresh isn't currently being called, the state is stale from the last execute_model call, so instead of creating a new state on the first timestep, we get the old one and fall through this check.

So this is okay as a short-term fix for TeaCache's behavior, but the other branch will fix it more correctly by passing the actual num_inference_steps value, which we need in order to reset DiTCache correctly 🙂
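The flow above can be sketched roughly as follows (all names are hypothetical stand-ins for the hook/state-manager code, not the real vllm-omni API): the state manager hands out per-key state objects, and refresh simply clears them, taking num_inference_steps only to satisfy the int-typed signature:

```python
from dataclasses import dataclass


@dataclass
class TeaCacheState:
    # A tensor in the real code; None means "first timestep, compute fully".
    previous_modulated_input: object = None


class StateManager:
    """Hypothetical stand-in for the cache's per-model state manager."""

    def __init__(self):
        self._states = {}

    def get_or_create(self, key):
        # Reuses existing state; this is how stale warmup state leaks into
        # the next request when refresh is never called.
        return self._states.setdefault(key, TeaCacheState())

    def reset(self):
        self._states.clear()


def refresh(manager, num_inference_steps):
    # The step count is accepted (the arg is an int) but unused: TeaCache's
    # refresh only clears state, so 0 works as a placeholder.
    manager.reset()


manager = StateManager()
state = manager.get_or_create("transformer")
state.previous_modulated_input = "stale-from-warmup"

# Without refresh(), the next request would get this stale state back and
# compare against a previous_modulated_input of the wrong shape.
refresh(manager, 0)
assert manager.get_or_create("transformer").previous_modulated_input is None
```

This also shows why the fix is TeaCache-specific: a backend whose reset logic actually consumed num_inference_steps could not safely be handed a placeholder.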

@alex-jw-brooks alex-jw-brooks changed the title Refresh TeaCache when num_inference_steps=None [BugFix] Refresh TeaCache when num_inference_steps=None Mar 27, 2026
Comment on lines +266 to +270
# FIXME (Alex): When num_inference_steps is None, we defer to
# pipelines for default, but don't refresh the cache; the right
# way to do this is to merge the sampling params first, but
# for now we allow teacache to refresh either way since it does
# not depend on the num_inference_steps.
Contributor


For the comment, maybe we can explain why we need 0 instead of None for now: which places' logic it works around, and which behavior/bugfix TeaCache requires.

Contributor

@fhfuih fhfuih left a comment


Thanks for the explanation. All looks good to me. @SamitHuang @ZJY0516, could you also have a look and decide whether to merge this hotfix, since it is related to a previous relevant PR? Thanks!

Collaborator

@lishunyang12 lishunyang12 left a comment


Looks reasonable as a short-term workaround; left a couple of nits.

Comment thread vllm_omni/diffusion/worker/diffusion_model_runner.py Outdated
# pipelines for default, but don't refresh the cache; the right
# way to do this is to merge the sampling params first, but
# for now we allow teacache to refresh either way since it does
# not depend on the num_inference_steps.
Collaborator


Nit: the comment explains that teacache doesn't depend on num_inference_steps, but it'd be more useful to add a one-liner about why 0 is safe — i.e. TeaCacheBackend.refresh() ignores the value entirely (just resets hook state). That addresses @fhfuih's earlier feedback too.

Contributor Author

@alex-jw-brooks alex-jw-brooks Apr 3, 2026


Yup! Added a clearer explanation.

alex-jw-brooks and others added 6 commits April 14, 2026 05:02
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>