[BugFix] Refresh TeaCache when num_inference_steps=None #2240
alex-jw-brooks wants to merge 6 commits into vllm-project:main from
Conversation
Thanks for the PR. A quick question: if I understand correctly, this PR is only a quick fix. It force-sets the number of inference steps to 0, i.e., not None but still falsy. This passes the check during cache refreshing and also still yields to pipeline-specific overrides. A more complete fix is on your cache_refresh branch.
Hey @fhfuih! No worries 😆 but yes. My understanding of the flow is:
For TeaCache, the refresh does not depend on the timesteps; it just resets the TeaCache state. Since the refresh is not being called currently, the state is stale from the last execute-model call, so instead of creating a new state on the first time step, it reuses the old one and we fall through this check. So this is okay as a short-term fix for TeaCache's behavior, but the other branch will fix it more correctly by passing the actual num_inference_steps.
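To make the flow concrete, here is a rough sketch of the check in question; the function and parameter names are hypothetical, not the actual code paths in this repo:

```python
# Rough sketch of the flow described above (names are illustrative, not
# the real code). With num_inference_steps=None the refresh guard is
# skipped, so the TeaCache state from the previous execute-model call
# leaks into the next request.

def prepare_cache(backend, num_inference_steps):
    # Hotfix idea: 0 is falsy (so pipeline-specific defaults still win
    # later) but it is not None, so the refresh below is no longer skipped.
    if num_inference_steps is not None:
        backend.refresh(num_inference_steps)

def resolve_steps(num_inference_steps, pipeline_default):
    # Falsy values defer to the pipeline default, which is why 0 behaves
    # like None for step resolution.
    return num_inference_steps or pipeline_default

class _Backend:
    """Tiny stand-in backend used only to exercise the sketch."""
    def __init__(self):
        self.refreshed = False
    def refresh(self, _steps):
        self.refreshed = True

b = _Backend()
prepare_cache(b, 0)                # hotfix path: refresh happens
assert b.refreshed
assert resolve_steps(0, 50) == 50  # 0 still defers to the pipeline default
```

Keeping 0 falsy means the existing pipeline-default resolution does not need to change, which is why the hotfix avoids touching the sampling-params merge for now.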
| # FIXME (Alex): When num_inference_steps is None, we defer to
| # pipelines for default, but don't refresh the cache; the right
| # way to do this is to merge the sampling params first, but
| # for now we allow teacache to refresh either way since it does
| # not depend on the num_inference_steps.
For the comment, maybe we can explain why we need 0 instead of None for now: which logic it works around, and which behavior/bugfix TeaCache requires.
Thanks for the explanation. All looks good to me. @SamitHuang @ZJY0516, could you also have a look and decide whether to merge this hotfix, since it is related to a previous relevant PR? Thanks.
Force-pushed from 13cedd4 to 091d8f2
lishunyang12 left a comment:
Looks reasonable as a short-term workaround; left a couple of nits.
| # pipelines for default, but don't refresh the cache; the right
| # way to do this is to merge the sampling params first, but
| # for now we allow teacache to refresh either way since it does
| # not depend on the num_inference_steps.
Nit: the comment explains that teacache doesn't depend on num_inference_steps, but it'd be more useful to add a one-liner about why 0 is safe — i.e. TeaCacheBackend.refresh() ignores the value entirely (just resets hook state). That addresses @fhfuih's earlier feedback too.
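For illustration, a minimal sketch of that point; the class body below is hypothetical and only mirrors the behavior described in the nit:

```python
class TeaCacheBackend:
    """Illustrative stand-in; only the behavior described above is shown."""

    def __init__(self):
        self.hook_state = {}

    def refresh(self, num_inference_steps):
        # The step count is never read here; refreshing only resets the
        # per-request hook state, which is why passing 0 is safe.
        self.hook_state = {}
```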
Yup! Added a clearer explanation.
Force-pushed from 65a706e to cd4e66a
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Force-pushed from cd4e66a to ec7424d
Purpose
Related to #2194
The proper fix for the above issue is to merge the sampling params to get the correct num_inference_steps, but this PR adds a short-term workaround for teacache, which doesn't depend on num_inference_steps. It also adds logging if the cache fails to reset, for now, while I am working on the more general fix.
This is needed because the warmup initializes teacache, which replaces forward(), and can cause bad behavior when running TTI on models that accept image inputs, e.g., Flux2Klein.
If the cache is not refreshed before entering the forward pass, the model blows up because the new modulated inputs don't have an image component, while the previous (stale) ones do.
This PR allows teacache to refresh in this case and adds a log when the cache can't be refreshed, until the more complete fix lands.
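To illustrate the failure mode, here is a hedged sketch; the tensor shapes, names, and threshold are made up rather than Flux2Klein internals. The stale modulated input recorded during warmup includes image tokens, while a later text-to-image request does not, so comparing against the stale tensor fails unless the state is refreshed first.

```python
import torch

# Per-hook state that the TeaCache refresh is supposed to reset.
previous_modulated_input = None

def teacache_check(modulated_input, threshold=0.1):
    """Decide whether the cached output can be reused for this step."""
    global previous_modulated_input
    if previous_modulated_input is not None:
        # If the stale tensor still contains image tokens and the new one
        # does not, the shapes differ and this subtraction raises.
        rel_l1 = ((modulated_input - previous_modulated_input).abs().mean()
                  / previous_modulated_input.abs().mean())
        should_skip = rel_l1.item() < threshold
    else:
        should_skip = False
    previous_modulated_input = modulated_input
    return should_skip

# Warmup pass with image inputs: text tokens + image tokens.
teacache_check(torch.randn(1, 512 + 4096, 64))
# Later TTI request without image inputs: shorter sequence. Without a
# refresh setting previous_modulated_input back to None, the next call
# fails with a shape-mismatch RuntimeError:
# teacache_check(torch.randn(1, 512, 64))
```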
@Gaohan123 @wtomin @fhfuih could you please take a look?