[Bugfix] Fix Incompatible Multihook Integration (TeaCache <-> CPU Offload)#2689
Conversation
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Clean fix. The old dispatch had a latent bug where multi-hook scenarios would skip new_forward overrides entirely. New dispatch correctly separates the two hook types. Tests are thorough.
One observation: hooks with new_forward are expected to call their own pre_forward/post_forward internally (see OverrideAppendHook). This is a subtle contract — if someone writes a new_forward hook that forgets to call its own pre/post, the hook silently does nothing. Worth documenting this expectation on the ModelHook.new_forward docstring.
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
Thanks for the quick review @hsliuustc0106 🙂
I agree, I think this is very unintuitive. IMO it's better to just let the original model Currently, there are no hooks that override |
|
Good fix. The new One subtlety: the contract for |
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
Sounds good to me - changed the default |
wtomin
left a comment
There was a problem hiding this comment.
LGTM. This fix is nice and clean
…load) (vllm-project#2689) Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
…load) (vllm-project#2689) Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
…load) (vllm-project#2689) Signed-off-by: Alex Brooks <albrooks@redhat.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Purpose
Fixes the root cause of #1868
The Hook registry currently has the following behavior:
new_forwardon the call hook -> call post processThis means that any hooks that actually override
new_forward, e.g., TeaCache, will not call their overriddennew_forwardwhen they are run with other hooks like CPU offload.This PR fixes it by ensuring only one hook that overrides
new_forwardis active at a time, and ensures that it's the last in the chain. Then the flow becomes call preprocess on all hooks that don't override the new forward, call the new forward on the hook (which currently encapsulates the pre/post process of that hook as well), and then call the chain backwards. If we try to register multiple hooks that have their own overrides ofnew_forward, it'll also throw aRuntimeErrorsince we currently don't have a way of combining hooks with custom forwards.Test Plan
The hook registry is already tested indirectly through the SP tests, but does not have direct unit tests, so I added a suite for it as well to verify the sorting / execution behavior on simple hooks.
Test Result
Tests pass, and TeaCache and CPU offload are now compatible.
On main, you should see ~ 1x speedup since teacache doesn't actually run if cpu offload is passed. On this branch you should see the speedup of ~1.3x and the resulting grid (top is with the cache backend set, bottom is with no cache backend set)