Add test case for compiling multiple graphs #21044
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
cc: @zou3519 @youkaichao
Code Review
This pull request introduces a new ignore_torch_compile decorator and adds test cases for compiling multiple graphs. There is a potential issue with the inheritance of the IGNORE_COMPILE_KEY that needs to be addressed.
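For context, here is a minimal sketch of the kind of inheritance pitfall being flagged. The marker name is assumed from the `IGNORE_COMPILE_KEY` mentioned above; the actual decorator internals may differ. A flag stored as a class attribute is visible on subclasses unless the lookup is restricted to the class's own `__dict__`:

```python
# Hypothetical illustration, not the actual vLLM implementation: the
# marker name is assumed from IGNORE_COMPILE_KEY in the review summary.
IGNORE_COMPILE_KEY = "_ignore_compile_"

class Parent:
    pass

setattr(Parent, IGNORE_COMPILE_KEY, True)  # what a class decorator might do

class Child(Parent):
    pass

# Plain attribute lookup walks the MRO, so the subclass looks marked too:
print(getattr(Child, IGNORE_COMPILE_KEY, False))  # True (inherited)

# Restricting the check to the class's own __dict__ avoids that:
print(IGNORE_COMPILE_KEY in Child.__dict__)  # False
```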
zou3519 left a comment
Looks reasonable to me. One minor comment
cc @BoyuanFeng
BoyuanFeng left a comment
looks good! a few minor comments
Force-pushed from c178193 to 0080859
@houseroad the failing tests don't seem related to this PR, can we merge it? e.g. for model tests (voxtral unrecognized), the test also fails on main
… and compile
Signed-off-by: Yong Hoon Shin <[email protected]>
Force-pushed from 0080859 to 7ece902
```python
# InductorAdaptor (unfortunately) requires AOTAutogradCache
# to be turned off to run. It will fail to acquire the hash_str
# and error if not.
# StandaloneInductorAdaptor (PyTorch 2.8+) fixes this problem.
stack.enter_context(
    torch._functorch.config.patch(enable_autograd_cache=False))
```
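For readers trying this outside vLLM, a self-contained sketch of the same pattern follows. The surrounding `ExitStack` is assumed from the `stack.enter_context` call above; this is not the exact vLLM code path:

```python
# Standalone sketch, assuming the snippet above runs inside an ExitStack.
import contextlib

import torch

with contextlib.ExitStack() as stack:
    # Keep AOTAutogradCache disabled for the lifetime of the stack so
    # Inductor compilation can proceed without the hash_str failure.
    stack.enter_context(
        torch._functorch.config.patch(enable_autograd_cache=False))
    # ... trigger torch.compile / Inductor compilation here ...
```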
btw @houseroad @yeqcharlotte this will fix a lot of the compiler cache issues people report
Purpose
In some cases, we might want to compile multiple subgraphs, e.g. if some submodules in the model are not traceable with torch dynamo. See #14913 and #19719 for motivating scenarios.
This PR adds a test case showcasing how this can be supported in vLLM for piecewise compilation. This requires several components:
- `set_model_tag(tag)` to ensure the compile cache is not shared between these subgraphs (initially added in #19064, "[torch.compile] reorganize the cache directory to support compiling multiple models"). A rough usage sketch appears at the end of this description.
- `ignore_torch_compile` helper decorator for scenarios where a class inherits from a parent module that has the `support_torch_compile` decorator applied, but does not want to compile its `forward()`. Added `test_ignore_torch_compile_decorator` to test this; a sketch of the pattern is under Test Plan below.

Also disables AOTAutogradCache, which is needed for compilation caching to work with the current `InductorAdaptor`; `StandaloneInductorAdaptor` in PyTorch 2.8+ will fix this problem (thanks @zou3519).

Test Plan
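A minimal sketch of the pattern `test_ignore_torch_compile_decorator` exercises. The decorator names come from this PR's description, but the import path and exact semantics here are assumptions, not verbatim vLLM code:

```python
# Hedged sketch: decorator names are from the PR description; the import
# path and the constructor requirements are assumptions.
import torch
import torch.nn as nn

from vllm.compilation.decorators import (ignore_torch_compile,
                                         support_torch_compile)

@support_torch_compile
class ParentModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

@ignore_torch_compile
class ChildModel(ParentModel):
    # The parent is marked for compilation, but this subclass keeps its
    # forward() in eager mode.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return super().forward(x) + 1
```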
Test Result
Both newly added unit tests (`test_ignore_torch_compile_decorator` and `test_multi_graph_piecewise_compile`) pass.
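For reference, a rough sketch of how multiple compiled subgraphs might get distinct compile caches with `set_model_tag`. The helper's module path and context-manager form are assumptions based on #19064, not verbatim vLLM code:

```python
# Hedged sketch: set_model_tag was added in PR #19064; its import location
# and exact API here are assumptions.
import torch
import torch.nn as nn

from vllm.compilation.decorators import support_torch_compile
from vllm.config import set_model_tag  # assumed location of the helper

@support_torch_compile
class VisionSubmodule(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

@support_torch_compile
class LanguageSubmodule(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

# Distinct tags keep the two subgraphs' compile caches separate:
with set_model_tag("vision"):
    vision = VisionSubmodule()
with set_model_tag("language"):
    language = LanguageSubmodule()
```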