[compile][startup] Disable C++ compilation of symbolic shapes #20836
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add 🚀.
Summary of Changes
Hello @anijain2305, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request aims to optimize the startup and compilation process by disabling a specific TorchDynamo feature, C++ compilation of symbolic shapes, which is not required for vLLM's operation. By overriding this default configuration, I'm streamlining the compilation flow and potentially improving performance or avoiding unnecessary overhead.
Highlights
- Configuration Override: I've introduced a new method, `overwrite_compiler_configs`, within the `Compiler` class in `vllm/compilation/wrapper.py`. This method is designed to adjust `torch.compile` configurations to better suit vLLM's specific needs.
- Symbolic Shape Guard Disablement: Specifically, the `overwrite_compiler_configs` method sets `torch._dynamo.config.enable_cpp_symbolic_shape_guards` to `False`. This disables the C++ compilation of symbolic shape guards, as vLLM's architecture already handles or skips these guards, making this compilation unnecessary (a minimal sketch follows this list).
- Initialization Integration: The `overwrite_compiler_configs` method is now called during the `__init__` phase of the `Compiler` class, ensuring that these `torch.compile` configurations are applied early in the compilation process.
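To make the summary concrete, here is a minimal sketch of what such an override could look like; the class and method names follow the summary above, while everything else is a simplified assumption rather than the actual vLLM implementation:

```python
import torch._dynamo.config as dynamo_config


class Compiler:
    """Simplified stand-in for the wrapper class in
    vllm/compilation/wrapper.py described above."""

    def __init__(self) -> None:
        # Apply vLLM-specific torch.compile overrides early, before any
        # compilation is triggered.
        self.overwrite_compiler_configs()

    def overwrite_compiler_configs(self) -> None:
        # vLLM drops/skips symbolic shape guards itself, so compiling
        # them to C++ is unnecessary startup work.
        dynamo_config.enable_cpp_symbolic_shape_guards = False
```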
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize Gemini Code Assist for the GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request disables C++ compilation of symbolic shapes in `torch.compile`, which appears to be a necessary fix for vLLM's compilation process. The implementation involves a direct modification of a global `torch._dynamo` configuration flag. My primary feedback centers on the potential risks associated with modifying global state. I've suggested alternative approaches, such as using a context manager for scoped changes or moving the configuration to a more explicit, high-level initialization point if a global setting is indeed required.
vllm/compilation/wrapper.py (Outdated)
This method modifies a global configuration state (torch._dynamo.config). This change is permanent for the lifetime of the process and can have unintended side effects on other parts of the application or user code that might also use torch.compile and expect the default behavior.
Here are a couple of suggestions to improve this:

- Scoped Configuration: If this setting is only required for compilations managed by this wrapper, consider using a context manager to scope the change. The `torch._dynamo.config.patch()` utility is designed for this purpose.
- Global Application-level Setting: If this configuration must be global for the entire vLLM application, it would be better to set it at a single, high-level entry point (e.g., during engine initialization) rather than within the `__init__` of this wrapper. This makes the global modification more explicit and easier to track.
Modifying global state inside a class constructor can make the system harder to reason about and debug. Could you clarify if this global change is intentional and why it's being implemented here?
Additionally, as a minor point, the method name overwrite_compiler_configs is a bit general since it only modifies one flag. A more specific name like _disable_cpp_symbolic_shape_guards could improve clarity.
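To make the two suggestions concrete, a minimal sketch under the assumption that the installed torch exposes this flag; the model and inputs are hypothetical placeholders:

```python
import torch
import torch._dynamo

model = torch.nn.Linear(8, 8)
compiled_model = torch.compile(model, dynamic=True)

# Option 1: scoped configuration. The previous flag value is restored
# when the context exits. Note the flag is read when Dynamo actually
# compiles (i.e., on the first call), so the patch must be active
# around that call.
with torch._dynamo.config.patch(enable_cpp_symbolic_shape_guards=False):
    compiled_model(torch.randn(2, 8))

# Option 2: global application-level setting, done once at a high-level
# entry point (e.g., engine initialization); it then stays in effect
# for the lifetime of the process.
torch._dynamo.config.enable_cpp_symbolic_shape_guards = False
```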
vllm/compilation/wrapper.py (Outdated)
We should just do a `torch._dynamo.config.patch` over the call to `torch.compile` (we want to restore the state), but only when `compilation_level >= CompilationLevel.DYNAMO_ONCE`. When the compilation level is 1 (`DYNAMO_AS_IS`), this is basically using regular `torch.compile`, so the guards should be C++-ified.
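A sketch of that level-gated idea, assuming `CompilationLevel` is importable from `vllm.config`; as the follow-up below points out, the returned context must be active when Dynamo actually compiles, not merely when the function is decorated:

```python
import contextlib

import torch._dynamo
from vllm.config import CompilationLevel  # assumed import path


def cpp_guard_override(compilation_level: int):
    # Disable C++-ification of symbolic shape guards only when vLLM
    # manages/drops the guards itself (DYNAMO_ONCE and above).
    # DYNAMO_AS_IS keeps stock torch.compile behavior, C++ guards
    # included.
    if compilation_level >= CompilationLevel.DYNAMO_ONCE:
        return torch._dynamo.config.patch(
            enable_cpp_symbolic_shape_guards=False)
    return contextlib.nullcontext()
```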
This might not work. We read the config flag during the compilation. Here, in the init method, we are just decorating the function.
hmmm good point
For now, we don't need to differentiate between DYNAMO_ONCE and DYNAMO_AS_IS. The C++-ification is only for symbolic shape guards; it is a low priority for vLLM.
But I still don't know a better way to use a context manager to restore the state.
@anijain2305 I think the best place to do this is here.
vllm/compilation/decorators.py (lines 231 to 239 at 38efa28):
```python
def patched_inline_call(parent, func, args, kwargs):
    code = func.get_code()
    self.vllm_config.compilation_config.traced_files.add(
        code.co_filename)
    return inline_call(parent, func, args, kwargs)

with patch.object(InliningInstructionTranslator, 'inline_call',
                  patched_inline_call):
    output = self.compiled_callable(*args, **kwargs)
```
vLLM will call this function to do the initial dynamo trace. We know this is true because it is patching InliningInstructionTranslator lol
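A sketch of how the override could wrap that call site; `compiled_callable` plays the role of `self.compiled_callable` from the snippet above, and the helper itself is an assumption, not the code that landed:

```python
import torch._dynamo


def call_with_guard_override(compiled_callable, *args, **kwargs):
    # The flag is read while Dynamo traces/compiles, which happens
    # inside this call; config.patch restores the previous value once
    # the call returns.
    with torch._dynamo.config.patch(
            enable_cpp_symbolic_shape_guards=False):
        return compiled_callable(*args, **kwargs)
```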
I don't think `__init__` gets called during decoration; it gets called when the model is actually instantiated. But yeah, we can decorate around it, because we need it to be restored after Dynamo is called, which is way after `__init__` is done.
@zou3519 Moved to the decorators.py file.
Force-pushed from a25aa39 to 0fa5836.
cc @BoyuanFeng
Signed-off-by: Animesh Jain <[email protected]>
@anijain2305 @zou3519 I started to see this error showing up on the vLLM CPU benchmark job https://github.com/pytorch/pytorch-integration-testing/actions/runs/16679734638/job/47215235063#step:15:3463, which looks related to this change.
I've encountered the issue today after pulling recent main, and here's the possible reason: the CPU dependencies pin an older torch that doesn't have this config attribute. So the possible fix would be either to check if the attribute exists or to upgrade all CPU deps to torch 2.7.0.
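For reference, a minimal sketch of the attribute-existence check; the flag name comes from this PR, and treating older torch (the pre-2.7.0 CPU deps) as lacking it follows this comment:

```python
import torch._dynamo.config as dynamo_config

# Older torch builds don't define this flag, and assigning an unknown
# config attribute raises, so only set it when it exists.
if hasattr(dynamo_config, "enable_cpp_symbolic_shape_guards"):
    dynamo_config.enable_cpp_symbolic_shape_guards = False
```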
Feel free to revert the PR (I don't know how to). I can send a follow-up with the fix that works for older versions as well.
Reverting on vLLM needs to be done via a PR, reviewed and merged normally. I haven't seen many reverts; forward fixes, on the other hand, are plentiful :)
…vllm-project#20836)" This reverts commit 9659bc7. Signed-off-by: Xiao Liu <[email protected]>
Created a revert to unblock. @anijain2305, feel free to resubmit whenever the fix is done and verified.
…roject#20836) Signed-off-by: Animesh Jain <[email protected]>
…roject#20836) Signed-off-by: Animesh Jain <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
…roject#20836) Signed-off-by: Animesh Jain <[email protected]> Signed-off-by: Noam Gat <[email protected]>
…roject#20836) Signed-off-by: Animesh Jain <[email protected]> Signed-off-by: Paul Pak <[email protected]>
…roject#20836) Signed-off-by: Animesh Jain <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
@anijain2305 any update on this?
Provided a workaround in #23048
…roject#20836) Signed-off-by: Animesh Jain <[email protected]>
…roject#20836) Signed-off-by: Animesh Jain <[email protected]>
Purpose
On PyTorch nightly, torch compiles the symbolic shape guards into C++, which can increase compile time. Since vLLM drops the guards anyway, skip compiling the symbolic guards to C++.
Test Plan
Existing tests
Test Result
Reduces `build_guards` time from ~5 seconds to ~0.5 seconds.
(Optional) Documentation Update