
Conversation

@anijain2305 (Contributor) commented Jul 11, 2025

Purpose

For nightly, PyTorch compiles the symbolic guards into C++. This can increase compile time. Since vLLM drops the guards anyway, skip the compilation of symbolic guards to C++.
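
In code, the change boils down to flipping one TorchDynamo config flag before compilation. A minimal sketch of the idea, assuming a PyTorch build that exposes this flag (nightly / >= 2.7); where exactly vLLM applies it is discussed in the review below:

import torch._dynamo

# Skip C++ compilation of symbolic shape guards. vLLM drops these guards
# anyway, so compiling them only adds to startup time.
torch._dynamo.config.enable_cpp_symbolic_shape_guards = False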

Test Plan

Existing tests

Test Result

Reduces build_guards from ~5 seconds to 0.5 seconds.

(Optional) Documentation Update

@github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @anijain2305, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to optimize the startup and compilation process by disabling a specific TorchDynamo feature, C++ compilation of symbolic shapes, which is not required for vLLM's operation. By overriding this default configuration, I'm streamlining the compilation flow and potentially improving performance or avoiding unnecessary overhead.

Highlights

  • Configuration Override: I've introduced a new method, overwrite_compiler_configs, within the Compiler class in vllm/compilation/wrapper.py. This method is designed to adjust torch.compile configurations to better suit vLLM's specific needs.
  • Symbolic Shape Guard Disablement: Specifically, the overwrite_compiler_configs method sets torch._dynamo.config.enable_cpp_symbolic_shape_guards to False. This disables the C++ compilation of symbolic shapes, as vLLM's architecture already handles or skips these guards, making this compilation unnecessary.
  • Initialization Integration: The overwrite_compiler_configs method is now called during the __init__ phase of the Compiler class, ensuring that these specific torch.compile configurations are applied early in the compilation process.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request disables C++ compilation of symbolic shapes in torch.compile, which appears to be a necessary fix for vLLM's compilation process. The implementation involves a direct modification of a global torch.dynamo configuration flag. My primary feedback centers on the potential risks associated with modifying global state. I've suggested alternative approaches, such as using a context manager for scoped changes or moving the configuration to a more explicit, high-level initialization point if a global setting is indeed required.

Comment on lines 67 to 74

gemini-code-assist bot commented (severity: high):

This method modifies a global configuration state (torch._dynamo.config). This change is permanent for the lifetime of the process and can have unintended side effects on other parts of the application or user code that might also use torch.compile and expect the default behavior.

Here are a couple of suggestions to improve this:

  • Scoped Configuration: If this setting is only required for compilations managed by this wrapper, consider using a context manager to scope the change. The torch._dynamo.config.patch() utility is designed for this purpose.
  • Global Application-level Setting: If this configuration must be global for the entire vLLM application, it would be better to set it at a single, high-level entry point (e.g., during engine initialization) rather than within the __init__ of this wrapper. This makes the global modification more explicit and easier to track.

Modifying global state inside a class constructor can make the system harder to reason about and debug. Could you clarify if this global change is intentional and why it's being implemented here?

Additionally, as a minor point, the method name overwrite_compiler_configs is a bit general since it only modifies one flag. A more specific name like _disable_cpp_symbolic_shape_guards could improve clarity.

Collaborator commented:

We should just do a torch._dynamo.config.patch over the call to torch.compile (we want to restore the state), but only when compilation_level >= CompilationLevel.DYNAMO_ONCE. When compilation_level is 1 (DYNAMO_AS_IS), this is basically regular torch.compile, so the guards should be C++-ified.
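
A rough sketch of that suggestion, under stated assumptions: the helper name and call site are hypothetical, and CompilationLevel is assumed to come from vllm.config. (As the next comment notes, the patch has to be active while Dynamo actually traces, not merely where the function is decorated.)

import contextlib

import torch
import torch._dynamo
from vllm.config import CompilationLevel  # assumed import path

def compile_with_vllm_guard_policy(model, example_inputs, compilation_level):
    # Skip C++ guard compilation only when vLLM manages guards itself
    # (DYNAMO_ONCE and above); DYNAMO_AS_IS is plain torch.compile, so the
    # default C++-ified guards are kept in that mode.
    if compilation_level >= CompilationLevel.DYNAMO_ONCE:
        ctx = torch._dynamo.config.patch(enable_cpp_symbolic_shape_guards=False)
    else:
        ctx = contextlib.nullcontext()
    with ctx:
        compiled = torch.compile(model)
        compiled(*example_inputs)  # Dynamo reads the flag during this first traced call
    return compiled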

Contributor Author (@anijain2305) commented:

This might not work. We read the config flag during the compilation. Here, in the init method, we are just decorating the function.

Collaborator commented:

hmmm good point

Contributor Author (@anijain2305) commented:

For now, we don't need to differentiate between DYNAMO_ONCE and DYNAMO_AS_IS. The C++-ification only applies to symbolic shape guards, and it is a low priority for vLLM.

But I still don't know a better way to use a context manager to restore the state.

Collaborator commented:

@anijain2305 I think the best place to do this is here.

def patched_inline_call(parent, func, args, kwargs):
    code = func.get_code()
    self.vllm_config.compilation_config.traced_files.add(
        code.co_filename)
    return inline_call(parent, func, args, kwargs)

with patch.object(InliningInstructionTranslator, 'inline_call',
                  patched_inline_call):
    output = self.compiled_callable(*args, **kwargs)

vLLM will call this function to do the initial dynamo trace. We know this is true because it is patching InliningInstructionTranslator lol
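
For illustration, a hedged sketch of how the scoped override could be layered onto that spot, reusing the names from the snippet above (the fix that actually landed went into decorators.py, see below, and the flag only exists on newer torch):

with patch.object(InliningInstructionTranslator, 'inline_call',
                  patched_inline_call), \
     torch._dynamo.config.patch(enable_cpp_symbolic_shape_guards=False):
    # Dynamo reads the flag while tracing this call, and config.patch
    # restores the previous value as soon as the with-block exits.
    output = self.compiled_callable(*args, **kwargs)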

@ProExpertProg (Collaborator) commented Jul 14, 2025:

I don't think __init__ gets called during decoration; it gets called when the model is actually instantiated. But yeah, we can decorate around it, because we need the setting to be restored after Dynamo is called, which is way after __init__ is done.

@anijain2305 (Contributor, Author) commented Jul 17, 2025:

@zou3519 Moved to the decorators.py file.

zou3519 requested a review from ProExpertProg on July 14, 2025 14:41
anijain2305 requested a review from youkaichao as a code owner on July 17, 2025 19:08
anijain2305 force-pushed the comp_time branch 2 times, most recently from a25aa39 to 0fa5836, on July 17, 2025 20:35
zou3519 added the ready label (ONLY add when PR is ready to merge / full CI is needed) on Jul 18, 2025
@zou3519 (Collaborator) commented Jul 18, 2025

cc @BoyuanFeng

vllm-bot merged commit 9659bc7 into vllm-project:main on Aug 1, 2025
64 of 66 checks passed
@huydhn (Contributor) commented Aug 2, 2025

@anijain2305 @zou3519 I started seeing this error show up on the vLLM CPU benchmark job https://github.com/pytorch/pytorch-integration-testing/actions/runs/16679734638/job/47215235063#step:15:3463, which looks related to this change: AttributeError: torch._dynamo.config.enable_cpp_symbolic_shape_guards does not exist. Any thoughts?

@xiszishu (Contributor) commented Aug 2, 2025

> @anijain2305 @zou3519 I started seeing this error show up on the vLLM CPU benchmark job https://github.com/pytorch/pytorch-integration-testing/actions/runs/16679734638/job/47215235063#step:15:3463, which looks related to this change: AttributeError: torch._dynamo.config.enable_cpp_symbolic_shape_guards does not exist. Any thoughts?

I've encountered the issue today after pulling recent main, and here's the likely reason:
enable_cpp_symbolic_shape_guards was introduced in torch 2.7.0: https://github.com/pytorch/pytorch/blame/134179474539648ba7dee1317959529fbd0e7f89/torch/_dynamo/config.py#L409
while the CPU benchmark still uses torch 2.6.0, which does not have the config.

So the possible fix would be to either check if the attribute exists or upgrade all CPU deps to torch 2.7.0.
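
A minimal sketch of the attribute-check option (hypothetical helper name; the actual workaround in #23048 may differ):

import contextlib

import torch._dynamo

def maybe_disable_cpp_shape_guards():
    # enable_cpp_symbolic_shape_guards only exists in torch >= 2.7.0, so
    # older builds (e.g. the CPU benchmark on torch 2.6.0) need a no-op.
    if hasattr(torch._dynamo.config, "enable_cpp_symbolic_shape_guards"):
        return torch._dynamo.config.patch(enable_cpp_symbolic_shape_guards=False)
    return contextlib.nullcontext()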

@anijain2305 (Contributor, Author) commented

Feel free to revert the PR (I don't know how to). I can send a follow-up with a fix that works for older versions as well.

@huydhn (Contributor) commented Aug 2, 2025

> Feel free to revert the PR (I don't know how to). I can send a follow-up with a fix that works for older versions as well.

Reverting on vLLM needs to be done via a PR, reviewed, and merged normally. I haven't seen many reverts, but forward fixes, on the other hand, are plentiful :)

@xiszishu (Contributor) commented Aug 2, 2025

> Feel free to revert the PR (I don't know how to). I can send a follow-up with a fix that works for older versions as well.
>
> Reverting on vLLM needs to be done via a PR, reviewed, and merged normally. I haven't seen many reverts, but forward fixes, on the other hand, are plentiful :)

Created a revert to unblock. @anijain2305, feel free to resubmit whenever the fix is done and verified.

npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
@ProExpertProg (Collaborator) commented

@anijain2305 any update on this?

@xiszishu (Contributor) commented

> @anijain2305 any update on this?

Provided a workaround in #23048

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

Labels: ready, startup-ux