@zou3519 zou3519 commented Jun 9, 2025

Summary:
Before this PR, V1 always forced the compilation level to be 3 (PIECEWISE). In vLLM V0, users had the option to set the compilation level to something else. This PR changes V1 to respect the compilation level set by the user.

The other compilation levels are mostly used for debugging, but they are very helpful for that. I've been trying to track down some silent incorrectness, and -O1 runs vLLM with stock torch.compile, so it helps narrow down whether the problem is in stock torch.compile or in vLLM.

Test Plan:
Added some new tests to test CompilationLevel 0 and 1. I'm not sure exactly how to write a behavior test for CompilationLevel 2.
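The counter-based checks can be sketched like this. Everything below is a simplified stand-in for illustration — `CompilationCounter`, its `expect` helper, and `load_model` mimic the shape of the real vLLM classes but are not the actual test code:

```python
import copy
from contextlib import contextmanager
from dataclasses import dataclass


# Hypothetical stand-in for vllm.compilation.counter.CompilationCounter;
# the real class tracks counts such as num_graphs_seen and (after this PR)
# dynamo_as_is_count.
@dataclass
class CompilationCounter:
    num_graphs_seen: int = 0
    dynamo_as_is_count: int = 0

    @contextmanager
    def expect(self, **expected_deltas):
        # Snapshot the counters, run the body, then assert each named
        # counter changed by exactly the expected amount.
        before = copy.deepcopy(self)
        yield
        for name, delta in expected_deltas.items():
            actual = getattr(self, name) - getattr(before, name)
            assert actual == delta, f"{name}: expected +{delta}, got +{actual}"


compilation_counter = CompilationCounter()


def load_model(level: int) -> None:
    # Stand-in for model loading: level 1 (DYNAMO_AS_IS) bumps the counter.
    if level == 1:
        compilation_counter.dynamo_as_is_count += 1


# Level 1 should increment dynamo_as_is_count; level 0 should leave
# every counter untouched.
with compilation_counter.expect(dynamo_as_is_count=1):
    load_model(level=1)
with compilation_counter.expect(dynamo_as_is_count=0, num_graphs_seen=0):
    load_model(level=0)
```

This is roughly the pattern the new tests rely on: run a model load under a given level and assert on counter deltas rather than on compiler internals.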

@gemini-code-assist gemini-code-assist bot left a comment


Hello @zou3519, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request by @zou3519. This PR addresses a bug in VLLM V1 where the compilation level was previously hardcoded to PIECEWISE (level 3), preventing users from utilizing other compilation levels. The primary goal is to restore the user's ability to specify different compilation levels, particularly NO_COMPILATION (level 0) and DYNAMO_AS_IS (level 1), which are valuable for debugging purposes. The changes involve modifying the configuration handling, adding specific logic for the DYNAMO_AS_IS level in the V1 worker, introducing a counter for tracking DYNAMO_AS_IS usage, and adding new tests to cover levels 0 and 1.

Highlights

  • Bugfix: Allow non-PIECEWISE compilation levels in V1: Previously, VLLM V1 forced the compilation level to PIECEWISE. This PR removes that restriction, allowing users to specify other levels like NO_COMPILATION (0) or DYNAMO_AS_IS (1).
  • Configuration Update: The default compilation level in CompilationConfig is updated to be PIECEWISE if VLLM_USE_V1 is enabled, and NO_COMPILATION otherwise. Crucially, the explicit forcing of PIECEWISE within the V1 config initialization (VllmConfig.__post_init__) has been removed.
  • DYNAMO_AS_IS Handling: Specific logic is added in the V1 GPU model runner (vllm/v1/worker/gpu_model_runner.py) to explicitly call model.compile when the DYNAMO_AS_IS compilation level is selected and supported.
  • Compilation Counter: A new counter, dynamo_as_is_count, is added to vllm/compilation/counter.py to track how many times the DYNAMO_AS_IS compilation level is used. This counter is incremented in both the V0 and V1 model loading paths.
  • New Tests: Tests are added in tests/compile/test_config.py to specifically verify the behavior of CompilationLevel.DYNAMO_AS_IS (level 1) and CompilationLevel.NO_COMPILATION (level 0) using the new compilation counter.
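The configuration change in the first two bullets can be sketched as follows. This is a hedged illustration of the intent, not the actual code in vllm/config.py (the real default reads the VLLM_USE_V1 env var through vLLM's envs module):

```python
# Compilation levels as used throughout vLLM.
NO_COMPILATION, DYNAMO_AS_IS, DYNAMO_ONCE, PIECEWISE = 0, 1, 2, 3


def default_compilation_level(use_v1: bool) -> int:
    # After this PR, PIECEWISE is only the *default* under V1.
    # The previous behavior — unconditionally overwriting the level in
    # VllmConfig.__post_init__ — is removed, so a user-supplied level
    # is left untouched.
    return PIECEWISE if use_v1 else NO_COMPILATION


assert default_compilation_level(use_v1=True) == PIECEWISE
assert default_compilation_level(use_v1=False) == NO_COMPILATION
```

The key distinction is default versus forced: before the PR, a user-specified level under V1 was silently replaced with PIECEWISE; after it, the level only falls back to PIECEWISE when the user did not set one.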

Changelog

Click here to see the changelog
  • tests/compile/test_config.py
    • Added test_dynamo_as_is to verify compilation level 1 (DYNAMO_AS_IS).
    • Added test_no_compilation to verify compilation level 0 (NO_COMPILATION).
  • vllm/compilation/counter.py
    • Added dynamo_as_is_count to track usage of CompilationLevel.DYNAMO_AS_IS.
  • vllm/config.py
    • Changed the default value of CompilationConfig.level to be PIECEWISE for V1 and NO_COMPILATION otherwise (lines 3870-3871).
    • Removed the line that explicitly forced self.compilation_config.level = CompilationLevel.PIECEWISE in VllmConfig.__post_init__ (removed line 4446).
  • vllm/v1/worker/gpu_model_runner.py
    • Imported envs, compilation_counter, and supports_dynamo (lines 15, 21, 43).
    • Added logic in load_model to call self.model.compile and increment compilation_counter.dynamo_as_is_count when CompilationLevel.DYNAMO_AS_IS is used (lines 1626-1635).
  • vllm/worker/model_runner.py
    • Imported compilation_counter (line 25).
    • Added line to increment compilation_counter.dynamo_as_is_count within the DYNAMO_AS_IS check in load_model (line 1220).
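The new DYNAMO_AS_IS branch in the V1 model runner, together with the counter, looks roughly like the sketch below. `FakeModel`, `Counter`, `supports_dynamo`, and the backend string are all stand-ins — the real code in vllm/v1/worker/gpu_model_runner.py calls `self.model.compile` with the configured backend and a fullgraph flag taken from an env var:

```python
DYNAMO_AS_IS = 1


class FakeModel:
    """Stand-in for the loaded model; records how it was compiled."""

    def __init__(self):
        self.compiled_with = None

    def compile(self, backend: str, fullgraph: bool):
        self.compiled_with = (backend, fullgraph)


class Counter:
    dynamo_as_is_count = 0


def supports_dynamo() -> bool:
    # The real helper checks torch version / platform support.
    return True


def load_model(model, level, counter):
    if level == DYNAMO_AS_IS and supports_dynamo():
        # V1 now mirrors V0: wrap the whole model with torch.compile
        # and record the event for tests/observability.
        model.compile(backend="eager", fullgraph=True)
        counter.dynamo_as_is_count += 1
    return model


m = load_model(FakeModel(), DYNAMO_AS_IS, Counter)
assert m.compiled_with == ("eager", True)
assert Counter.dynamo_as_is_count == 1
```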

@mergify mergify bot added the v1 label Jun 9, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the issue where the V1 engine previously forced the compilation level to PIECEWISE. By removing this forced setting and adjusting the default compilation level, users now have the flexibility to specify other compilation levels, which is particularly useful for debugging purposes as highlighted in the PR description.

The changes are clear, and the addition of tests for CompilationLevel.NO_COMPILATION (level 0) and CompilationLevel.DYNAMO_AS_IS (level 1) is a great step towards ensuring correctness. The new counter for DYNAMO_AS_IS compilations is also a good addition for observability and testing.

Regarding the test plan, for CompilationLevel.DYNAMO_ONCE (level 2), you might consider a test that verifies the model is compiled only once. This could involve:

  1. Setting the compilation level to 2.
  2. Running the model with a set of inputs and checking a relevant compilation counter (e.g., num_backend_compilations or a more specific Dynamo counter if available) to confirm compilation occurred.
  3. Running the model again (perhaps with different data but compatible shapes if the compilation is shape-generic, or same-shaped data).
  4. Asserting that the compilation counter did not increment on the second run, indicating the cached compiled graph was used.
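The four steps above amount to a compile-once check, which can be simulated with a toy runner. The cache and counter here are illustrative stand-ins, not vLLM APIs:

```python
class CompileOnceRunner:
    """Toy model runner that compiles on first use and caches the result."""

    def __init__(self):
        self.num_backend_compilations = 0
        self._compiled = None

    def run(self, xs):
        if self._compiled is None:
            # First call triggers a (simulated) full-model compilation.
            self.num_backend_compilations += 1
            self._compiled = lambda batch: [x * 2 for x in batch]
        return self._compiled(xs)


runner = CompileOnceRunner()
runner.run([1, 2, 3])                        # step 2: first run compiles
assert runner.num_backend_compilations == 1
runner.run([4, 5, 6])                        # step 3: same-shaped second run
assert runner.num_backend_compilations == 1  # step 4: no recompilation
```

A real DYNAMO_ONCE test would replace the toy compile with a model load and assert on an actual compilation counter across two inference calls.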

Overall, the PR is well-structured and improves the flexibility of the V1 engine's compilation features. Nice work!

Summary of Findings

  • Stylistic Suggestion: Line Continuation: In vllm/v1/worker/gpu_model_runner.py (lines 1627-1628), the if condition uses a \ for line continuation. Since the condition is already enclosed in parentheses, this backslash is redundant and can be removed for slightly improved readability, adhering to PEP 8 guidelines for line breaks within parentheses. This is a minor stylistic point and does not affect functionality.

Merge Readiness

The pull request appears to be in good shape and addresses the intended bugfix effectively. The core changes are sound, and the new tests cover the enabled compilation levels (0 and 1). The identified stylistic point is minor. I believe this PR is ready for merge after considering the feedback. As I am an AI, I am not authorized to approve pull requests; please ensure other reviewers approve this change before merging.

@github-actions

github-actions bot commented Jun 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses the issue where VLLM V1 previously forced the compilation level to PIECEWISE. The changes allow V1 to respect user-defined compilation levels, which is particularly useful for debugging purposes, as highlighted in the description. The introduction of tests for compilation levels 0 (NO_COMPILATION) and 1 (DYNAMO_AS_IS) is a good addition, and the update to CompilationCounter to track DYNAMO_AS_IS usage is appropriate.

The core logic changes in vllm/config.py to adjust the default compilation level and remove the override in VllmConfig.__post_init__ are correct. The modifications in vllm/v1/worker/gpu_model_runner.py and vllm/worker/model_runner.py to handle DYNAMO_AS_IS and update the counter are also well-implemented.

One area that might need further consideration or clarification is the handling of CompilationLevel.DYNAMO_ONCE (level 2) in the V1 execution path, as detailed in the specific comment.

Summary of Findings

  • Handling of CompilationLevel.DYNAMO_ONCE (level 2) in V1: The V1 gpu_model_runner.py explicitly calls torch.compile for CompilationLevel.DYNAMO_AS_IS but not for CompilationLevel.DYNAMO_ONCE. This differs from V0's behavior. Clarification on whether V1 should also perform a full model compile for level 2 would be beneficial.
  • Use of VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE: In vllm/v1/worker/gpu_model_runner.py (and vllm/worker/model_runner.py), the fullgraph parameter for torch.compile is set using envs.VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE. This suggests it's primarily for testing. While this might be acceptable for debugging-focused compilation levels, it's worth noting. (Severity: low, not commented inline due to settings)

Merge Readiness

The pull request is well on its way to achieving its goals. The primary changes for supporting compilation levels 0 and 1 are solid. However, there's a medium-severity point regarding the handling of CompilationLevel.DYNAMO_ONCE in the V1 pathway that should be addressed or clarified before merging. Addressing this will ensure consistent behavior or clear understanding of the differences between V0 and V1 for all compilation levels. I am unable to approve pull requests, so please have another reviewer approve this code before merging.

@zou3519 zou3519 force-pushed the fix_dynamo_once branch 3 times, most recently from 51ae00a to 69e8b20 on June 9, 2025 14:20
@mergify

mergify bot commented Jun 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zou3519.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 9, 2025
@zou3519 zou3519 force-pushed the fix_dynamo_once branch from 69e8b20 to d5f2094 on June 9, 2025 14:56
@mergify mergify bot removed the needs-rebase label Jun 9, 2025
@zou3519 zou3519 marked this pull request as ready for review June 9, 2025 16:09
@zou3519 zou3519 requested review from houseroad and youkaichao June 9, 2025 16:10
@zou3519 zou3519 force-pushed the fix_dynamo_once branch from d5f2094 to 8781a91 on June 9, 2025 22:13
@mergify

mergify bot commented Jun 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zou3519.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 16, 2025
Collaborator


Could we move this to the TorchCompileDispatcher wrapper that already handles the other two levels, or is that not possible?

Collaborator Author


This is how it worked in V0; this was separate:

# for CompilationLevel.DYNAMO_AS_IS, the upper-level model runner
# will handle the compilation, so we don't need to do anything here.

I don't want to boil the ocean in this PR, but I agree everything is nicer when they're together. I can see if this is possible in a follow-up if that sounds reasonable.

Collaborator


Yep, super fair. I have a (bad) habit of asking for improvements on code that's not directly related to the PR.

Collaborator

@houseroad houseroad left a comment


Looks reasonable. Left some comments.

Collaborator


We should consider removing TEST from this env var name as well.

Collaborator Author


This is pre-existing; also, this flag is now True by default, so I will try to delete it in a follow-up.

@ProExpertProg

@zou3519 should we merge this?

@mergify

mergify bot commented Jul 18, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zou3519.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify

mergify bot commented Jul 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zou3519.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 24, 2025
vllm/config.py Outdated
Member


This will appear in the API and CLI docs, but the note is outside of the docstring, so the reader won't know what the default behaviour is.

Collaborator Author


Fair enough, I'll move it up here.

@mergify mergify bot removed the needs-rebase label Jul 29, 2025
@zou3519 zou3519 merged commit 04e3850 into vllm-project:main Jul 29, 2025
67 checks passed
liuyumoye pushed a commit to liuyumoye/vllm that referenced this pull request Jul 31, 2025
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025