[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure #26847
Merged: vllm-bot merged 85 commits into vllm-project:main from morrison-turnansky:issue-20283-model-config on Nov 27, 2025.
Commits
affd919
added parser for moe detection with test
morrison-turnansky 86cbc77
Set up -O infrastructure
adabeyta b82140e
name change and removed editing backend in _apply_optimization_level …
morrison-turnansky ea16053
updated defaults for each pass config
morrison-turnansky 887f9aa
set cuda graph mode defaults
morrison-turnansky 2abcafe
added skeleton for non model specific settings, and test to verify that…
morrison-turnansky 4778dc0
made is_model_moe inaccessible from user
morrison-turnansky 51b7e8e
added parsing function to determine if model is quantized
morrison-turnansky 2a37bfb
added model specific optimizations
morrison-turnansky 3300079
updated default config design
morrison-turnansky 560dccf
added vllm config default test
morrison-turnansky f5ce445
Update vllm/config/vllm.py
morrison-turnansky 1378181
reviewer comments
morrison-turnansky 55a9085
moved functions to optimization.py
morrison-turnansky 5ecaff2
Add lambda-based callable defaults to vLLM config
adabeyta 904ab7e
added quant_fp8 and rms_norm to defaults
morrison-turnansky 5057dca
added optimization level to engine args
morrison-turnansky 77183b3
reviewer comments
morrison-turnansky cd498f2
typos
morrison-turnansky 939d035
added warning
morrison-turnansky e240cd7
Add lambda-based callable defaults to vLLM config
adabeyta af5b48b
Update vllm/engine/arg_utils.py
morrison-turnansky 1229504
Update vllm/config/model.py
morrison-turnansky a414df5
reviewer comments
morrison-turnansky a57a5d9
Update vllm/config/vllm.py
morrison-turnansky f1529a7
Update vllm/config/model.py
morrison-turnansky cd65961
linting
morrison-turnansky cdaa51d
Add tests for explicit config overrides and fix OptimizationLevel import
adabeyta d4c6b88
Fix pre-commit issues
adabeyta 60b2e54
added test cases for model parsing functions
morrison-turnansky 929a88f
changed behavior of -Onum and -O num to map to optimization level
morrison-turnansky 28df435
Update vllm/config/vllm.py
morrison-turnansky 89fb5bf
changed | None = None to Field(default = None)
morrison-turnansky 7de833b
added disclaimer for configs about invalid states
morrison-turnansky a7c86fc
fixed parse args tests
morrison-turnansky 1ca27ea
doc fix
morrison-turnansky 7ee1a5b
Update vllm/config/compilation.py
morrison-turnansky 23cd230
reviewer comments
morrison-turnansky ec6d6de
Make resolve config take only VllmConfig
adabeyta f112b72
reviewer comments 2
morrison-turnansky 55834f9
added lambda in build_default
morrison-turnansky a8788ea
reviewer comments
morrison-turnansky c4ed382
removed debugging statements
morrison-turnansky 7e474ed
added RedHatAI/Llama-3.1-8B-Instruct-NVFP4 test case
morrison-turnansky 629b9d1
merge
morrison-turnansky a37d6f1
Update vllm/config/vllm.py
morrison-turnansky af3c0ae
Update vllm/config/vllm.py
morrison-turnansky d516761
Update vllm/config/vllm.py
morrison-turnansky bf5e6c8
reviewer comments
morrison-turnansky 6ceb6c0
moved optimization levels to global config
morrison-turnansky e3912df
merge
morrison-turnansky e3d205c
spelling
morrison-turnansky 739abbb
Update vllm/config/vllm.py
morrison-turnansky 3079f2e
Update vllm/config/vllm.py
morrison-turnansky f579455
Update tests/engine/test_arg_utils.py
morrison-turnansky 3d2fd25
Update tests/engine/test_arg_utils.py
morrison-turnansky 457357a
Update tests/engine/test_arg_utils.py
morrison-turnansky 8fd2c07
reviewer comments
morrison-turnansky dba0848
fixed failing tests
morrison-turnansky e8d5839
Update tests/test_config.py
morrison-turnansky c7968a5
Update vllm/config/vllm.py
morrison-turnansky 41adf1f
reverted change to test_vllm_config_defaults
morrison-turnansky 83cebd5
merge
morrison-turnansky 771107b
updated _apply_optimization_level_defaults to recurse any fields in s…
morrison-turnansky 150d2bf
reviewer comments
morrison-turnansky 2369185
reviewer comments
morrison-turnansky 3002fa2
reviewer comments
morrison-turnansky b44255f
added optimization level docs
morrison-turnansky 4025836
reviewer comments
morrison-turnansky 158dfa6
support LLM(optimization_level), updated docs for workflow
morrison-turnansky 7783009
Merge branch 'main' into issue-20283-model-config
morrison-turnansky 128f5cc
linting
morrison-turnansky 120e100
fixed tests/model_executor/test_enabled_custom_ops.py
morrison-turnansky 81e6c81
Merge branch 'main' into issue-20283-model-config
morrison-turnansky ed754d5
ci failures
morrison-turnansky 2e609ac
merge
morrison-turnansky 6287f47
fixed ci failure locally without changing behaviour
morrison-turnansky 5805d6a
merge
morrison-turnansky 0a272da
fixed caching of set_current_vllm_config
morrison-turnansky a54f084
ci
morrison-turnansky 2323a3d
Merge branch 'main' into issue-20283-model-config
morrison-turnansky 20d1645
moved cache changes locally to test
morrison-turnansky 3c65d2e
Update vllm/config/vllm.py
morrison-turnansky 5fa2370
removed optimization_level downgrade
morrison-turnansky e54a161
cleaned up cudagraph mode incompatibility log
morrison-turnansky File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!-- markdownlint-disable -->

# Optimization Levels

## Overview

vLLM now supports optimization levels (`-O0`, `-O1`, `-O2`, `-O3`). Optimization levels provide an intuitive mechanism for users to trade startup time for performance: higher levels deliver better performance at the cost of longer startup. Each level carries associated defaults to help users get the desired out-of-the-box performance. Importantly, defaults set by optimization levels are purely defaults; explicit user settings will not be overwritten.
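The defaults-only rule can be illustrated with a small merge helper. Both the helper and the field names below are illustrative sketches, not vLLM's actual implementation: level defaults fill in only the fields the user left unset.

```python
# Hypothetical sketch of the "defaults never overwrite explicit settings" rule.
# Field names and default values are illustrative, not vLLM's real config schema.

O2_DEFAULTS = {
    "cudagraph_mode": "FULL_AND_PIECEWISE",
    "use_inductor": True,
    "enable_fusion": True,
}

def apply_level_defaults(user_config: dict, level_defaults: dict) -> dict:
    """Fill in level defaults only for fields the user did not set explicitly."""
    merged = dict(level_defaults)
    merged.update(user_config)  # explicit user settings always win
    return merged

user = {"cudagraph_mode": "PIECEWISE"}  # explicitly chosen by the user
resolved = apply_level_defaults(user, O2_DEFAULTS)
# The user's explicit cudagraph_mode survives; unset fields get the -O2 defaults.
```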
## Level Summaries and Usage Examples

#### `-O0`

```bash
# CLI usage
python -m vllm.entrypoints.api_server --model RedHatAI/Llama-3.2-1B-FP8 -O0
```

```python
# Python API usage
from vllm.entrypoints.llm import LLM

llm = LLM(
    model="RedHatAI/Llama-3.2-1B-FP8",
    optimization_level=0
)
```
#### `-O1`: Quick Optimizations

- **Startup**: Moderate startup time
- **Performance**: Inductor compilation, `CUDAGraphMode.PIECEWISE`
- **Use case**: A good balance for most development scenarios

```bash
# CLI usage
python -m vllm.entrypoints.api_server --model RedHatAI/Llama-3.2-1B-FP8 -O1
```

```python
# Python API usage
from vllm.entrypoints.llm import LLM

llm = LLM(
    model="RedHatAI/Llama-3.2-1B-FP8",
    optimization_level=1
)
```
#### `-O2`: Full Optimizations (Default)

- **Startup**: Longer startup time
- **Performance**: `-O1` + `CUDAGraphMode.FULL_AND_PIECEWISE`
- **Use case**: Production workloads where performance is important. This is the default. It is also very similar to the previous vLLM default; the primary difference is that the noop and fusion flags are enabled.

```bash
# CLI usage (-O2 is the default, so the flag is optional)
python -m vllm.entrypoints.api_server --model RedHatAI/Llama-3.2-1B-FP8 -O2
```

```python
# Python API usage
from vllm.entrypoints.llm import LLM

llm = LLM(
    model="RedHatAI/Llama-3.2-1B-FP8",
    optimization_level=2  # This is the default
)
```
#### `-O3`: Full Optimization

Still in development; the infrastructure is added now to avoid changing the API in a future release. Currently behaves the same as `-O2`.
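The CLI examples above use the attached form (`-O2`); per the PR's commit history, the separated form (`-O 2`) maps to the same optimization level. A minimal `argparse` sketch of that behavior (illustrative only, not vLLM's actual argument parser):

```python
# Illustrative sketch: both "-O2" and "-O 2" resolve to an optimization level.
# Not vLLM's real parser; the flag name and default mirror the docs above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-O", dest="optimization_level", type=int, default=2)

attached = parser.parse_args(["-O3"]).optimization_level      # attached form
separated = parser.parse_args(["-O", "1"]).optimization_level  # separated form
default = parser.parse_args([]).optimization_level             # no flag given
```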
## Troubleshooting

### Common Issues

1. **Startup Time Too Long**: Use `-O0` or `-O1` for faster startup
2. **Compilation Errors**: Use `debug_dump_path` for additional debugging information
3. **Performance Issues**: Ensure `-O2` is used for production