[Doc] Fix multimodal torch.compile troubleshooting to not use removed VLLM_TORCH_COMPILE_LEVEL #44378
… VLLM_TORCH_COMPILE_LEVEL
The "Compilation Errors" troubleshooting step in the multimodal
torch.compile design doc tells users to disable compilation with:
VLLM_TORCH_COMPILE_LEVEL=0 vllm serve <model> ...
`VLLM_TORCH_COMPILE_LEVEL` no longer exists anywhere in the codebase —
torch.compile control was moved to `CompilationConfig` / the `-O`
optimization levels, so this env var is silently ignored and the model
still runs with compilation enabled (defeating the troubleshooting step).
Use the documented way to disable torch.compile + CUDA graphs instead:
`--enforce-eager` (see docs/design/debug_vllm_compile.md, where
`--enforce-eager` maps to `enforce_eager=True` = "Turn off torch.compile
and CUDAGraphs").
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
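The silent-ignore failure mode described in the commit message above can be caught before launch with a small pre-flight check. This is only a sketch and not part of vLLM: the helper name `find_stale_env_vars` and the `REMOVED_ENV_VARS` table are hypothetical, and the only entry in the table is the variable named in this PR.

```python
import os

# Hypothetical table of environment variables that are no longer read.
# Only VLLM_TORCH_COMPILE_LEVEL comes from this PR; the hint paraphrases
# the commit message.
REMOVED_ENV_VARS = {
    "VLLM_TORCH_COMPILE_LEVEL": (
        "torch.compile control moved to CompilationConfig / the -O levels"
    ),
}


def find_stale_env_vars(environ=None):
    """Return {name: hint} for removed variables that are still set."""
    env = os.environ if environ is None else environ
    return {
        name: hint
        for name, hint in REMOVED_ENV_VARS.items()
        if name in env
    }


if __name__ == "__main__":
    # Print a warning for each stale variable found in the real environment.
    for name, hint in find_stale_env_vars().items():
        print(f"warning: {name} is set but ignored; {hint}")
```

Running a check like this before `vllm serve` would surface the stale variable instead of letting the model run compiled without notice.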
Documentation preview: https://vllm--44378.org.readthedocs.build/en/44378/
Hi @hmellor, could you take a look when you have a moment? This is a small doc-correctness fix: the multimodal torch.compile troubleshooting section tells users to run … It's been open ~3 days with no reviewer auto-assigned; looks like …
@DarkLight1337, would you have a moment for this small doc fix? It's a 1-line swap: the multimodal torch.compile troubleshooting section tells users to run … Same shape as #44128 (which sfeng33 helped merge). CI is green except gates that need a …
I think we should use the direct replacement …
Per review feedback from @DarkLight1337: mode=0 (CompilationMode.NONE) is the faithful replacement for the removed VLLM_TORCH_COMPILE_LEVEL=0 — it disables torch.compile while keeping CUDA graphs, whereas --enforce-eager also disables CUDA graphs (a behavior change). Fold mode into the existing --compilation-config JSON. Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
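The difference between the two options in the commit message above can be sketched as a small command builder. This is an illustration under stated assumptions, not vLLM code: `build_serve_command` and the model name are hypothetical, the `--compilation-config` value is passed as a separate argument, and `compile_mm_encoder` is kept as the string `"false"` only because that is how the command in this thread writes it.

```python
import json
import shlex


def build_serve_command(model, keep_cuda_graphs=True):
    """Build a `vllm serve` command line that disables torch.compile.

    keep_cuda_graphs=True  -> "mode": 0 in the compilation config
                              (torch.compile off, CUDA graphs kept).
    keep_cuda_graphs=False -> --enforce-eager
                              (torch.compile and CUDA graphs both off).
    """
    args = ["vllm", "serve", model]
    compilation_config = {}
    if keep_cuda_graphs:
        compilation_config["mode"] = 0
    else:
        args.append("--enforce-eager")
    # Mirrors the value used in the PR command (a string, not a JSON boolean).
    compilation_config["compile_mm_encoder"] = "false"
    args.append("--compilation-config")
    # Compact separators keep the JSON free of spaces, as in the PR command.
    args.append(json.dumps(compilation_config, separators=(",", ":")))
    # shlex.join quotes the JSON so it survives a POSIX shell intact.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_serve_command("my-org/my-model"))
    print(build_serve_command("my-org/my-model", keep_cuda_graphs=False))
```

Generating the JSON with `json.dumps` rather than by hand avoids quoting mistakes when more keys are folded into the same `--compilation-config` argument.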
Thanks @DarkLight1337, updated. Since the command already passed … `vllm serve <model> --compilation-config='{"mode":0,"compile_mm_encoder":"false"}'` As you noted, …
… VLLM_TORCH_COMPILE_LEVEL (vllm-project#44378) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
… VLLM_TORCH_COMPILE_LEVEL (vllm-project#44378) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
… VLLM_TORCH_COMPILE_LEVEL (vllm-project#44378) Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Purpose
The "Compilation Errors" troubleshooting section in docs/design/torch_compile_multimodal.md tells users to disable compilation like this:
`VLLM_TORCH_COMPILE_LEVEL=0 vllm serve <model> ...`
`VLLM_TORCH_COMPILE_LEVEL` no longer exists anywhere in the codebase; torch.compile control was moved to `CompilationConfig` / the `-O` optimization levels. The env var is silently ignored, so following this step does not actually disable compilation: the model still runs compiled, defeating the "verify the model works without compilation" instruction.
Change
Use the documented way to turn off torch.compile + CUDA graphs:
Per docs/design/debug_vllm_compile.md, `--enforce-eager` (→ `enforce_eager=True`) is "Turn off torch.compile and CUDAGraphs". I kept the explicit `compile_mm_encoder: false` to preserve the original intent of also skipping the MM encoder.
One-line doc change.
Not a duplicate
Per AGENTS.md duplicate-work checks (on main@53b88d1d):
The only open PR touching torch_compile_multimodal.md historically is #30549 (WhisperEncoder), which does not modify this troubleshooting section.
Test plan
`typos`, `markdownlint-cli2`, and the other applicable hooks pass on the changed file. (`update-dockerfile-graph` errors with `Executable /bin/bash not found`, a Windows-host limitation unrelated to this doc; it runs on Linux CI.)
AI-assisted (Claude Code); reviewed end-to-end by the submitter.