From a4bb5d7e1738a94acd60ac71df298e89dae3a3dd Mon Sep 17 00:00:00 2001
From: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Date: Tue, 2 Jun 2026 20:48:38 -0700
Subject: [PATCH 1/2] [Doc] Fix multimodal torch.compile troubleshooting to not use removed VLLM_TORCH_COMPILE_LEVEL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "Compilation Errors" troubleshooting step in the multimodal
torch.compile design doc tells users to disable compilation with:

    VLLM_TORCH_COMPILE_LEVEL=0 vllm serve ...

`VLLM_TORCH_COMPILE_LEVEL` no longer exists anywhere in the codebase —
torch.compile control was moved to `CompilationConfig` / the `-O`
optimization levels, so this env var is silently ignored and the model
still runs with compilation enabled (defeating the troubleshooting step).

Use the documented way to disable torch.compile + CUDA graphs instead:
`--enforce-eager` (see docs/design/debug_vllm_compile.md, where
`--enforce-eager` maps to `enforce_eager=True` = "Turn off torch.compile
and CUDAGraphs").

Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
---
 docs/design/torch_compile_multimodal.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/torch_compile_multimodal.md b/docs/design/torch_compile_multimodal.md
index 8b745c8ce233..0691eef45924 100644
--- a/docs/design/torch_compile_multimodal.md
+++ b/docs/design/torch_compile_multimodal.md
@@ -88,7 +88,7 @@ If compilation fails for a multimodal model:
 1. **Disable and test**: First verify the model works without compilation:
 
     ```bash
-    VLLM_TORCH_COMPILE_LEVEL=0 vllm serve --compilation-config='{"compile_mm_encoder":"false"}'
+    vllm serve --enforce-eager --compilation-config='{"compile_mm_encoder":"false"}'
     ```
 
 2. **Check logs**: Enable debug logging to see compilation details:

From 2874498ad4587d7c460cfbe37e4b32ffac89b8ab Mon Sep 17 00:00:00 2001
From: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Date: Sat, 6 Jun 2026 23:20:44 -0700
Subject: [PATCH 2/2] Use --compilation-config mode=0 as the direct replacement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per review feedback from @DarkLight1337: mode=0 (CompilationMode.NONE) is
the faithful replacement for the removed VLLM_TORCH_COMPILE_LEVEL=0 — it
disables torch.compile while keeping CUDA graphs, whereas --enforce-eager
also disables CUDA graphs (a behavior change). Fold mode into the existing
--compilation-config JSON.

Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
---
 docs/design/torch_compile_multimodal.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/torch_compile_multimodal.md b/docs/design/torch_compile_multimodal.md
index 0691eef45924..bb30de56bc14 100644
--- a/docs/design/torch_compile_multimodal.md
+++ b/docs/design/torch_compile_multimodal.md
@@ -88,7 +88,7 @@ If compilation fails for a multimodal model:
 1. **Disable and test**: First verify the model works without compilation:
 
     ```bash
-    vllm serve --enforce-eager --compilation-config='{"compile_mm_encoder":"false"}'
+    vllm serve --compilation-config='{"mode":0,"compile_mm_encoder":"false"}'
     ```
 
 2. **Check logs**: Enable debug logging to see compilation details: