From a4bb5d7e1738a94acd60ac71df298e89dae3a3dd Mon Sep 17 00:00:00 2001
From: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Date: Tue, 2 Jun 2026 20:48:38 -0700
Subject: [PATCH 1/2] [Doc] Fix multimodal torch.compile troubleshooting to not
 use removed VLLM_TORCH_COMPILE_LEVEL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "Compilation Errors" troubleshooting step in the multimodal
torch.compile design doc tells users to disable compilation with:

    VLLM_TORCH_COMPILE_LEVEL=0 vllm serve <model> ...

`VLLM_TORCH_COMPILE_LEVEL` no longer exists anywhere in the codebase:
torch.compile control was moved to `CompilationConfig` / the `-O`
optimization levels. The env var is therefore silently ignored and the
model still runs with compilation enabled, which defeats the
troubleshooting step.

Use the documented way to disable torch.compile and CUDA graphs instead:
`--enforce-eager`. See docs/design/debug_vllm_compile.md, where
`--enforce-eager` maps to `enforce_eager=True`, described as "Turn off
torch.compile and CUDAGraphs".

Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
---
 docs/design/torch_compile_multimodal.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/design/torch_compile_multimodal.md b/docs/design/torch_compile_multimodal.md
index 8b745c8ce233..0691eef45924 100644
--- a/docs/design/torch_compile_multimodal.md
+++ b/docs/design/torch_compile_multimodal.md
@@ -88,7 +88,7 @@ If compilation fails for a multimodal model:
 
 1. **Disable and test**: First verify the model works without compilation:
    ```bash
-   VLLM_TORCH_COMPILE_LEVEL=0 vllm serve <model> --compilation-config='{"compile_mm_encoder":"false"}'
+   vllm serve <model> --enforce-eager --compilation-config='{"compile_mm_encoder":"false"}'
    ```
 
 2. **Check logs**: Enable debug logging to see compilation details:

From 2874498ad4587d7c460cfbe37e4b32ffac89b8ab Mon Sep 17 00:00:00 2001
From: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Date: Sat, 6 Jun 2026 23:20:44 -0700
Subject: [PATCH 2/2] Use --compilation-config mode=0 as the direct replacement
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per review feedback from @DarkLight1337: `mode=0` (`CompilationMode.NONE`)
is the faithful replacement for the removed `VLLM_TORCH_COMPILE_LEVEL=0`.
It disables torch.compile while keeping CUDA graphs, whereas
`--enforce-eager` also disables CUDA graphs (a behavior change). Fold
`mode` into the existing `--compilation-config` JSON.
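
For scripted reproduction, the same command line can be assembled
programmatically. The following is a minimal sketch and not part of this
patch: `build_serve_command` is a hypothetical helper, and the config
values are copied verbatim from the doc line changed here (including the
string "false"). It only builds the command string; it does not import or
invoke vLLM.

```python
import json
import shlex


def build_serve_command(model: str, disable_compile: bool = True) -> str:
    """Build a `vllm serve` command string (hypothetical helper)."""
    # Value copied as-is from the doc; it is a string there, not a JSON boolean.
    config = {"compile_mm_encoder": "false"}
    if disable_compile:
        # mode 0 is the setting this patch adds to the existing JSON.
        config = {"mode": 0, **config}
    # Compact separators reproduce the spacing used in the doc example.
    flag = "--compilation-config=" + json.dumps(config, separators=(",", ":"))
    # shlex.quote protects the braces and double quotes from the shell.
    return " ".join(["vllm", "serve", shlex.quote(model), shlex.quote(flag)])


if __name__ == "__main__":
    print(build_serve_command("my-model"))
```

Serializing with `json.dumps` and quoting with `shlex.quote` avoids
hand-escaping the JSON, which is the easiest part of the command to get
wrong when copying it between shells.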

Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
---
 docs/design/torch_compile_multimodal.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/torch_compile_multimodal.md b/docs/design/torch_compile_multimodal.md
index 0691eef45924..bb30de56bc14 100644
--- a/docs/design/torch_compile_multimodal.md
+++ b/docs/design/torch_compile_multimodal.md
@@ -88,7 +88,7 @@ If compilation fails for a multimodal model:
 
 1. **Disable and test**: First verify the model works without compilation:
    ```bash
-   vllm serve <model> --enforce-eager --compilation-config='{"compile_mm_encoder":"false"}'
+   vllm serve <model> --compilation-config='{"mode":0,"compile_mm_encoder":"false"}'
    ```
 
 2. **Check logs**: Enable debug logging to see compilation details: