[CI][Bugfix] Make test_gpt2_cache_hit observable across V1 EngineCore #42037
ZJY0516 merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request refactors the test_gpt2_cache_hit test to use the compilation_counter utility for verifying AOT compilation and cache hits, replacing the previous manual symbol-counting approach. It also disables V1 multiprocessing for this test to allow the singleton counter to function correctly in-process and includes explicit cleanup of the activation registry and distributed environment to prevent state leakage. The review feedback highlights that the fresh_vllm_cache fixture is unused in the test body and notes that clearing the private _ACTIVATION_REGISTRY._dict is a fragile operation that relies on implementation details.
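For context, a minimal sketch of the in-process counter check this summary describes, assuming `compilation_counter` lives in `vllm.compilation.counter` and that `num_backend_compilations` is one of its integer fields (the exact field name is an assumption, not confirmed by this PR):

```python
from vllm import LLM
from vllm.compilation.counter import compilation_counter

# With V1 multiprocessing disabled (VLLM_ENABLE_V1_MULTIPROCESSING=0), the
# compile runs in this process, so the singleton counter is observable here.
before = compilation_counter.num_backend_compilations  # field name assumed
llm = LLM(model="gpt2")  # triggers AOT compilation on a cold cache
after = compilation_counter.num_backend_compilations
assert after > before  # phase 1: a fresh AOT compile happened
```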
```diff
 @pytest.mark.skipif(not is_torch_equal_or_newer("2.10.0"), reason="requires torch 2.10")
 @create_new_process_for_each_test("spawn")
-def test_gpt2_cache_hit(monkeypatch: pytest.MonkeyPatch):
+def test_gpt2_cache_hit(monkeypatch: pytest.MonkeyPatch, fresh_vllm_cache: str):
```
The fresh_vllm_cache fixture is added to the test signature but is not explicitly used within the test body. While it likely performs necessary setup (like setting VLLM_CACHE_ROOT and clearing the directory) via its side effects, it is generally better practice to use the fixture's value if it provides the path, or at least document that it's relied upon for environment setup. If fresh_vllm_cache is intended to replace the manual VLLM_CACHE_ROOT setting seen in the previous version, ensure that the fixture indeed sets this environment variable globally for the test process.
This is taken from test_aot_counters_on_save_and_load; I would keep it.
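For reference, the rough shape such a fixture usually takes; this is a guess at its behavior based on the review comment above, not the actual definition in the repo:

```python
import pytest


@pytest.fixture
def fresh_vllm_cache(tmp_path, monkeypatch) -> str:
    # Point vLLM at an empty per-test cache directory so phase 1 always
    # starts cold; the test relies purely on this side effect.
    cache_root = tmp_path / "vllm_cache"
    monkeypatch.setenv("VLLM_CACHE_ROOT", str(cache_root))
    return str(cache_root)
```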
```python
# otherwise mutates compilation_config.disabled_custom_ops between
# phase 1 and phase 2 and changes the AOT cache hash.
del llm_model
vllm.model_executor.layers.activation._ACTIVATION_REGISTRY._dict.clear()
```
Accessing and clearing _ACTIVATION_REGISTRY._dict is a fragile operation as it relies on private implementation details of the LazyDict class and the activation module. While this is a test and the PR description explains why it's necessary (to prevent state leakage affecting the AOT cache hash), this suggests that the CompilationConfig or the AOT hashing logic might be overly sensitive to global state. Consider if there's a more robust way to ensure a clean state between the two LLM instantiations, or at least add a comment explaining that this is a workaround for global state leakage in the activation registry.
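One way to make that cleanup less fragile, sketched under the assumption that `_ACTIVATION_REGISTRY._dict` remains a plain dict (the same private details the test already touches, so still a workaround rather than a fix):

```python
import contextlib

import vllm.model_executor.layers.activation as activation


@contextlib.contextmanager
def preserved_activation_registry():
    # Snapshot the registry before the first LLM instantiation and restore
    # it afterwards, so phase 2 hashes against the same state as phase 1.
    saved = dict(activation._ACTIVATION_REGISTRY._dict)
    try:
        yield
    finally:
        activation._ACTIVATION_REGISTRY._dict.clear()
        activation._ACTIVATION_REGISTRY._dict.update(saved)
```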
force-pushed from 507c094 to 1cc211d
```
Compiling gpt2 twice must produce a fresh AOT compile the first time
and a cache load (zero compiles) the second time.

Forces VLLM_ENABLE_V1_MULTIPROCESSING=0 so the EngineCore runs in this
```
Are you sure we need to set VLLM_ENABLE_V1_MULTIPROCESSING=0 here? It's been running fine under multiproc for a while before this.
It looks related to my previous fix #41943.
Before #41943, the module was lazily loaded and the monkey-patch ran before CUDA initialization, so it could observe the counter increments.
Old path (cloudpickle):

```
child starts → deserialize test func (lazy) → monkey-patch → LLM(...)
→ CUDA init happens here → _maybe_force_spawn: no prior CUDA → in-process
→ compile visible to patch → counter increments ✓
```
After #41943, the module is imported eagerly, which triggers CUDA init first; the monkey-patch only runs after that, so the failure happens.
New path (importlib):

```
child starts → import_module("tests.compile.test_aot_compile")
→ top-level import chain triggers CUDA init → monkey-patch → LLM(...)
→ _maybe_force_spawn: CUDA already init'd → spawns EngineCore subprocess
→ compile in subprocess, patch in parent → counter stays 0 ✗
```
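The fork-vs-spawn decision above hinges on whether CUDA is already initialized in the parent, which a test can observe directly; a small debugging aid, not part of the PR:

```python
import torch

# True once the eager import chain has touched CUDA; at that point
# _maybe_force_spawn will put the EngineCore in its own subprocess.
print(torch.cuda.is_initialized())
```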
I have updated the description, thanks @ZJY0516.
I think we want to make sure these tests run with multiproc enabled, which is aligned with real-world usage.
I see, let me check whether I can track the counter without disabling multiprocessing.
Changed to keep the multiprocess path. @ZJY0516, could you take another look? Thank you in advance!
The test patches `torch.fx.experimental.symbolic_shapes.make_symbol` in the parent process and counts via a `multiprocessing.Value`. In V1 the actual compile runs inside an `EngineCore` subprocess that vLLM spawns whenever CUDA is initialized in the parent (via `_maybe_force_spawn`), so the parent-process patch never reaches the compile path and the counter stays at 0.

Replace the brittle torch-internal patch with `LLM.collective_rpc` to snapshot `compilation_counter` from the EngineCore subprocess itself. This is process-model agnostic: it works under default V1 multiprocessing without sharing memory between the test and the engine.

Each phase creates a fresh `LLM(...)` and tears it down via `cleanup_dist_env_and_memory()`, so per-phase counters start at zero without needing the previous activation-registry workaround. Phase 2 also sets `VLLM_FORCE_AOT_LOAD=1` as a fail-loud guard (raises FileNotFoundError on cache miss) on top of the counter assertion.

`collective_rpc(callable)` requires pickle-based serialization, so the test sets `VLLM_ALLOW_INSECURE_SERIALIZATION=1` (the same pattern other collective_rpc-using tests follow, e.g. `tests/v1/e2e/general/test_pooling_chunked_prefill.py`).

Signed-off-by: haosdent <haosdent@gmail.com>
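A minimal sketch of that two-phase flow, treating `_snap`, `compile_and_snapshot`, the counter field name, and the snapshot format as illustrative rather than the PR's literal code:

```python
import os

# collective_rpc with a callable needs pickle-based serialization.
os.environ["VLLM_ALLOW_INSECURE_SERIALIZATION"] = "1"

from vllm import LLM
from vllm.distributed import cleanup_dist_env_and_memory


def _snap(self) -> dict:
    # Executes inside the EngineCore worker process, where the compiles
    # actually happen, so the singleton reflects real activity. Assumes the
    # counter is a plain dataclass-style object with public int fields.
    from vllm.compilation.counter import compilation_counter
    return dict(vars(compilation_counter))


def compile_and_snapshot() -> dict:
    llm = LLM(model="gpt2")
    counters = llm.collective_rpc(_snap)[0]  # one entry per worker
    del llm
    cleanup_dist_env_and_memory()  # fresh state for the next phase
    return counters


# Phase 1: cold cache, expect at least one fresh compile (field name assumed).
phase1 = compile_and_snapshot()
assert phase1["num_backend_compilations"] > 0

# Phase 2: warm cache; fail loudly if the AOT artifact is missing.
os.environ["VLLM_FORCE_AOT_LOAD"] = "1"
phase2 = compile_and_snapshot()
assert phase2["num_backend_compilations"] == 0
```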
merge this to unblock CI since @ProExpertProg has approved
Purpose

Fix `tests/compile/test_aot_compile.py::test_gpt2_cache_hit`. The test patched `make_symbol` in the parent process, but in V1 the compile runs inside an `EngineCore` subprocess, so the patch never reached the counter. Related to #41953, which made the spawn wrapper eagerly import the test module and indirectly initialize CUDA in the parent, flipping EngineCore from fork to spawn.

Replace the parent-process patch with `LLM.collective_rpc(_snap)` to read `compilation_counter` from the EngineCore subprocess. Process-model agnostic, runs under default V1 multiprocessing. `collective_rpc(callable)` requires `VLLM_ALLOW_INSECURE_SERIALIZATION=1`.

Test Plan

```bash
rm -rf ~/.cache/vllm/torch_compile_cache
pytest tests/compile/test_aot_compile.py::test_gpt2_cache_hit -s -v
```

Test Result