[Engine] Reset KV connector cache in pause_generation cascade by aoshen02 · Pull Request #42695 · vllm-project/vllm

aoshen02 · 2026-05-15T01:39:55Z

Summary

AsyncLLM.pause_generation(clear_cache=True) internally calls EngineCore._reset_caches, which then calls Scheduler.reset_prefix_cache(reset_running_requests=True) — with the new reset_connector parameter (added in #27170) defaulting to False. For engines with a configured external KV store connector (MooncakeStoreConnector, LMCacheConnectorV1, HF3FSConnector, FlexKVConnector, …) that means a user asking the engine to "clear cache" after a weight update gets the local prefix cache cleared but leaves the external store full of KV blocks hashed against the previous policy.

For RL post-training workflows this is a silent stale-cache correctness hole: the next request can read external KV that was written by the old policy and serve it as a prefix hit against the new policy.

Change

Parameterize EngineCore._reset_caches with reset_connector: bool = True and forward the value to reset_prefix_cache. The default flips from False to True so that pause_generation(clear_cache=True) — whose contract is "clear caches" — now actually clears the external KV store too.

Scheduler.reset_connector_cache already gracefully handles the no-connector case (warns + returns False), so this is a no-op for engines that don't configure a connector. Internal callers that want a local-only invalidation can override with reset_connector=False; outside callers can still bypass the cascade by calling scheduler.reset_prefix_cache(reset_connector=False) directly.

This intentionally keeps reset_connector out of the user-facing AsyncLLM.pause_generation signature: threading an opt-in flag all the way up turns the safety net into a footgun (user forgets the flag → silent stale KV). The user-facing surface stays a single clear_cache semantic — "clear means clear". Future internal callers can override via the new kwarg if they need to.

Why this is not a duplicate

gh pr list --repo vllm-project/vllm --state open --search "_reset_caches reset_connector" → empty
gh pr list --repo vllm-project/vllm --state open --search "pause_generation clear_cache reset_connector" → empty
gh issue list --repo vllm-project/vllm --state open --search "pause_generation reset_connector" → empty

Related PRs:

[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache #27170 introduced reset_connector on Scheduler.reset_prefix_cache.
[KVConnector][Bugfix] Treat no-connector as no-op success in reset_connector_cache #42693 is a parallel correctness fix for the no-connector branch of the same code path (returns True instead of False when no connector is attached) — independent of this change.
[KVConnector][Mooncake] Wire reset_cache cascade end-to-end #42694 implements MooncakeStoreConnector.reset_cache, the most immediate beneficiary of this cascade.

Test plan

No new test is added: the behavior change is exercised by the existing tests/v1/core/test_reset_prefix_cache_e2e.py (no-connector engine keeps working) and by the new tests in #42694 (Mooncake connector now sees its reset_cache called via pause_generation(clear_cache=True)).

Commands run locally:

pre-commit run --files vllm/v1/engine/core.py
# → ruff check, ruff format, mypy-3.10, SPDX, etc. all Passed

The full pytest run could not be executed in my local sandbox (the editable vllm install in the venv points at a different worktree whose precompiled CUDA flash-attention extensions don't match upstream main). CI will be the source of truth.

AI assistance

This patch was authored with assistance from Claude Opus 4.7 (1M context). Every changed line was reviewed by me end-to-end before opening this PR.

🤖 Generated with Claude Code

`AsyncLLM.pause_generation(clear_cache=True)` internally calls `EngineCore._reset_caches`, which then calls `Scheduler.reset_prefix_cache(reset_running_requests=True)` -- with the new `reset_connector` parameter (added in vllm-project#27170) defaulting to False. For engines with a configured external KV store connector (e.g. `MooncakeStoreConnector`, `LMCacheConnectorV1`, `HF3FSConnector`, ...) that means a user asking the engine to "clear cache" after a weight update gets the local prefix cache cleared but leaves the external store full of KV blocks hashed against the previous policy. For RL post-training workflows this is a silent stale-cache correctness hole: the next request can read external KV that was written by the old policy and serve it as a prefix hit against the new policy. This patch parameterizes `EngineCore._reset_caches` with `reset_connector: bool = True` and forwards the value to `reset_prefix_cache`. The default flips from False to True so that `pause_generation(clear_cache=True)` -- the user-facing API whose contract is "clear caches" -- now actually clears the external KV store too. `Scheduler.reset_connector_cache` already gracefully handles the no-connector case (warns + returns False), so this is a no-op for engines that don't configure a connector. Internal callers that want a local-only invalidation can override with `reset_connector=False`; outside callers can still bypass the cascade by calling `scheduler.reset_prefix_cache(reset_connector=False)` directly. This intentionally keeps `reset_connector` *out* of the user-facing `AsyncLLM.pause_generation` signature: threading an opt-in flag all the way up turns the safety net into a footgun (user forgets the flag -> silent stale KV). The user-facing surface stays a single `clear_cache` semantic ("clear means clear"). Future internal callers can override via the new kwarg if they need to. ## Why this is not a duplicate - `gh pr list --repo vllm-project/vllm --state open --search "_reset_caches reset_connector"` -> empty - `gh pr list --repo vllm-project/vllm --state open --search "pause_generation clear_cache reset_connector"` -> empty - `gh issue list --repo vllm-project/vllm --state open --search "pause_generation reset_connector"` -> empty Related PRs: - vllm-project#27170 introduced `reset_connector` on `Scheduler.reset_prefix_cache`. - vllm-project#42693 is a parallel correctness fix for the no-connector branch of the same code path (returns True instead of False when no connector is attached) -- independent of this change. - vllm-project#42694 implements `MooncakeStoreConnector.reset_cache`, the most immediate beneficiary of this cascade. ## Test plan No new test is added: the behavior change is exercised by the existing `tests/v1/core/test_reset_prefix_cache_e2e.py` (no-connector engine keeps working) and by the new tests in vllm-project#42694 (Mooncake connector now sees its `reset_cache` called via `pause_generation(clear_cache=True)`). Commands run locally: ```bash pre-commit run --files vllm/v1/engine/core.py # -> ruff check, ruff format, mypy-3.10, SPDX, etc. all Passed ``` The full pytest run could not be executed in my local sandbox (the editable `vllm` install in the venv points at a different worktree whose precompiled CUDA flash-attention extensions don't match upstream `main`). CI will be the source of truth. ## AI assistance This patch was authored with assistance from Claude Opus 4.7 (1M context). Every changed line was reviewed by the submitter before opening the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: aoshen524 <aoshen524@gmail.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request updates the _reset_caches method in vllm/v1/engine/core.py to include a reset_connector parameter, ensuring external KV-store connectors are cleared alongside local caches. Feedback suggests checking for the existence of a connector before triggering the reset to avoid unnecessary warning logs for users who do not have an external KV connector configured.

gemini-code-assist · 2026-05-15T01:42:04Z

+    def _reset_caches(
+        self,
+        reset_running_requests: bool = True,
+        reset_connector: bool = True,
+    ) -> None:
+        # ``reset_connector`` defaults to True so external KV-store
+        # connectors (e.g. MooncakeStoreConnector) drop their state
+        # alongside the local prefix/mm/encoder caches. This matches the
+        # invariant callers of ``pause_generation(clear_cache=True)``
+        # expect: clear all caches, not just the on-engine ones.
+        # ``Scheduler.reset_connector_cache`` already handles the
+        # no-connector case (logs + short-circuits), so this is a no-op
+        # for engines without a configured KV connector.
+        #
+        # Internal callers that genuinely want a local-only invalidation
+        # can pass ``reset_connector=False``; users that need that escape
+        # hatch from outside should call
+        # ``scheduler.reset_prefix_cache(reset_connector=False)``
+        # directly rather than going through this cascade.
+        self.reset_prefix_cache(
+            reset_running_requests=reset_running_requests,
+            reset_connector=reset_connector,
+        )


By defaulting reset_connector to True in _reset_caches, and because _reset_caches is called by default in pause_scheduler(clear_cache=True), most users (who do not use an external KV connector) will now see a warning log: "reset_connector called but no KV connector is configured." from Scheduler.reset_connector_cache every time they pause the engine.

To avoid this log spam for the majority of users while still ensuring the external cache is cleared when present, we should only pass reset_connector=True to the scheduler if a connector is actually configured. The current implementation's claim in the docstring that it is a "no-op" is slightly misleading because of this logging side effect.

def _reset_caches( self, reset_running_requests: bool = True, reset_connector: bool = True, ) -> None: # ``reset_connector`` defaults to True so external KV-store # connectors (e.g. MooncakeStoreConnector) drop their state # alongside the local prefix/mm/encoder caches. This matches the # invariant callers of ``pause_generation(clear_cache=True)`` # expect: clear all caches, not just the on-engine ones. # # Internal callers that genuinely want a local-only invalidation # can pass ``reset_connector=False``; users that need that escape # hatch from outside should call # ``scheduler.reset_prefix_cache(reset_connector=False)`` # directly rather than going through this cascade. has_connector = self.scheduler.get_kv_connector() is not None self.reset_prefix_cache( reset_running_requests=reset_running_requests, reset_connector=reset_connector and has_connector, )

aoshen02 · 2026-05-15T01:54:10Z

Superseded by #42694, which now consolidates this engine cascade change together with the scheduler no-op-success fix (#42693, also being closed) and the Mooncake-side reset_cache implementation into a single end-to-end PR per review feedback.

aoshen02 requested a review from njhill as a code owner May 15, 2026 01:39

claude Bot reviewed May 15, 2026

View reviewed changes

mergify Bot added the v1 label May 15, 2026

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

aoshen02 mentioned this pull request May 15, 2026

[KVConnector][Bugfix] Treat no-connector as no-op success in reset_connector_cache #42693

Closed

aoshen02 closed this May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Engine] Reset KV connector cache in pause_generation cascade#42695

[Engine] Reset KV connector cache in pause_generation cascade#42695
aoshen02 wants to merge 1 commit into
vllm-project:mainfrom
aoshen02:aoshen524/engine-cascade-reset-connector

aoshen02 commented May 15, 2026

Uh oh!

claude Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

aoshen02 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

aoshen02 commented May 15, 2026

Summary

Change

Why this is not a duplicate

Test plan

AI assistance

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

aoshen02 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant