[BugFix] Scheduler: Only set num_external_computed_tokens once by orozery · Pull Request #36859 · vllm-project/vllm

orozery · 2026-03-12T07:03:40Z

Request.num_cached_tokens and Request.num_external_computed_tokens are two fields used for reporting request-level cache hit stats. While num_cached_tokens is set only the first time a request gets scheduled, num_external_computed_tokens is re-set whenever a request is re-scheduled, either after the request is preempted or after its initial allocation fails. This creates a possible inconsistency between the two fields, which can lead to an incorrect value for the derived stat local_cache_hit, and can crash vLLM if that value is negative.
This PR fixes it by setting these two fields only when a request gets scheduled for the first time (by checking Request.num_preemptions == 0).
After that, these fields may be updated only when the connector loading external tokens reports an error. We modify a scheduler unit test for preemptions with a KV connector to verify that these fields are set only once.
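The gating described above can be sketched roughly as follows. This is a minimal illustration, not the actual scheduler code: the `Request` dataclass is a stripped-down stand-in, `record_cache_hit_stats` is a hypothetical helper name, and the token counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical, stripped-down stand-in for the scheduler's Request.
    num_preemptions: int = 0
    num_cached_tokens: int = 0
    num_external_computed_tokens: int = 0


def record_cache_hit_stats(
    request: Request,
    num_computed_tokens: int,
    num_external_computed_tokens: int,
) -> None:
    # Record both stats together, and only while the request has never
    # been preempted, so the two fields cannot drift apart.
    if request.num_preemptions == 0:
        request.num_cached_tokens = num_computed_tokens
        request.num_external_computed_tokens = num_external_computed_tokens


req = Request()
record_cache_hit_stats(req, 96, 64)    # first scheduling: both fields recorded

req.num_preemptions += 1               # the request gets preempted
record_cache_hit_stats(req, 160, 128)  # re-scheduling: both fields left alone
```

Because both assignments sit behind the same condition, a re-scheduled request keeps the pair of values recorded on its first scheduling.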

mergify · 2026-03-12T07:04:20Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @orozery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request addresses a bug where request-level cache hit statistics (num_external_computed_tokens) were being incorrectly reset upon request rescheduling (e.g., after preemption). The fix ensures that both num_cached_tokens and num_external_computed_tokens are set only once, during the initial scheduling of a request, by checking if request.num_preemptions == 0. The logic for updating these stats upon external cache load failures has also been corrected to respect this new rule. The accompanying changes in the test suite correctly verify this new behavior. The changes appear correct and effectively resolve the described issue.

mergify · 2026-03-12T07:12:11Z

Hi @orozery, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Request.num_cached_tokens and Request.num_external_computed_tokens are two fields used for reporting request-level cache hit stats. While num_cached_tokens is set only the first time a request gets scheduled, num_external_computed_tokens is re-set whenever a request is re-scheduled, either after the request is preempted or after its initial allocation fails. This creates a possible inconsistency between the two fields, which can lead to an incorrect value for the derived stat local_cache_hit, and can crash vLLM if that value is negative. This commit fixes it by setting these two fields only when a request gets scheduled for the first time (by checking Request.num_preemptions == 0). After that, these fields may be updated only when the connector loading external tokens reports an error. We modify a scheduler unit test for preemptions with a KV connector to verify that these fields are set only once. Signed-off-by: Or Ozeri <oro@il.ibm.com>

markmc · 2026-03-12T11:43:00Z

xref #36533 and #36755

markmc · 2026-03-16T11:40:12Z

Still looking, but some initial thoughts ...

Preemption scenario (#36533)

1. A request is scheduled for the first time: num_external_computed_tokens and num_cached_tokens are both set.
2. The request gets preempted due to memory pressure.
3. The request is re-scheduled later: num_external_computed_tokens is set, but num_cached_tokens is not.

If a later num_external_computed_tokens is greater than the first num_cached_tokens, we will hit the "Counters can only be incremented by non-negative amounts" error described in #36533.
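To make the arithmetic concrete, here is a small sketch of how the preemption scenario can produce a negative increment. It assumes the local cache hit is derived as num_cached_tokens minus num_external_computed_tokens; the `Counter` class is a hypothetical stand-in for a metrics counter that rejects negative increments, and the token counts are invented for illustration.

```python
class Counter:
    """Hypothetical stand-in for a metrics counter that rejects negative increments."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int) -> None:
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts."
            )
        self.value += amount


def local_cache_hit(num_cached_tokens: int, num_external_computed_tokens: int) -> int:
    # Assumed derivation: cached tokens that did not come from the
    # external KV connector were served by the local prefix cache.
    return num_cached_tokens - num_external_computed_tokens


# First scheduling: both fields are recorded together.
num_cached_tokens = 96
num_external_computed_tokens = 64
first = local_cache_hit(num_cached_tokens, num_external_computed_tokens)

# After preemption and re-scheduling, only the external count is overwritten.
num_external_computed_tokens = 128
stale = local_cache_hit(num_cached_tokens, num_external_computed_tokens)

counter = Counter()
try:
    counter.inc(stale)
    crashed = False
except ValueError:
    crashed = True
```

With these numbers the first derivation gives 32, while the stale pair gives -32 and the increment is rejected.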

Failed transfer scenario (#36638)

#36638 describes a scenario where num_external_computed_tokens is reduced on KV transfer failure. I’m less clear on how this could cause this same “non-negative amounts” error though, since reducing num_external_computed_tokens obviously can’t drive it above num_cached_tokens.

Reproducer

I have failed to find a reliable reproducer using KV offloading and llama-3.1-8b-instruct, and an automated sweep of variations like KV offloading size, long/short inputs, long/short outputs, random and sharegpt datasets, different levels of high concurrency, and more … it did reproduce once, but I couldn’t repeat it.

Of course, we could artificially recreate the scenario in a carefully controlled unit test, but that wasn’t my goal.

Metrics Purpose

Let’s consider the metrics flow in isolation, since the error relates to metrics updates.

Request.num_cached_tokens and Request.num_external_computed_tokens get sent back to the frontend in an EngineCoreOutput principally when prefill has completed (e.g. first new tokens produced).

On the frontend side, in the OutputProcessor.process_outputs() loop, we assume that the first EngineCoreOutput signifies that prefill has completed, and it is only at this point that we (in IterationStats.update_from_output()) consider Request.num_cached_tokens and Request.num_external_computed_tokens. (Streaming inputs are a recent change to this invariant.)

Note that in the case of preemption, the frontend only considers prefill to have been completed once, and so these two values are irrelevant for metrics once that initial EngineCoreOutput has been sent.

This is all quite challenging to validate 100% from reading the code. It could make things a lot more clear if we separated any integral scheduler accounting use of these two values from the metrics-related information associated with the “prefill completed event”.

KV Transfer Failures

It seems like KV transfer failure handling is the other purpose for tracking these two values on Request, and the scope of an error here goes beyond simply incorrect metrics tracking: the most important thing is to get the update to Request.num_computed_tokens correct. This too is very twisty to validate.

Minimal fix

Very similar to Or's proposed fix, the simplest thing we can do is to ensure both values only get updated once (except for KV transfer failure handling), and at the same time, e.g.

if request.num_cached_tokens < 0:
    request.num_cached_tokens = num_computed_tokens
    request.num_external_computed_tokens = num_external_computed_tokens
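A runnable version of this guard might look like the sketch below. It assumes both fields start at a -1 sentinel meaning "not recorded yet"; the `Request` dataclass and the `record_prefill_stats` helper name are hypothetical and only serve to exercise the condition.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in; -1 is assumed to mean "not recorded yet".
    num_cached_tokens: int = -1
    num_external_computed_tokens: int = -1


def record_prefill_stats(
    request: Request,
    num_computed_tokens: int,
    num_external_computed_tokens: int,
) -> None:
    # Update both values at the same time, and only once.
    if request.num_cached_tokens < 0:
        request.num_cached_tokens = num_computed_tokens
        request.num_external_computed_tokens = num_external_computed_tokens


req = Request()
record_prefill_stats(req, 96, 64)    # first scheduling: values recorded
record_prefill_stats(req, 160, 128)  # later re-scheduling: no effect
```

Unlike a check on the preemption count, this guard depends only on whether the stats were already written.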

I don't love request.num_preemptions == 0 as a "set only the first time" signal

But I also wonder whether we should just drop the "only set the first time" thing, since we do use the values in KV transfer failure handling, even after preemptions

orozery · 2026-03-16T12:02:29Z

I don't love request.num_preemptions == 0 as a "set only the first time" signal

The reason I prefer this condition over initializing fields to -1 is that -1 is not the true default value for these fields.
The default value is 0.
If for example a request gets aborted before even getting to the point where we set those fields, we will report -1 instead of 0.
But I agree request.num_preemptions == 0 is also somewhat fragile.
Maybe we can keep the -1 and just make sure we initialize to 0 before returning to the user if it wasn't set.
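The last suggestion could be sketched like this: keep -1 internally as the "unset" marker, but map it to 0 at the point where values are reported. The `Request` dataclass and the `reported_stats` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in; -1 is assumed to mean "not recorded yet".
    num_cached_tokens: int = -1
    num_external_computed_tokens: int = -1


def reported_stats(request: Request) -> tuple[int, int]:
    # Replace the internal sentinel with the true default of 0 so a request
    # aborted before scheduling never reports -1 to the user.
    return (
        max(request.num_cached_tokens, 0),
        max(request.num_external_computed_tokens, 0),
    )


aborted_early = Request()  # aborted before the fields were ever set
scheduled = Request(num_cached_tokens=96, num_external_computed_tokens=64)

aborted_report = reported_stats(aborted_early)
scheduled_report = reported_stats(scheduled)
```

The sentinel then never leaves the scheduler, while the "set only once" check can still rely on it.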

But I also wonder whether we should just drop the "only set the first time" thing, since we do use the values in KV transfer failure handling, even after preemptions

These fields are used for reporting stats to the user.
I believe they were introduced before failure handling piggy-backed on them.
The failure-handling code's use of these fields is hacky IMO, and I can think of more robust alternatives that do without them.
(I actually implemented a fix in one of the revisions of #35223: f02a5c8).

If we allow these fields to be re-set I think we lose the desirable semantics of these stats fields.

markmc · 2026-03-18T16:56:47Z

These fields are used for reporting stats to the user.

Agree, and I think this could be done in a way that more clearly reflects the intended semantics - see #37460

I believe they were introduced before failure handling piggy-backed on them. The failure-handling code's use of these fields is hacky IMO, and I can think of more robust alternatives that do without them. (I actually implemented a fix in one of the revisions of #35223: f02a5c8).

I agree it would be much better to not depend on these fields in the error handling code 👍

markmc · 2026-04-08T10:05:11Z

We've since iterated on #37460 to resolve this, so closing

orozery requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners March 12, 2026 07:03

mergify Bot added the v1 label Mar 12, 2026

mergify Bot added the needs-rebase and bug labels Mar 12, 2026

gemini-code-assist Bot reviewed Mar 12, 2026

orozery force-pushed the stats-set-num-external-tokens-once branch from f4b6538 to a3ac960 Compare March 12, 2026 07:08

mergify Bot removed the needs-rebase label Mar 12, 2026

orozery force-pushed the stats-set-num-external-tokens-once branch from a3ac960 to 0889687 Compare March 12, 2026 07:32

orozery requested a review from markmc March 12, 2026 07:36

markmc added this to Metrics & Tracing Mar 12, 2026

github-project-automation Bot moved this to Backlog in Metrics & Tracing Mar 12, 2026

markmc moved this from Backlog to In Review in Metrics & Tracing Mar 12, 2026

markmc mentioned this pull request Mar 12, 2026

[Metrics] Temporary band-aid for "Counters can only be incremented by non-negative amounts" #36812

Closed

This was referenced Mar 18, 2026

[BugFix] Ensure num_cached_tokens is non-negative for kv transfer failed requests #37354

Closed

[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats #37460

Merged

chenminghua8 mentioned this pull request Apr 1, 2026

[Bugfix][Core] Fix negative prompt token counter increments with external KV cache accounting #38712

Open

5 tasks

markmc closed this Apr 8, 2026

markmc moved this from In Review to Not planned in Metrics & Tracing Apr 8, 2026

orozery wants to merge 1 commit into vllm-project:main from orozery:stats-set-num-external-tokens-once
