[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats #37460
Conversation
I've just realized an obvious gap: the prefill stats aren't updated on the failed-KV-transfer path.
Code Review
This pull request introduces PrefillStats to more accurately track and report prefill-related metrics, which is a valuable improvement for clarity and correctness, especially in scenarios involving preemption. The refactoring to use a dedicated PrefillStats object instead of separate numerical values is a good design choice. The changes are consistently applied across the scheduler, engine outputs, request objects, and tests. I have found one critical issue in the metrics calculation logic that leads to double-counting of tokens, for which I've provided a detailed comment and a suggested fix.
Right now we basically duplicate these fields into the prefill stats. And I guess once we do that, we can think about whether we want to keep the old fields (probably not?).
IMO, the PR basically does that (sets the stats directly)?
I'm going to take a look at this, and I expect I can remove/rework these fields without affecting
Right, my bad.
@orozery from #36859 (comment)
I've incorporated this approach from that earlier revision now, PTAL

I think it's better to create a separate PR for these changes in the failure recovery flow.

Moved to draft since #38096 now needs to merge first
This pull request has merge conflicts that must be resolved before it can be merged.
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats

In `OutputProcessor`, we take the first `EngineCoreOutput` as a signal that prefill has completed, and record certain statistics about it. On the scheduler side, because of preemption, we might have prefills that are scheduled but never completed, or we might need to recompute an already completed prefill. To add clarity, we use `PrefillStats` to track the first scheduled prefill so that the stats can be returned to the frontend via `EngineCoreOutput`.

`num_cached_tokens` was previously used for KV transfer failure recovery, but this is no longer true as of vllm-project#38096. We also no longer attempt to correct these prefill metrics if KV transfers fail, since this introduced unjustified brittleness to an already brittle code path.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
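The "track the first scheduled prefill, hand the stats over once" idea can be sketched in a few lines. This is a minimal, hypothetical sketch (class, field, and method names are illustrative, not the actual vLLM API): stats are captured only the first time a prefill is scheduled, so a recompute after preemption cannot overwrite them, and they are consumed exactly once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefillStats:
    """Token accounting captured the first time a prefill is scheduled."""
    num_cached_tokens: int = 0
    num_external_computed_tokens: int = 0


class Request:
    """Hypothetical request object; the real vLLM Request differs."""

    def __init__(self) -> None:
        self.prefill_stats: Optional[PrefillStats] = None

    def record_prefill_stats(self, cached: int, external: int) -> None:
        # Only the first scheduled prefill counts; a recompute after
        # preemption must not overwrite the original numbers.
        if self.prefill_stats is None:
            self.prefill_stats = PrefillStats(cached, external)

    def take_prefill_stats(self) -> Optional[PrefillStats]:
        # Consumed once, e.g. when building the first EngineCoreOutput,
        # so the frontend sees the stats exactly one time.
        stats, self.prefill_stats = self.prefill_stats, None
        return stats
```

With this shape, a preempted-and-recomputed prefill reports the numbers from its first scheduling, and a second `take_prefill_stats` call returns `None` rather than double-reporting.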
FTR, this is a significant change in the most recent update: we now track stats from the first time the prefill is scheduled (previously: the first time the prefill completed), since the proven-error-prone effort to update the stats in the KV-transfer error-handling path seems unjustified.
Or points out:

> we call take_prefill_stats at update_from_output in the same step the request was first scheduled to run. So I don't think there's a way the request will be preempted before we set Request.prefill_stats to None.

Co-authored-by: Or Ozeri <or@ozery.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Note: this PR changed the prefill stats reported in the multi-KV-connector case (NIXL connector + CPUOffloading connector). It impacts a couple of test cases; will post a PR that updates them accordingly.

Yep, it broke CI.
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (vllm-project#37460)

Related to the `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
…different vllm version (#8426)

### What this PR does / why we need it?
This fixes breakage caused by vllm-project/vllm#37460. It introduces version-specific logic to handle `num_cached_tokens` and `num_external_computed_tokens` in the scheduler, ensuring compatibility with vLLM 0.19.0 and maintaining legacy support for older versions via `prefill_stats`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.19.0
- vLLM main: vllm-project/vllm@6f786f2

Signed-off-by: wangli <wangli858794774@gmail.com>
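The version-specific logic described above could look roughly like this stdlib-only sketch (function and attribute names are assumptions for illustration; the actual downstream patch may differ): gate on the installed vLLM version and read from `prefill_stats` on 0.19.0+, falling back to the legacy per-request fields otherwise.

```python
from types import SimpleNamespace


def _parse_version(v: str) -> tuple:
    # Minimal "major.minor.patch" parser; enough for a gate like this.
    return tuple(int(part) for part in v.split("."))


def get_prefill_token_counts(request, vllm_version: str):
    """Read token counts from PrefillStats on vLLM >= 0.19.0,
    falling back to the legacy per-request fields on older versions."""
    if _parse_version(vllm_version) >= (0, 19, 0):
        stats = request.prefill_stats
        return stats.num_cached_tokens, stats.num_external_computed_tokens
    return request.num_cached_tokens, request.num_external_computed_tokens


# Stand-in request objects for the two vLLM generations:
new_style = SimpleNamespace(prefill_stats=SimpleNamespace(
    num_cached_tokens=8, num_external_computed_tokens=4))
old_style = SimpleNamespace(num_cached_tokens=8, num_external_computed_tokens=4)
```

A single call site can then serve both vLLM generations without scattering version checks through the scheduler.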
Related to the discussion in #36859 and the `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric.

In `OutputProcessor`, we take the first `EngineCoreOutput` as a signal that prefill has completed, and record certain statistics about it. On the scheduler side, because of preemption, we might have prefills that are scheduled but never completed, or we might need to recompute an already completed prefill. To add clarity, we use `PrefillStats` to track the first scheduled prefill so that the stats can be returned to the frontend via `EngineCoreOutput`.

`num_cached_tokens` was previously used for KV transfer failure recovery, but this is no longer true as of #38096. We also no longer attempt to correct these prefill metrics if KV transfers fail, since this introduced unjustified brittleness to an already brittle code path.
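For context, the error quoted above is the guard a Prometheus-style counter applies: counters are monotonic, so a negative delta (e.g. tokens double-counted and then "corrected" downward) is rejected. A minimal stdlib-only sketch mirroring that behavior (this is not vLLM's or prometheus_client's actual code):

```python
class Counter:
    """Toy monotonic counter mirroring the Prometheus-client guard."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # Counters must only ever go up; a negative increment means the
        # bookkeeping upstream (here, the prefill stats) went wrong.
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts")
        self.value += amount


prompt_tokens = Counter("vllm:prompt_tokens_by_source_total")
prompt_tokens.inc(128)          # normal accounting
try:
    prompt_tokens.inc(-16)      # a buggy "correction" triggers the error
    raised = False
except ValueError:
    raised = True
```

This is why reporting the prefill token counts once, from a single snapshot, is safer than incrementing and then trying to adjust the counter after a failed KV transfer.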