[Core][Metrics] Remove vllm:prompt_tokens_recomputed metric #38709

orozery merged 1 commit into vllm-project:main
/cc @ZhanqiuHu

Code Review
This pull request removes the tracking and reporting of recomputed tokens from the metrics system. Specifically, it deletes the vllm:prompt_tokens_recomputed metric, simplifies the PromptTokenStats class by removing the recomputation logic and its associated invariants, and updates the relevant tests to reflect these changes. I have no feedback to provide.
In the case of a full local prefix cache hit (prompt length N),
we actually only use N-1 tokens. The `vllm:prompt_tokens_recomputed`
metric was intended to count how many cached tokens we are effectively
discarding because of this.
```
KVCacheManager.get_computed_blocks():
...
# NOTE: When all tokens hit the cache, we must recompute the last token
# to obtain logits. [...]
max_cache_hit_length = request.num_tokens - 1
```
However, even here, we can't assume the last token would have been
a cache hit, so counting it as "recomputed" is not justified. In
retrospect, the metric seems quite misguided.
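The N-1 capping behavior quoted above can be sketched as follows. This is an illustrative standalone function, not the actual `KVCacheManager` implementation; the name `max_cache_hit_length` is borrowed from the snippet, but the signature is hypothetical.

```python
def max_cache_hit_length(num_prompt_tokens: int, num_cached_tokens: int) -> int:
    """Return how many prompt tokens may be served from the prefix cache.

    Even when every prompt token is cached, the last token must be
    recomputed so the model produces logits for sampling, so the cache
    hit is capped at num_prompt_tokens - 1.
    """
    return min(num_cached_tokens, num_prompt_tokens - 1)


# Full hit: prompt of length 8 with all 8 tokens cached -> only 7 usable;
# the 8th is the token the metric would have counted as "recomputed".
print(max_cache_hit_length(8, 8))  # 7

# Partial hit: 5 of 8 tokens cached -> all 5 are usable, nothing recomputed.
print(max_cache_hit_length(8, 5))  # 5
```

Note that only in the full-hit case does the cap bite, which is why the "recomputed" count was at most 1 per request.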
The metric was added as a side-effect in vllm-project#33290 in order to make
sense of the fact that:
```
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"}
```
will include a token that is recomputed. See this comment:
> Note: external_kv_transfer reports the actual number of tokens
> transferred (e.g., prompt length N), while prompt_tokens_cached_total
> reports the adjusted count (e.g., N-1). The last token is both
> transferred AND recomputed locally, so there's overlap.
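The overlap described in that comment amounts to a one-token discrepancy, which a small worked example makes concrete (illustrative arithmetic only, not vLLM code):

```python
# For a full external KV transfer of a prompt with N tokens:
N = 8
tokens_transferred = N       # external_kv_transfer reports the raw transfer count
tokens_cached = N - 1        # cached count is adjusted: the last token is recomputed

# The single token that is both transferred AND recomputed locally.
overlap = tokens_transferred - tokens_cached
print(tokens_transferred, tokens_cached, overlap)  # 8 7 1
```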
However, it makes more sense for the `external_kv_transfer` count to
reflect only tokens we actually used, not any recomputed tokens. This
will be done in vllm-project#37460.
I'm not aware of any user demand for this metric, or anyone relying
on it now. So it seems safe to remove it, rather than go through
a deprecation period.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>