[CI][Metrics] Fix local_cache_hit assertion after prompt tokens metrics updates by ZhanqiuHu · Pull Request #39709 · vllm-project/vllm

ZhanqiuHu · 2026-04-13T14:11:10Z

Purpose

PR #38709 removed the recomputed token from PromptTokenStats.update_from_output(). When all prompt tokens are cached (local + NIXL), the scheduler reduces num_cached_tokens by 1, which is now absorbed by local_cache_hit in the metric accounting. This temporarily updates the MultiConnector edge case test assertions to match the new metric semantics.
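The change in accounting described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not vLLM's code: `ToyPromptTokenStats`, `scheduler_cached_tokens`, and their fields and parameters are hypothetical names chosen for illustration, and the formulas only mirror what this description says (the scheduler reports one fewer cached token when the whole prompt is cached, and the metrics no longer add that token back).

```python
from dataclasses import dataclass


@dataclass
class ToyPromptTokenStats:
    """Hypothetical stand-in for illustration; not vLLM's PromptTokenStats."""

    computed: int = 0
    local_cache_hit: int = 0
    external_kv_transfer: int = 0

    def update(self, prompt_len: int, num_cached_tokens: int,
               num_external_tokens: int) -> None:
        # Tokens that were not served from any cache had to be computed.
        self.computed += prompt_len - num_cached_tokens
        # Tokens loaded through an external connector (e.g. NIXL).
        self.external_kv_transfer += num_external_tokens
        # Assumption: the local share is whatever remains of the cached count,
        # with no correction for the recomputed token.
        self.local_cache_hit += num_cached_tokens - num_external_tokens


def scheduler_cached_tokens(prompt_len: int, local_hits: int,
                            external_hits: int) -> int:
    """Toy model of the scheduler-side count described in this PR."""
    cached = local_hits + external_hits
    if cached == prompt_len:
        # Whole prompt is cached: one token is recomputed, so the
        # reported cached count drops by 1.
        cached -= 1
    return cached


stats = ToyPromptTokenStats()
cached = scheduler_cached_tokens(prompt_len=16, local_hits=8, external_hits=8)
stats.update(prompt_len=16, num_cached_tokens=cached, num_external_tokens=8)
```

In this toy run the prompt has 8 locally cached and 8 externally transferred tokens, yet `local_cache_hit` ends up one lower than the local hit count, which is the kind of off-by-one the updated test assertions account for.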

Test Plan

CI runs.

Signed-off-by: ZhanqiuHu <zhu@redhat.com>

gemini-code-assist

Code Review

This pull request updates the integration tests in test_multi_connector_edge_cases.py by adjusting the expected local_cache_hit metric. The assertions in both test_full_decode_gpu_cache_hit_metrics and test_partial_decode_gpu_cache_hit_metrics have been modified to expect one fewer cache hit than previously calculated. I have no feedback to provide as no review comments were submitted.

markmc · 2026-04-13T14:21:20Z

Thanks for the quick fix. I don't expect #37460 to restore the old behaviour.

If we want to account "correctly" for these recomputed tokens, we need the scheduler to report those metrics correctly rather than trying to infer them 👍

…cs updates (vllm-project#39709) Signed-off-by: ZhanqiuHu <zhu@redhat.com> Signed-off-by: Jonathan Chen <chenleejonathan@gmail.com>

…cs updates (vllm-project#39709) Signed-off-by: ZhanqiuHu <zhu@redhat.com>

…cs updates (vllm-project#39709) Signed-off-by: ZhanqiuHu <zhu@redhat.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

[Test] Fix local_cache_hit assertion after prompt tokens metrics updates

4eb75a0

Signed-off-by: ZhanqiuHu <zhu@redhat.com>

ZhanqiuHu marked this pull request as ready for review April 13, 2026 14:11

ZhanqiuHu requested review from ApostaC and orozery as code owners April 13, 2026 14:11

mergify Bot added v1 kv-connector labels Apr 13, 2026

gemini-code-assist Bot reviewed Apr 13, 2026

markmc added the ready label (ONLY add when PR is ready to merge/full CI is needed) and removed the kv-connector label Apr 13, 2026

mergify Bot added the kv-connector label Apr 13, 2026

ZhanqiuHu changed the title ~~[Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates (temp)~~ [Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates Apr 13, 2026

markmc approved these changes Apr 13, 2026

markmc enabled auto-merge (squash) April 13, 2026 14:21

markmc changed the title ~~[Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates~~ [CI][Metrics] Fix local_cache_hit assertion after prompt tokens metrics updates Apr 13, 2026

markmc added this to Metrics & Tracing Apr 13, 2026

github-project-automation Bot moved this to Backlog in Metrics & Tracing Apr 13, 2026

markmc moved this from Backlog to Ready in Metrics & Tracing Apr 13, 2026

markmc merged commit 10d9872 into vllm-project:main Apr 13, 2026
29 checks passed

github-project-automation Bot moved this from Ready to Done in Metrics & Tracing Apr 13, 2026

wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026

[CI][Metrics] Fix local_cache_hit assertion after prompt tokens metri…

f48382d

…cs updates (vllm-project#39709) Signed-off-by: ZhanqiuHu <zhu@redhat.com>

markmc mentioned this pull request Apr 14, 2026

[Misc] toy_proxy_server handle min_tokens #39706

Merged

whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026

[CI][Metrics] Fix local_cache_hit assertion after prompt tokens metri…

045f72b

…cs updates (vllm-project#39709) Signed-off-by: ZhanqiuHu <zhu@redhat.com>

markmc merged 1 commit into vllm-project:main from ZhanqiuHu:fix-cache-hit-metric-assertion
