Skip to content

[CI][Metrics] Fix local_cache_hit assertion after prompt tokens metrics updates#39709

Merged
markmc merged 1 commit intovllm-project:mainfrom
ZhanqiuHu:fix-cache-hit-metric-assertion
Apr 13, 2026
Merged

[CI][Metrics] Fix local_cache_hit assertion after prompt tokens metrics updates#39709
markmc merged 1 commit intovllm-project:mainfrom
ZhanqiuHu:fix-cache-hit-metric-assertion

Conversation

@ZhanqiuHu
Copy link
Copy Markdown
Contributor

@ZhanqiuHu ZhanqiuHu commented Apr 13, 2026

Purpose

PR #38709 removed the recomputed token from PromptTokenStats.update_from_output(). When all prompt tokens are cached (local + NIXL), the scheduler reduces num_cached_tokens by 1, which is now absorbed by local_cache_hit in the metric accounting. This temporarily updates the MultiConnector edge case test assertions to match the new metric semantics.

Test Plan

CI runs.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@ZhanqiuHu ZhanqiuHu marked this pull request as ready for review April 13, 2026 14:11
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the integration tests in test_multi_connector_edge_cases.py by adjusting the expected local_cache_hit metric. The assertions in both test_full_decode_gpu_cache_hit_metrics and test_partial_decode_gpu_cache_hit_metrics have been modified to expect one fewer cache hit than previously calculated. I have no feedback to provide as no review comments were submitted.

@markmc markmc added ready ONLY add when PR is ready to merge/full CI is needed and removed kv-connector labels Apr 13, 2026
@mergify mergify Bot added the kv-connector label Apr 13, 2026
@ZhanqiuHu ZhanqiuHu changed the title [Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates (temp) [Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates Apr 13, 2026
@markmc
Copy link
Copy Markdown
Member

markmc commented Apr 13, 2026

Thanks for the quick fix. I don't expect #37460 to restore the old behaviour

If we want to account "correctly" for these recomputed tokens, we need the scheduler to report those metrics correctly rather than try to infer it 👍

@markmc markmc enabled auto-merge (squash) April 13, 2026 14:21
@markmc markmc changed the title [Fix CI] Fix local_cache_hit assertion after prompt tokens metrics updates [CI][Metrics] Fix local_cache_hit assertion after prompt tokens metrics updates Apr 13, 2026
@markmc markmc moved this from Backlog to Ready in Metrics & Tracing Apr 13, 2026
@markmc markmc merged commit 10d9872 into vllm-project:main Apr 13, 2026
29 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in Metrics & Tracing Apr 13, 2026
jonathanc-n pushed a commit to jonathanc-n/vllm that referenced this pull request Apr 13, 2026
…cs updates (vllm-project#39709)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Signed-off-by: Jonathan Chen <chenleejonathan@gmail.com>
wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…cs updates (vllm-project#39709)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants