Fix(Scheduler): Reset num_cached_tokens on preemption to prevent acco…#36757
xueliangyang-oeuler wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly addresses a potential crash during preemption by resetting num_cached_tokens. It also includes a fix for a potential dtype mismatch in the TRT-LLM FP8 MoE expert implementation. The changes appear correct and improve the robustness of the scheduler and model execution layers. I have added one comment suggesting a related improvement for consistency.
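As a rough sketch of the scheduler fix described above (the `Request` class and field names here are illustrative stand-ins, not vLLM's actual types), the key point is that preemption must reset cached-token accounting along with computed-token accounting:

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Minimal stand-in for the scheduler's per-request state (illustrative only)."""
    num_computed_tokens: int = 0
    num_cached_tokens: int = 0


def preempt(request: Request) -> None:
    # A preempted request restarts from scratch, so its token accounting
    # must be cleared. Leaving num_cached_tokens stale across preemption
    # is the kind of accounting bug this PR's title describes.
    request.num_computed_tokens = 0
    request.num_cached_tokens = 0
```

The reset values here are a simplification; the actual sentinel the scheduler uses may differ.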
Note: Security Review did not run due to the size of the PR.
```
if e_score_correction_bias is not None:
    e_score_correction_bias = e_score_correction_bias.to(hidden_states.dtype)
```
This explicit type casting is a good safeguard against potential dtype mismatches. For consistency, it would be beneficial to apply the same logic to the _apply_per_tensor method in this file, which also uses e_score_correction_bias but currently lacks this explicit cast. This would improve robustness and prevent similar potential issues there.
For example, you could add this to _apply_per_tensor:

```
if e_score_correction_bias is not None:
    e_score_correction_bias = e_score_correction_bias.to(hidden_states.dtype)
```

…non-negative amounts"

Since `num_computed_tokens`, `num_cached_tokens`, and `num_external_computed_tokens` accounting seems quite brittle currently - with preemption reset bugs and P/D disaggregation accounting issues - add a defensive check to detect and prevent instances of Prometheus counter errors:

```
ValueError: Counters can only be incremented by non-negative amounts
```

The invariant check enforces:

```
prompt_len >= num_cached_tokens >= num_external_computed_tokens >= 0
```

with the additional nuance that when all tokens are cached, the scheduler forces recomputation of the last token, so:

```
num_external_computed_tokens <= num_cached_tokens + recomputed
```

When the invariant is violated, we log a warning once with diagnostic details and discard the suspect cache metrics. Obviously, the accounting should be fixed and made more robust and future-proof, at which point we can remove this check (perhaps replacing it with a simple assertion).

Related to issues vllm-project#36533, vllm-project#36755 and PRs vllm-project#36638, vllm-project#36752, vllm-project#36757.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
…orrection_bias dtype conversion (vllm-project#36755)
Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com>
…unting crash (#36755)
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.