
[Metrics] Temporary band-aid for "Counters can only be incremented by non-negative amounts"#36812

Closed
markmc wants to merge 1 commit into vllm-project:main from markmc:prompt-token-stats-negative-inc

Conversation

@markmc
Member

markmc commented Mar 11, 2026

Since num_computed_tokens, num_cached_tokens, and num_external_computed_tokens accounting seems quite brittle currently - with preemption reset bugs and P/D disaggregation accounting issues - add a defensive check to detect and prevent instances of Prometheus counter errors:

ValueError: Counters can only be incremented by non-negative amounts

The invariant check enforces:

prompt_len >= num_cached_tokens >= num_external_computed_tokens >= 0

with the additional nuance that when all tokens are cached, the scheduler forces recomputation of the last token, so:

num_external_computed_tokens <= num_cached_tokens + recomputed

When the invariant is violated, we log a warning once with diagnostic details and discard the suspect cache metrics.
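A defensive check along these lines might look like the following. This is a hypothetical sketch, not the actual patch; the function name and the once-only warning flag are made up for illustration:

```python
import logging

logger = logging.getLogger(__name__)
_invariant_warned = False  # warn only on the first violation (illustrative)


def cache_metrics_look_sane(
    prompt_len: int,
    num_cached_tokens: int,
    num_external_computed_tokens: int,
    recomputed: int,
) -> bool:
    """Return True if token accounting satisfies the invariant:

    prompt_len >= num_cached_tokens >= 0, and
    0 <= num_external_computed_tokens <= num_cached_tokens + recomputed,

    where `recomputed` accounts for the scheduler re-running the last
    token when the whole prompt was cached.
    """
    global _invariant_warned
    ok = (
        0 <= num_external_computed_tokens <= num_cached_tokens + recomputed
        and 0 <= num_cached_tokens <= prompt_len
    )
    if not ok and not _invariant_warned:
        _invariant_warned = True
        logger.warning(
            "Discarding suspect cache metrics: prompt_len=%d "
            "num_cached_tokens=%d num_external_computed_tokens=%d recomputed=%d",
            prompt_len,
            num_cached_tokens,
            num_external_computed_tokens,
            recomputed,
        )
    return ok
```

A caller would skip the Prometheus counter updates when this returns False, rather than letting the metrics loop crash with the ValueError above.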

Obviously, the accounting should be fixed and made more robust and future-proof, at which point we can remove this check (perhaps replacing with a simple assertion).

Related to issues #36533, #36755 and PRs #36638, #36752, #36757.

@markmc
Member Author

markmc commented Mar 11, 2026

/cc @ZhanqiuHu

@mergify (bot) added the v1 label Mar 11, 2026
@markmc
Member Author

markmc commented Mar 11, 2026

To be clear, I don't like this; I see it as purely a temporary "stop the bleeding" band-aid that is also backportable to v0.16.0, where this first showed up.

We still need to solidify the num_computed_tokens, num_cached_tokens, and num_external_computed_tokens accounting ... hopefully in a way that is easier to maintain.

@markmc added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2026
@markmc moved this from Backlog to Ready in Metrics & Tracing Mar 11, 2026
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a defensive check to prevent crashes from metrics accounting bugs, specifically when Prometheus counters are incremented by negative values. The changes involve adding an invariant check for token counts and logging a warning when the invariant is violated. I've found a critical issue with the invariant check itself, as it doesn't fully prevent negative values in all cases. I've also identified an issue with the diagnostic logging that would make debugging harder. My review includes suggestions to address both of these points.


Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc force-pushed the prompt-token-stats-negative-inc branch from 6b2de38 to 04a4886 on March 11, 2026 at 19:28
@markmc
Member Author

markmc commented Mar 12, 2026

From Slack, @orozery's view is that #36859 is a correct, simpler fix and there's no need for a temporary, defensive check like this.

@markmc
Member Author

markmc commented Apr 8, 2026

The issue has been fixed on main since #37160 introduced this band-aid:

        self.local_cache_hit += max(
            0, (num_cached_tokens + recomputed - num_external_computed_tokens)
        )

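Isolating that clamp as a standalone helper (variable names follow the snippet above; the surrounding class is omitted and the function name is made up):

```python
def cache_hit_delta(num_cached_tokens: int,
                    recomputed: int,
                    num_external_computed_tokens: int) -> int:
    # Clamp at zero so a transient accounting glitch can never yield a
    # negative Prometheus counter increment.
    return max(0, num_cached_tokens + recomputed - num_external_computed_tokens)
```

When the external computed-token count transiently exceeds the cached-token count, the delta is reported as zero instead of going negative.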
#37460 is the current candidate for a long-term fix

@markmc closed this Apr 8, 2026
@markmc moved this from Ready to Not planned in Metrics & Tracing Apr 8, 2026
@ZhanqiuHu
Contributor

Quick question, do we have some metrics to distinguish CPU cache hit vs. GPU cache hit when CPU offloading is on?


Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Not planned

2 participants