
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric #32661

Merged
markmc merged 1 commit into vllm-project:main from carlory:metrics-0-13-removal
Jan 20, 2026

@carlory carlory commented Jan 20, 2026

Summary

This PR completes the removal of the vllm:time_per_output_token_seconds metric, which was deprecated in v0.11, hidden in v0.12, and scheduled for removal in v0.13.
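The deprecate, hide, remove schedule described above can be modelled as a small state function. This is only an illustrative sketch, not vLLM's implementation: `metric_state` and `parse_minor` are hypothetical names, and the one-minor-release spacing between stages is an assumption taken from the v0.11 / v0.12 / v0.13 schedule stated here. The `show_hidden_for` argument stands in for the `--show-hidden-metrics-for-version=0.11` flag named in this PR's commit message.

```python
from typing import Optional, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse 'X.Y', 'X.Y.Z' or 'vX.Y' into a (major, minor) tuple."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def metric_state(current: str, deprecated_in: str,
                 show_hidden_for: Optional[str] = None) -> str:
    """Return the lifecycle state of a metric for the running version.

    Assumed schedule (hypothetical model): deprecated in release N,
    hidden in N+1 unless explicitly re-enabled, removed from N+2 onward.
    """
    cur = parse_minor(current)
    dep = parse_minor(deprecated_in)
    hidden_from = (dep[0], dep[1] + 1)
    removed_from = (dep[0], dep[1] + 2)

    if cur < dep:
        return "active"
    if cur < hidden_from:
        return "deprecated"
    if cur < removed_from:
        # The escape hatch only applies while the metric is hidden.
        if show_hidden_for is not None and parse_minor(show_hidden_for) == dep:
            return "deprecated"
        return "hidden"
    return "removed"


if __name__ == "__main__":
    for version in ("0.10", "0.11", "0.12", "0.13"):
        print(version, metric_state(version, "0.11"))
```

Under this model, passing `show_hidden_for="0.11"` has an effect only on a v0.12 release; once the definition is deleted, as in this PR, no flag can bring the metric back.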

Changes Made

1. Code Removal (vllm/v1/metrics/loggers.py)

  • Removed deprecated histogram definition (39 lines)
  • Removed the conditional observation of the deprecated metric

2. Test Updates (tests/entrypoints/instrumentator/test_metrics.py)

  • Removed from HIDDEN_DEPRECATED_METRICS list
  • Updated _get_expected_values() to use vllm:inter_token_latency_seconds
  • Removed from EXPECTED_METRICS_V1 list

3. Dashboard Updates

  • Grafana: 10 references updated to vllm:inter_token_latency_seconds
  • Perses: 10 references updated to vllm:inter_token_latency_seconds
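A dashboard update of this kind amounts to renaming the metric wherever it appears in a query expression. The sketch below shows one way to do that on a dashboard's JSON text; it is not the procedure used in this PR. The `rename_metric` helper and the sample panel expressions are hypothetical. Because only the base name is replaced, Prometheus histogram suffixes such as `_bucket`, `_sum`, and `_count` carry over unchanged, and the distinct `vllm:request_time_per_output_token_seconds` metric is left alone.

```python
import json
import re
from typing import Tuple

OLD = "vllm:time_per_output_token_seconds"
NEW = "vllm:inter_token_latency_seconds"

# Match the old name only where it starts a metric name, so that a longer
# name merely ending in the same text is never rewritten.
_PATTERN = re.compile(r"(?<![A-Za-z0-9_:])" + re.escape(OLD))


def rename_metric(text: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of references replaced."""
    return _PATTERN.subn(NEW, text)


# Hypothetical dashboard fragment, for illustration only.
dashboard = json.dumps({
    "panels": [{
        "targets": [
            {"expr": "histogram_quantile(0.99, sum by(le) "
                     "(rate(vllm:time_per_output_token_seconds_bucket[5m])))"},
            {"expr": "rate(vllm:request_time_per_output_token_seconds_sum[5m])"},
        ]
    }]
})

updated, replaced = rename_metric(dashboard)
json.loads(updated)  # raises ValueError if the rewrite broke the JSON
```

Returning the replacement count makes it easy to compare against an expected number of references, such as the 10 per dashboard reported above.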

4. Documentation

  • Updated metrics.md with the correct metric reference

Test Validation

✅ Python syntax checks passed
✅ JSON validation passed
✅ YAML validation passed
✅ No deprecated metric references remain
✅ 34+ replacement metric references confirmed
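A "no deprecated references remain" check like the one listed above could be scripted along the following lines. This is a minimal sketch under stated assumptions, not the validation actually run for this PR: `find_references` and the set of file suffixes are hypothetical choices.

```python
from pathlib import Path
from typing import List, Tuple

DEPRECATED = "vllm:time_per_output_token_seconds"
# Assumed file types worth scanning: code, dashboards, and docs.
SUFFIXES = {".py", ".json", ".yaml", ".yml", ".md"}


def find_references(root: Path,
                    needle: str = DEPRECATED) -> List[Tuple[Path, int, str]]:
    """Return (path, line number, line) for every line mentioning `needle`."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    leftovers = find_references(Path("."))
    for path, lineno, line in leftovers:
        print(f"{path}:{lineno}: {line}")
    raise SystemExit(1 if leftovers else 0)
```

A plain substring search is sufficient here because `vllm:request_time_per_output_token_seconds` does not contain the deprecated name as a substring, so the preserved metric produces no false positives.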

Notes

  • vllm:request_time_per_output_token_seconds (different metric) preserved
  • Replacement metric has identical functionality and buckets
  • Complete removal following v0.13 deprecation policy

mergify bot commented Jan 20, 2026

Documentation preview: https://vllm--32661.org.readthedocs.build/en/32661/

@mergify mergify bot added the documentation and v1 labels Jan 20, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request completes the removal of the deprecated vllm:time_per_output_token_seconds metric. The changes are comprehensive, covering code, tests, documentation, and dashboard configurations. The deprecated metric is consistently replaced with vllm:inter_token_latency_seconds. The removal of the old metric's definition and observation logic in vllm/v1/metrics/loggers.py is clean. The corresponding test updates in tests/entrypoints/instrumentator/test_metrics.py correctly reflect this removal. The updates to Grafana and Perses dashboards, as well as the documentation, are also correct. The changes are well-executed and I have no issues to report.

…econds metric

This commit completes the removal of the deprecated metrics that were:
- Deprecated in v0.11 (replaced by vllm:inter_token_latency_seconds)
- Hidden in v0.12 (behind --show-hidden-metrics-for-version=0.11 flag)
- Completely removed in v0.13 (this commit)

Changes:
1. Removed deprecated histogram definition from PrometheusStatLogger
2. Updated test files to use replacement metric vllm:inter_token_latency_seconds
3. Updated Grafana dashboard with replacement metric (10 references)
4. Updated Perses dashboard with replacement metric (10 references)
5. Updated design documentation to reflect current metrics

The replacement metric vllm:inter_token_latency_seconds has identical functionality
and bucket definitions. The different metric vllm:request_time_per_output_token_seconds
is preserved as it is still actively used.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
markmc commented Jan 20, 2026

Thank you!

This duplicates #30992 and #31675 but looks more comprehensive

@markmc markmc moved this from Backlog to Ready in Metrics & Tracing Jan 20, 2026
@markmc markmc added the ready label Jan 20, 2026
@markmc markmc enabled auto-merge (squash) January 20, 2026 10:47
carlory commented Jan 20, 2026

This duplicates #30992 and #31675

Sorry about that, I didn't know there were other PRs.

@markmc markmc merged commit bb91720 into vllm-project:main Jan 20, 2026
49 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in Metrics & Tracing Jan 20, 2026
gopalsarda pushed a commit to gopalsarda/vllm that referenced this pull request Jan 20, 2026
…econds metric (vllm-project#32661)

This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.

Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
@carlory carlory deleted the metrics-0-13-removal branch January 21, 2026 02:04
@markmc markmc moved this from Done to Done - 0.15 in Metrics & Tracing Feb 4, 2026
Labels

documentation (Improvements or additions to documentation)
ready (ONLY add when PR is ready to merge/full CI is needed)
v1

Projects

Status: Done - 0.15

2 participants