[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric #32661
Documentation preview: https://vllm--32661.org.readthedocs.build/en/32661/
Code Review
This pull request completes the removal of the deprecated vllm:time_per_output_token_seconds metric. The changes are comprehensive, covering code, tests, documentation, and dashboard configurations. The deprecated metric is consistently replaced with vllm:inter_token_latency_seconds. The removal of the old metric's definition and observation logic in vllm/v1/metrics/loggers.py is clean. The corresponding test updates in tests/entrypoints/instrumentator/test_metrics.py correctly reflect this removal. The updates to Grafana and Perses dashboards, as well as the documentation, are also correct. The changes are well-executed and I have no issues to report.
…econds metric

This commit completes the removal of the deprecated metrics that were:
- Deprecated in v0.11 (replaced by vllm:inter_token_latency_seconds)
- Hidden in v0.12 (behind --show-hidden-metrics-for-version=0.11 flag)
- Completely removed in v0.13 (this commit)

Changes:
1. Removed deprecated histogram definition from PrometheusStatLogger
2. Updated test files to use replacement metric vllm:inter_token_latency_seconds
3. Updated Grafana dashboard with replacement metric (10 references)
4. Updated Perses dashboard with replacement metric (10 references)
5. Updated design documentation to reflect current metrics

The replacement metric vllm:inter_token_latency_seconds has identical functionality and bucket definitions. The different metric vllm:request_time_per_output_token_seconds is preserved as it is still actively used.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
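The deprecate, hide, remove schedule described in the commit message can be modeled as a small policy function. This is a minimal sketch, not vLLM's actual implementation: `parse_minor` and `is_metric_exposed` are hypothetical names, and the sketch assumes that the value passed to `--show-hidden-metrics-for-version` is the release in which the metric was deprecated, as the commit message's `=0.11` example suggests.

```python
from typing import Optional, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse 'v0.12', '0.12' or '0.12.1' into a (major, minor) pair."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_metric_exposed(current: str, deprecated_in: str,
                      show_hidden_for: Optional[str] = None) -> bool:
    """Hypothetical policy for a deprecated metric.

    - Up to and including the deprecating release: exposed.
    - One minor release later: hidden unless the operator opts in.
    - Two or more minor releases later: removed.
    """
    cur_major, cur_minor = parse_minor(current)
    dep_major, dep_minor = parse_minor(deprecated_in)
    if cur_major != dep_major:
        # Assumption: the sketch only reasons within one major version.
        raise ValueError("sketch only handles a single major version")

    age = cur_minor - dep_minor
    if age <= 0:
        return True
    if age == 1:
        # Hidden by default; re-enabled only for the matching version.
        return (show_hidden_for is not None
                and parse_minor(show_hidden_for) == (dep_major, dep_minor))
    return False


if __name__ == "__main__":
    for release in ("0.11", "0.12", "0.13"):
        print(release, is_metric_exposed(release, "0.11"),
              is_metric_exposed(release, "0.11", show_hidden_for="0.11"))
```

Under this policy the opt-in flag only has an effect in the single release after deprecation; once the metric definition is deleted, no flag can bring it back.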
7d9f21f to 3c455d7
…econds metric (vllm-project#32661) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
…econds metric (vllm-project#32661) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
…econds metric (vllm-project#32661) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com> Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
Summary
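This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric, which was deprecated in v0.11, hidden in v0.12, and scheduled for removal in v0.13.
Changes Made
1. Code Removal (vllm/v1/metrics/loggers.py)
- Removed the deprecated histogram definition from PrometheusStatLogger
2. Test Updates (tests/entrypoints/instrumentator/test_metrics.py)
- Updated to use the replacement metric vllm:inter_token_latency_seconds
3. Dashboard Updates
- Grafana dashboard: references replaced with vllm:inter_token_latency_seconds
- Perses dashboard: references replaced with vllm:inter_token_latency_seconds
4. Documentation
- Updated design documentation to reflect current metrics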
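For anyone maintaining their own dashboards, the rename can be applied mechanically to query expressions. The following is a rough sketch under stated assumptions: `rewrite` is a hypothetical helper, and it simply walks any JSON-like structure and replaces the old base name in every string, so Prometheus histogram suffixes such as `_bucket`, `_sum`, and `_count` carry over unchanged. It is not the procedure used in this PR.

```python
import re

OLD = "vllm:time_per_output_token_seconds"
NEW = "vllm:inter_token_latency_seconds"

# The lookbehind refuses matches that are the tail of a longer identifier.
# Only the base name is replaced, so any suffix after it is kept as-is.
PATTERN = re.compile(r"(?<![A-Za-z0-9_:])" + re.escape(OLD))


def rewrite(node):
    """Return a copy of a JSON-like structure with the metric renamed."""
    if isinstance(node, str):
        return PATTERN.sub(NEW, node)
    if isinstance(node, list):
        return [rewrite(item) for item in node]
    if isinstance(node, dict):
        return {key: rewrite(value) for key, value in node.items()}
    return node  # numbers, booleans, None


if __name__ == "__main__":
    # Made-up fragment for illustration, not taken from the real dashboards.
    sample = {"targets": [
        {"expr": "rate(vllm:time_per_output_token_seconds_sum[5m])"},
        {"expr": "rate(vllm:request_time_per_output_token_seconds_sum[5m])"},
    ]}
    for target in rewrite(sample)["targets"]:
        print(target["expr"])
```

The second expression in the sample is left alone, because vllm:request_time_per_output_token_seconds is a different metric that this PR preserves.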
Test Validation
✅ Python syntax checks passed
✅ JSON validation passed
✅ YAML validation passed
✅ No deprecated metric references remain
✅ 34+ replacement metric references confirmed
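A check like "no deprecated metric references remain" has one subtlety: the bare name time_per_output_token_seconds is a substring of the preserved request_time_per_output_token_seconds. The sketch below shows one way such a scan could be written so that the preserved metric is not flagged. The function name, the file-extension list, and the approach are assumptions for illustration, not the validation actually run for this PR.

```python
import re
from pathlib import Path

# Match the deprecated name with or without the "vllm:" prefix, but not
# when it is the tail of a longer identifier such as
# "request_time_per_output_token_seconds".
DEPRECATED = re.compile(r"(?<![A-Za-z0-9_])time_per_output_token_seconds")

# Assumed set of file types worth scanning.
SUFFIXES = {".py", ".json", ".yaml", ".yml", ".md"}


def find_deprecated_references(root):
    """Return sorted (relative_path, line_number) pairs that still mention
    the deprecated metric."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if DEPRECATED.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


if __name__ == "__main__":
    for rel_path, lineno in find_deprecated_references("."):
        print(f"{rel_path}:{lineno}")
```

An empty result corresponds to the "no deprecated references remain" item above; any hit prints as `path:line` for follow-up.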
Notes
vllm:request_time_per_output_token_seconds (a different metric) is preserved