[Misc] Remove deprecated metric vllm:time_per_output_token_seconds for v0.13 release#30992
Conversation
Code Review
This pull request correctly removes the deprecated metric vllm:time_per_output_token_seconds as planned for the v0.13 release. The changes are confined to vllm/v1/metrics/loggers.py, where both the metric's definition and its usage within PrometheusStatLogger are cleanly removed. The removal is consistent with the deprecation notice and the provided context. The code is now cleaner and free of this obsolete metric. The changes are well-contained and look good to merge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
            histogram_time_to_first_token, engine_indexes, model_name
        )

        # Deprecated in 0.11 - Renamed as vllm:inter_token_latency_seconds
        # With 0.12.x you can enable with --show-hidden-metrics-for-version=0.11
        # TODO: remove in 0.13.0
        if self.show_hidden_metrics:
            histogram_time_per_output_token = self._histogram_cls(
                name="vllm:time_per_output_token_seconds",
                documentation=(
                    "Histogram of time per output token in seconds."
                    "DEPRECATED: Use vllm:inter_token_latency_seconds instead."
                ),
                buckets=[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4,
                         0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0],
                labelnames=labelnames,
            )
            self.histogram_time_per_output_token = make_per_engine(
                histogram_time_per_output_token, engine_indexes, model_name
            )

        histogram_inter_token_latency = self._histogram_cls(
            name="vllm:inter_token_latency_seconds",
            documentation="Histogram of inter-token latency in seconds.",
```
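For context on what those bucket edges mean, the cumulative-bucket behavior of a Prometheus histogram can be exercised in a few lines of stdlib Python. This is an illustrative sketch, not vLLM code; only the bucket edges are taken from the snippet above, and the `observe` helper is hypothetical:

```python
import bisect

# Bucket edges copied from the removed vllm:time_per_output_token_seconds
# definition above; the renamed metric uses the same style of layout.
BUCKETS = [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4,
           0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0]

def observe(counts, value):
    """Increment every cumulative bucket whose upper edge can hold the
    value, mirroring Prometheus histogram semantics (le is inclusive,
    and the +Inf bucket in the last slot always counts)."""
    # bisect_left finds the first bucket edge >= value.
    for i in range(bisect.bisect_left(BUCKETS, value), len(BUCKETS)):
        counts[i] += 1
    counts[-1] += 1  # implicit +Inf bucket

counts = [0] * (len(BUCKETS) + 1)  # one extra slot for +Inf
observe(counts, 0.12)              # a 120 ms inter-token latency
print(counts[5])                   # bucket le=0.15 -> 1
```

A 0.12 s observation lands in the le=0.15 bucket and every wider one, which is why dashboard quantile queries over these buckets keep working as long as the edges stay the same across the rename.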
Remove hidden metric but keep tests expecting it
This block used to build vllm:time_per_output_token_seconds when --show-hidden-metrics-for-version was enabled; with the removal, the metric is no longer exported even when show_hidden_metrics is true. tests/entrypoints/instrumentator/test_metrics.py still lists vllm:time_per_output_token_seconds_* in HIDDEN_DEPRECATED_METRICS and asserts they appear when the server is started with the hidden-metrics flag, so those test cases (and any users still relying on the migration flag for that alias) now fail because the metric is absent. Please drop the expectation or leave the metric available under the flag.
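To confirm the test update end-to-end, one could scrape the server's Prometheus endpoint and assert the old family is gone while the replacement is present. A minimal sketch; the `deprecated_families` helper and the metrics text below are stand-ins, not the actual test code:

```python
def deprecated_families(metrics_text: str) -> list[str]:
    """Return any exported lines still in the removed metric family."""
    return [line for line in metrics_text.splitlines()
            if line.startswith("vllm:time_per_output_token_seconds")]

# Stand-in for the body of a GET /metrics response after this PR;
# real output has many more families and labels.
sample = """\
vllm:inter_token_latency_seconds_bucket{le="0.1",model_name="m"} 3
vllm:inter_token_latency_seconds_count{model_name="m"} 3
"""
print(deprecated_families(sample))  # -> []
```

The same helper run against a pre-removal scrape would return the `_bucket`, `_count`, and `_sum` lines of the old histogram, which is exactly what the hidden-metrics test cases previously asserted on.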
…r v0.13 release

This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds. The TODO comment indicated it should be removed in v0.13.0.

Follows up on vllm-project#30396, which removed other deprecated items for v0.13.

Signed-off-by: Jack Liu <jacklau9515@gmail.com>
Force-pushed from ba9359e to 8985cc6.
Thank you. #32661 duplicates this but looks more comprehensive (e.g. it also updates the dashboards).
This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds. The TODO comment indicated it should be removed in v0.13.0.
Follows up on #30396, which removed other deprecated items for v0.13.
Purpose
Remove the deprecated metric vllm:time_per_output_token_seconds in favor of vllm:inter_token_latency_seconds.

Test Plan

pytest tests/v1/metrics/

Test Result

39/42 pass; 3 failures due to GPU OOM, unrelated to this change.

Essential Elements of an Effective PR Description Checklist

- Update supported_models.md and examples for a new model.