[Misc] Remove deprecated metric vllm:time_per_output_token_seconds for v0.13 release#30992
Conversation
Code Review
This pull request correctly removes the deprecated metric vllm:time_per_output_token_seconds as planned for the v0.13 release. The changes are confined to vllm/v1/metrics/loggers.py, where both the metric's definition and its usage within PrometheusStatLogger are cleanly removed. The removal is consistent with the deprecation notice and the provided context. The code is now cleaner and free of this obsolete metric. The changes are well-contained and look good to merge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
            histogram_time_to_first_token, engine_indexes, model_name
        )

        # Deprecated in 0.11 - Renamed as vllm:inter_token_latency_seconds
        # With 0.12.x you can enable with --show-hidden-metrics-for-version=0.11
        # TODO: remove in 0.13.0
        if self.show_hidden_metrics:
            histogram_time_per_output_token = self._histogram_cls(
                name="vllm:time_per_output_token_seconds",
                documentation=(
                    "Histogram of time per output token in seconds."
                    "DEPRECATED: Use vllm:inter_token_latency_seconds instead."
                ),
                buckets=[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4,
                         0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0],
                labelnames=labelnames,
            )
            self.histogram_time_per_output_token = make_per_engine(
                histogram_time_per_output_token, engine_indexes, model_name
            )

        histogram_inter_token_latency = self._histogram_cls(
            name="vllm:inter_token_latency_seconds",
            documentation="Histogram of inter-token latency in seconds.",
```
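For context on what those bucket edges mean, the cumulative-bucket behavior of a Prometheus histogram can be exercised in a few lines of stdlib Python. This is an illustrative sketch, not vLLM code; only the bucket edges are taken from the snippet above, and the `observe` helper is hypothetical:

```python
import bisect

# Bucket edges copied from the removed vllm:time_per_output_token_seconds
# definition above; the renamed metric uses the same style of layout.
BUCKETS = [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4,
           0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0]

def observe(counts, value):
    """Increment every cumulative bucket whose upper edge can hold the
    value, mirroring Prometheus histogram semantics (le is inclusive,
    and the +Inf bucket in the last slot always counts)."""
    # bisect_left finds the first bucket edge >= value.
    for i in range(bisect.bisect_left(BUCKETS, value), len(BUCKETS)):
        counts[i] += 1
    counts[-1] += 1  # implicit +Inf bucket

counts = [0] * (len(BUCKETS) + 1)  # one extra slot for +Inf
observe(counts, 0.12)              # a 120 ms inter-token latency
print(counts[5])                   # bucket le=0.15 -> 1
```

A 0.12 s observation lands in the le=0.15 bucket and every wider one, which is why dashboard quantile queries over these buckets keep working as long as the edges stay the same across the rename.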
Remove hidden metric but keep tests expecting it
This block used to build vllm:time_per_output_token_seconds when --show-hidden-metrics-for-version was enabled; with the removal, the metric is no longer exported even when show_hidden_metrics is true. tests/entrypoints/instrumentator/test_metrics.py still lists vllm:time_per_output_token_seconds_* in HIDDEN_DEPRECATED_METRICS and asserts they appear when the server is started with the hidden-metrics flag, so those test cases (and any users still relying on the migration flag for that alias) now fail because the metric is absent. Please drop the expectation or leave the metric available under the flag.
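To confirm the test update end-to-end, one could scrape the server's Prometheus endpoint and assert the old family is gone while the replacement is present. A minimal sketch; the `deprecated_families` helper and the metrics text below are stand-ins, not the actual test code:

```python
def deprecated_families(metrics_text: str) -> list[str]:
    """Return any exported lines still in the removed metric family."""
    return [line for line in metrics_text.splitlines()
            if line.startswith("vllm:time_per_output_token_seconds")]

# Stand-in for the body of a GET /metrics response after this PR;
# real output has many more families and labels.
sample = """\
vllm:inter_token_latency_seconds_bucket{le="0.1",model_name="m"} 3
vllm:inter_token_latency_seconds_count{model_name="m"} 3
"""
print(deprecated_families(sample))  # -> []
```

The same helper run against a pre-removal scrape would return the `_bucket`, `_count`, and `_sum` lines of the old histogram, which is exactly what the hidden-metrics test cases previously asserted on.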
…r v0.13 release

This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds. The TODO comment indicated it should be removed in v0.13.0.

Follows up on vllm-project#30396, which removed other deprecated items for v0.13.

Signed-off-by: Jack Liu <jacklau9515@gmail.com>
Force-pushed from ba9359e to 8985cc6.
Thank you. #32661 duplicates this but looks more comprehensive (e.g. it also updates the dashboards).
This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds. The TODO comment indicated it should be removed in v0.13.0.
Follows up on #30396, which removed other deprecated items for v0.13.
Purpose
Remove the deprecated metric vllm:time_per_output_token_seconds in favor of vllm:inter_token_latency_seconds.

Test Plan

pytest tests/v1/metrics/

Test Result

39/42 pass; 3 failures due to GPU OOM, unrelated to this change.

Essential Elements of an Effective PR Description Checklist

- Update supported_models.md and examples for a new model.