
[Misc] Remove deprecated metric vllm:time_per_output_token_seconds for v0.13 release#30992

Closed
jliu9515 wants to merge 1 commit into vllm-project:main from jliu9515:cleanup/remove-deprecated-time-per-output-token-metric

Conversation


@jliu9515 jliu9515 commented Dec 18, 2025

Purpose

This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds. The TODO comment indicated it should be removed in v0.13.0.

Follows up on #30396, which removed other deprecated items for v0.13.

Test Plan

  • Pre-commit checks passed (ruff, mypy, typos, etc.)

Test Result

  • Related tests passed: pytest tests/v1/metrics/ (39/42 pass, 3 failures due to GPU OOM, unrelated to this change)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jliu9515 jliu9515 requested a review from markmc as a code owner December 18, 2025 23:18
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly removes the deprecated metric vllm:time_per_output_token_seconds as planned for the v0.13 release. The changes are confined to vllm/v1/metrics/loggers.py, where both the metric's definition and its usage within PrometheusStatLogger are cleanly removed. The removal is consistent with the deprecation notice and the provided context. The code is now cleaner and free of this obsolete metric. The changes are well-contained and look good to merge.
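For context, the deprecation pattern this PR completes can be sketched as follows. This is a minimal, stdlib-only illustration, not vLLM's actual implementation (vLLM uses prometheus_client histograms via `self._histogram_cls`): during the deprecation window the same observation is recorded under both the old and the new metric name, with the old name gated behind the `--show-hidden-metrics-for-version` flag, and the old name is deleted once the window ends. All class and function names here are illustrative.

```python
import bisect

class Histogram:
    """Tiny Prometheus-style histogram: counts observations per "le" bucket."""
    def __init__(self, name: str, buckets: list[float]):
        self.name = name
        self.buckets = buckets
        self.counts = [0] * (len(buckets) + 1)  # final slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value: float) -> None:
        # bisect_left maps the value to the first bucket whose upper bound
        # is >= value, matching Prometheus "less than or equal" semantics.
        self.counts[bisect.bisect_left(self.buckets, value)] += 1
        self.total += value

# After this PR the flag no longer resurrects the old name at all.
show_hidden_metrics = False

new_hist = Histogram("vllm:inter_token_latency_seconds", [0.01, 0.1, 1.0])
old_hist = (
    Histogram("vllm:time_per_output_token_seconds", [0.01, 0.1, 1.0])
    if show_hidden_metrics
    else None
)

def record_inter_token_latency(seconds: float) -> None:
    new_hist.observe(seconds)
    if old_hist is not None:  # deprecation alias, dropped in v0.13
        old_hist.observe(seconds)

record_inter_token_latency(0.05)
print(new_hist.counts)  # [0, 1, 0, 0]: one observation in the le=0.1 bucket
```

The flag-gated alias gives dashboard owners one release cycle to migrate queries before the old series disappears from scrapes.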

@mergify mergify bot added the v1 label Dec 18, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 715 to 720
```python
    histogram_time_to_first_token, engine_indexes, model_name
)

# Deprecated in 0.11 - Renamed as vllm:inter_token_latency_seconds
# With 0.12.x you can enable with --show-hidden-metrics-for-version=0.11
# TODO: remove in 0.13.0
if self.show_hidden_metrics:
    histogram_time_per_output_token = self._histogram_cls(
        name="vllm:time_per_output_token_seconds",
        documentation=(
            "Histogram of time per output token in seconds."
            "DEPRECATED: Use vllm:inter_token_latency_seconds instead."
        ),
        buckets=[
            0.01,
            0.025,
            0.05,
            0.075,
            0.1,
            0.15,
            0.2,
            0.3,
            0.4,
            0.5,
            0.75,
            1.0,
            2.5,
            5.0,
            7.5,
            10.0,
            20.0,
            40.0,
            80.0,
        ],
        labelnames=labelnames,
    )
    self.histogram_time_per_output_token = make_per_engine(
        histogram_time_per_output_token, engine_indexes, model_name
    )

histogram_inter_token_latency = self._histogram_cls(
    name="vllm:inter_token_latency_seconds",
    documentation="Histogram of inter-token latency in seconds.",
```


P1: Removes hidden metric but keeps tests expecting it

This block used to build vllm:time_per_output_token_seconds when --show-hidden-metrics-for-version was enabled; with this removal, the metric is no longer exported even when show_hidden_metrics is true. However, tests/entrypoints/instrumentator/test_metrics.py still lists vllm:time_per_output_token_seconds_* in HIDDEN_DEPRECATED_METRICS and asserts those series appear when the server is started with the hidden-metrics flag, so those test cases now fail because the metric is absent (as would any users still relying on the migration flag for this alias). Please either drop the expectation or leave the metric available under the flag.
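The mismatch the reviewer describes can be sketched as a small self-contained check. The `HIDDEN_DEPRECATED_METRICS` name comes from the test file cited above, but the series suffixes and the helper function here are illustrative, not the actual vLLM test code: the check scans a /metrics scrape and flags any deprecated series whose presence disagrees with the hidden-metrics flag.

```python
# Illustrative stand-in for the expectation list in
# tests/entrypoints/instrumentator/test_metrics.py (suffixes assumed from
# standard Prometheus histogram series naming).
HIDDEN_DEPRECATED_METRICS = [
    "vllm:time_per_output_token_seconds_bucket",
    "vllm:time_per_output_token_seconds_count",
    "vllm:time_per_output_token_seconds_sum",
]

def check_hidden_metrics(metrics_text: str, show_hidden: bool) -> list[str]:
    """Return deprecated series whose presence disagrees with the flag:
    present when they should be absent, or absent when expected present."""
    mismatches = []
    for name in HIDDEN_DEPRECATED_METRICS:
        present = name in metrics_text
        if present != show_hidden:
            mismatches.append(name)
    return mismatches

# After this PR the old series never appears in a scrape, so the check is
# clean without the flag but reports every expected series with it.
scrape = "vllm:inter_token_latency_seconds_sum 1.23\n"
assert check_hidden_metrics(scrape, show_hidden=False) == []
assert check_hidden_metrics(scrape, show_hidden=True) == HIDDEN_DEPRECATED_METRICS
```

Either resolution the reviewer suggests makes this check pass: deleting the entries from the expectation list, or restoring the flag-gated histogram so the series reappear.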


[Misc] Remove deprecated metric vllm:time_per_output_token_seconds for v0.13 release

This metric was deprecated in v0.11 and renamed to vllm:inter_token_latency_seconds.
The TODO comment indicated it should be removed in v0.13.0.

Follows up on vllm-project#30396 which removed other deprecated items for v0.13.

Signed-off-by: Jack Liu <jacklau9515@gmail.com>
@markmc
Member

markmc commented Jan 20, 2026

Thank you, #32661 duplicates this but looks more comprehensive (e.g. updates dashboards)

@markmc markmc closed this Jan 20, 2026
@markmc markmc moved this from P1 to Not planned in Metrics & Tracing Jan 20, 2026

Labels: v1

Project: Metrics & Tracing (Status: Not planned)