[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics.#18354

Merged
DarkLight1337 merged 8 commits into vllm-project:main from krai:remove_gpu_prefix
Jun 14, 2025
Conversation

@sahelib25 (Contributor) commented May 19, 2025

This PR deprecates the gpu_ prefix on the following existing non-GPU-specific metrics:

  • gpu_cache_usage
  • gpu_prefix_cache_queries
  • gpu_prefix_cache_hits

and introduces renamed replacements:

  • kv_cache_usage
  • prefix_cache_queries
  • prefix_cache_hits
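As a rough sketch of the dual-emission pattern behind this change (hypothetical helper names, not vLLM's actual metrics code), the renamed metric can mirror its value to the deprecated name during the transition window, so existing dashboards keep working until the old name is removed:

```python
# Hypothetical sketch: emit each value under both the new name and its
# deprecated gpu_-prefixed alias during the deprecation window.

DEPRECATED_ALIASES = {
    "gpu_cache_usage": "kv_cache_usage",
    "gpu_prefix_cache_queries": "prefix_cache_queries",
    "gpu_prefix_cache_hits": "prefix_cache_hits",
}

def record(metrics: dict, name: str, value: float) -> None:
    """Record a metric under its new name and any deprecated alias."""
    metrics[name] = value
    # Mirror the value to the old name so it stays visible until removal.
    for old, new in DEPRECATED_ALIASES.items():
        if new == name:
            metrics[old] = value

metrics: dict = {}
record(metrics, "kv_cache_usage", 0.42)
```

Scrapers watching the old gpu_cache_usage name would see the same value as kv_cache_usage until the alias is dropped in a later release.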

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run further CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label May 19, 2025

Renaming existing metrics will break anyone relying on them. The right way to do this would be to add new metrics with the new names and deprecate the old ones, so we give enough notice before removing them in a future release. I'm not sure of the exact deprecation policy with vLLM, but it would be good to follow it here.

Contributor Author

Thanks @achandrasekar, makes sense!
Referring to the metrics deprecation policy here:

Note: when metrics are deprecated in version X.Y, they are hidden in version X.Y+1 but can be re-enabled using the --show-hidden-metrics-for-version=X.Y escape hatch, and are then removed in version X.Y+2.

I have declared gpu_prefix_cache_queries and gpu_prefix_cache_hits as deprecated, and introduced the new ones. Could you please take a look at it?

It looks like we need separate pull requests for hiding and then removing the metrics once this one is merged?


Is there a plan to rename gpu_cache_usage too?

Contributor Author

Hi @achandrasekar,
it looks like gpu_cache_usage is calculated in BlockPool.get_usage() as 1.0 - (self.get_num_free_blocks() / self.num_gpu_blocks), so could it be a GPU-specific metric?
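For illustration, the quoted formula behaves like this minimal mock (the real BlockPool lives in vLLM's KV-cache manager; only the arithmetic is reproduced here):

```python
# Minimal mock of the usage formula quoted above: the fraction of
# KV-cache blocks currently in use.

class BlockPool:
    def __init__(self, num_gpu_blocks: int, num_free_blocks: int):
        self.num_gpu_blocks = num_gpu_blocks
        self._num_free = num_free_blocks

    def get_num_free_blocks(self) -> int:
        return self._num_free

    def get_usage(self) -> float:
        # 1.0 minus the free fraction gives the in-use fraction.
        return 1.0 - (self.get_num_free_blocks() / self.num_gpu_blocks)

pool = BlockPool(num_gpu_blocks=100, num_free_blocks=25)
assert pool.get_usage() == 0.75
```

Nothing in the formula itself is GPU-specific; only the num_gpu_blocks attribute name ties it to GPUs, which is why the same accounting applies to TPUs.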


This applies to TPUs too. We should probably name this kv_cache_usage instead of gpu_cache_usage.

Contributor Author

I've updated the script, please have a look. Thanks!

Add metrics prefix_cache_queries and prefix_cache_hits

Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
@sahelib25 sahelib25 force-pushed the remove_gpu_prefix branch from aba4fa0 to 50e4828 on May 22, 2025 at 11:37
@sahelib25 sahelib25 changed the title from "[V1][Metrics] Remove gpu_ prefix from non GPU specific metrics." to "[V1][Metrics] Deprecate metrics with gpu_ prefix from non GPU specific metrics." on May 22, 2025
@sahelib25 sahelib25 changed the title from "[V1][Metrics] Deprecate metrics with gpu_ prefix from non GPU specific metrics." to "[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics." on May 22, 2025
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
Member

@markmc markmc left a comment

Thank you for following the deprecation policy, lgtm

@mergify

mergify bot commented May 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sahelib25.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 30, 2025
@mergify mergify bot removed the needs-rebase label May 30, 2025
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 13, 2025
@DarkLight1337 DarkLight1337 merged commit d1e34cc into vllm-project:main Jun 14, 2025
70 checks passed
@psyhtest psyhtest deleted the remove_gpu_prefix branch August 8, 2025 10:48
markmc added a commit to markmc/vllm that referenced this pull request Sep 9, 2025
In vllm-project#18354, these metrics were deprecated, and the change
was included in the v0.9.2 release.

We probably should only deprecate things in a v0.N.0 minor
release, so let's say these were deprecated in v0.10.0.

According to https://docs.vllm.ai/en/latest/usage/metrics.html:

> Note: when metrics are deprecated in version X.Y, they are hidden in
>  version X.Y+1 but can be re-enabled using the
>  --show-hidden-metrics-for-version=X.Y escape hatch, and are then
>  removed in version X.Y+2.

The deprecated metrics should be hidden in the v0.11.0 release,
but with a --show-hidden-metrics-for-version=0.10 escape hatch.

They should then be removed in the v0.12.0 release.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
markmc added a commit to markmc/vllm that referenced this pull request Nov 24, 2025
The following are due for removal:

- `vllm:gpu_cache_usage_perc`
- `vllm:gpu_prefix_cache_queries`
- `vllm:gpu_prefix_cache_hits`

See vllm-project#18354

And the following is due to be hidden:

- `vllm:time_per_output_token_seconds`

See vllm-project#24110

The deprecation policy is documented [here](https://docs.vllm.ai/en/latest/usage/metrics/)

> when metrics are deprecated in version X.Y, they are
> hidden in version X.Y+1 but can be re-enabled using
> the --show-hidden-metrics-for-version=X.Y escape hatch,
> and are then removed in version X.Y+2.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
