[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics.#18354
[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics.#18354DarkLight1337 merged 8 commits intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
vllm/v1/metrics/loggers.py
Outdated
There was a problem hiding this comment.
Renaming existing metrics will break anyone relying on this. Right way to do this would be to add new metrics with the new name and deprecate the old ones so we give enough notice before removing them in a future release. I'm not sure of the exact deprecation policy with vLLM, but it would be good to follow the policy here.
There was a problem hiding this comment.
Thanks @achandrasekar , makes sense!
Referring to the metrics deprecation policy here,
Note: when metrics are deprecated in version X.Y, they are hidden in version X.Y+1 but can be re-enabled using the --show-hidden-metrics-for-version=X.Y escape hatch, and are then removed in version X.Y+2.
I have declared gpu_prefix_cache_queries and gpu_prefix_cache_hits as deprecated, and introduced the new ones. Could you please take a look at it?
It looks like we need to create separate pull requests for hiding and removing the metrics, following this one gets merged?
vllm/v1/metrics/loggers.py
Outdated
There was a problem hiding this comment.
Is there a plan to rename gpu_cache_usage too?
There was a problem hiding this comment.
Hi @achandrasekar ,
it looks like gpu_cache_usage is calculated in BlockPool.get_usage() as 1.0 - (self.get_num_free_blocks() / self.num_gpu_blocks), it could be a GPU specific metric?
There was a problem hiding this comment.
This is applicable for TPUs too. We should probably name this kv_cache_usage instead of gpu_cache_usage.
There was a problem hiding this comment.
I've updated the script, please have a look. Thanks!
Add metrics prefix_cache_queries and prefix_cache_hits Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
aba4fa0 to
50e4828
Compare
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
markmc
left a comment
There was a problem hiding this comment.
Thank you for following the deprecation policy, lgtm
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
|
This pull request has merge conflicts that must be resolved before it can be |
In vllm-project#18354, these metrics were deprecated, and the change was included in the v0.9.2 release. We probably should only deprecate things in a v0.N.0 minor release, so let's say these were deprecated in v0.10.0. According to https://docs.vllm.ai/en/latest/usage/metrics.html: > Note: when metrics are deprecated in version X.Y, they are hidden in > version X.Y+1 but can be re-enabled using the > --show-hidden-metrics-for-version=X.Y escape hatch, and are then > removed in version X.Y+2. The deprecated metrics should be hidden in the v0.11.0 release, but with a --show-hidden-metrics-for-version=0.10 escape hatch. They should then be removed in the v0.12.0 release. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
The following are due for removal: - `vllm:gpu_cache_usage_perc` - `vllm:gpu_prefix_cache_queries` - `vllm:gpu_prefix_cache_hits` See vllm-project#18354 And the following is due to be hidden: - `vllm:time_per_output_token_seconds` See vllm-project#24110 The deprecation policy is documented [here](https://docs.vllm.ai/en/latest/usage/metrics/) > when metrics are deprecated in version X.Y, they are > hidden in version X.Y+1 but can be re-enabled using > the --show-hidden-metrics-for-version=X.Y escape hatch, > and are then removed in version X.Y+2. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
This PR deprecates metrics with
gpu_prefix for existing non-GPU specific metrics-gpu_cache_usagegpu_prefix_cache_queriesgpu_prefix_cache_hitsand, introduce new metrics after renaming-
kv_cache_usageprefix_cache_queriesprefix_cache_hits