[DP] Fix Prometheus Logging #21257
Merged: vllm-bot merged 53 commits into vllm-project:main from robertgshaw2-redhat:fix-prom-logging-in-dp-case on Jul 21, 2025
Collaborator
Signed-off-by: Robert Shaw <[email protected]>
This pull request has merge conflicts that must be resolved before it can be
Member
Merging to unblock release
Member
I will do a retroactive review :)
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Purpose
- Previously, a `PrometheusStatLogger` was created for each `EngineCore`. This appears okay on the surface, but only the final `EngineCore` was able to log stats, because the Prometheus state was reset in each constructor by calling `unregister_vllm_metrics`.
- Simply removing `unregister_vllm_metrics` does not work, because the metrics can only be *created* once. We want multiple labels for the same metric, not multiple metrics.
- This PR adds a `StatLoggerManager` to deal with this:
  - Updates `PrometheusStatLogger` to enable logging from multiple engine cores.
  - Ensures that each `AsyncLLM` only logs the metrics of the EngineCores that it directly manages.
Follow up:
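The "multiple labels for the same metric, not multiple metrics" point from the Purpose section can be illustrated with the `prometheus_client` package directly. This is a minimal sketch, not the code from this PR: the private registry, the `record_kv_cache_usage` helper, and the choice of four engine indexes are assumptions made for illustration, and the metric and model names are copied from the sample output.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

MODEL_NAME = "Qwen/Qwen3-30B-A3B-FP8"  # copied from the sample output
ENGINE_INDEXES = [0, 1, 2, 3]          # assumption: four data-parallel engine cores

# Create the metric exactly once, declaring its label names up front.
kv_cache_usage = Gauge(
    "vllm:kv_cache_usage_perc",
    "KV-cache usage per engine core.",
    labelnames=["model_name", "engine"],
    registry=registry,
)

# One labelled child per engine core; all children share the single metric.
per_engine = {
    idx: kv_cache_usage.labels(model_name=MODEL_NAME, engine=str(idx))
    for idx in ENGINE_INDEXES
}


def record_kv_cache_usage(engine_idx: int, usage: float) -> None:
    """Hypothetical helper: update only the series owned by one engine core."""
    per_engine[engine_idx].set(usage)


record_kv_cache_usage(0, 0.053)
record_kv_cache_usage(1, 0.054)

# Creating the same metric a second time is rejected by the registry,
# which is why one logger object per engine core cannot each create it.
try:
    Gauge(
        "vllm:kv_cache_usage_perc",
        "duplicate",
        labelnames=["model_name", "engine"],
        registry=registry,
    )
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(generate_latest(registry).decode())
```

The printed exposition text contains one `vllm:kv_cache_usage_perc` series per `engine` label value, which matches the shape of the per-engine lines in the test result.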
Test Plan
existing CI
justfile
Test Result
Sample:
    just dp_a_internal_lb 8100
    just dp_b_internal_lb 8100
    just eval 8100
    just metrics 8100

    vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-30B-A3B-FP8"} 2.6821875921956284e-05
    vllm:kv_cache_usage_perc{engine="1",model_name="Qwen/Qwen3-30B-A3B-FP8"} 2.686799752815716e-05
    vllm:kv_cache_usage_perc{engine="2",model_name="Qwen/Qwen3-30B-A3B-FP8"} 2.686799752815716e-05
    vllm:kv_cache_usage_perc{engine="3",model_name="Qwen/Qwen3-30B-A3B-FP8"} 2.663896214605277e-05
    INFO 07-20 18:10:58 [loggers.py:122] Engine 000: Avg prompt throughput: 3064.0 tokens/s, Avg generation throughput: 359.0 tokens/s, Running: 26 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.3%, Prefix cache hit rate: 0.0%
    INFO 07-20 18:10:58 [loggers.py:122] Engine 001: Avg prompt throughput: 2508.4 tokens/s, Avg generation throughput: 353.7 tokens/s, Running: 25 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.4%, Prefix cache hit rate: 0.0%
    INFO 07-20 18:10:58 [loggers.py:122] Engine 002: Avg prompt throughput: 1962.9 tokens/s, Avg generation throughput: 353.5 tokens/s, Running: 24 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.3%, Prefix cache hit rate: 0.0%
    INFO 07-20 18:10:58 [loggers.py:122] Engine 003: Avg prompt throughput: 2619.2 tokens/s, Avg generation throughput: 354.6 tokens/s, Running: 25 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.2%, Prefix cache hit rate: 0.6%
rank 0:
    vllm:request_success_total{engine="0",finished_reason="stop",model_name="Qwen/Qwen3-30B-A3B-FP8"} 88.0
    vllm:request_success_total{engine="0",finished_reason="length",model_name="Qwen/Qwen3-30B-A3B-FP8"} 12.0
    vllm:request_success_total{engine="0",finished_reason="abort",model_name="Qwen/Qwen3-30B-A3B-FP8"} 0.0
    INFO 07-20 18:15:37 [loggers.py:122] Engine 000: Avg prompt throughput: 10130.6 tokens/s, Avg generation throughput: 506.1 tokens/s, Running: 99 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.0%, Prefix cache hit rate: 0.0%
rank 1:
    vllm:request_success_total{engine="1",finished_reason="stop",model_name="Qwen/Qwen3-30B-A3B-FP8"} 88.0
    vllm:request_success_total{engine="1",finished_reason="length",model_name="Qwen/Qwen3-30B-A3B-FP8"} 12.0
    vllm:request_success_total{engine="1",finished_reason="abort",model_name="Qwen/Qwen3-30B-A3B-FP8"} 0.0
    INFO 07-20 18:15:47 [loggers.py:122] Engine 001: Avg prompt throughput: 10129.2 tokens/s, Avg generation throughput: 894.6 tokens/s, Running: 69 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.4%, Prefix cache hit rate: 0.0%
(Optional) Documentation Update
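One way to check a result like the rank 0 and rank 1 samples is to group the scraped metric lines by their `engine` label and confirm that every engine reports its own series. The sketch below is an assumption-laden illustration rather than part of the PR's test plan: `totals_by_engine` is a hypothetical helper, and the regular expressions handle only simple lines like the ones shown, not escaped quotes or braces inside label values.

```python
import re
from collections import defaultdict

# Metric lines copied from the rank 0 and rank 1 samples.
SAMPLE = '''
vllm:request_success_total{engine="0",finished_reason="stop",model_name="Qwen/Qwen3-30B-A3B-FP8"} 88.0
vllm:request_success_total{engine="0",finished_reason="length",model_name="Qwen/Qwen3-30B-A3B-FP8"} 12.0
vllm:request_success_total{engine="0",finished_reason="abort",model_name="Qwen/Qwen3-30B-A3B-FP8"} 0.0
vllm:request_success_total{engine="1",finished_reason="stop",model_name="Qwen/Qwen3-30B-A3B-FP8"} 88.0
vllm:request_success_total{engine="1",finished_reason="length",model_name="Qwen/Qwen3-30B-A3B-FP8"} 12.0
vllm:request_success_total{engine="1",finished_reason="abort",model_name="Qwen/Qwen3-30B-A3B-FP8"} 0.0
'''

# Simplified patterns: metric name, a brace-delimited label set, then the value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def totals_by_engine(text: str, metric: str) -> dict[str, float]:
    """Hypothetical helper: sum a metric's samples per `engine` label value."""
    totals: dict[str, float] = defaultdict(float)
    for raw_line in text.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match is None or match.group("name") != metric:
            continue  # skip blank lines, log lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("engine", "unknown")] += float(match.group("value"))
    return dict(totals)


totals = totals_by_engine(SAMPLE, "vllm:request_success_total")
print(totals)
```

If only one engine key showed up in such a summary while several engine cores were running, that would point to the overwritten-logger behaviour described under Purpose.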