[Misc][DP] Fix AsyncLLM metrics for multi-API server deployments #18053
Closed
kouroshHakha wants to merge 47 commits into vllm-project:main from
Signed-off-by: Nick Hill <nhill@redhat.com>
…-engines Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/v1/engine/core_client.py # vllm/v1/utils.py
Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/config.py # vllm/engine/arg_utils.py # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py
…-engines Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/config.py # vllm/v1/engine/core.py
Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/v1/engine/core_client.py # vllm/v1/utils.py
…-engines Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py
Avoid exception but still needs more work to be functional with multiple api server procs. Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/v1/engine/core_client.py
…nto all-to-all Signed-off-by: Nick Hill <nhill@redhat.com> # Conflicts: # vllm/entrypoints/openai/api_server.py # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py
# Conflicts: # vllm/v1/core/sched/scheduler.py
# Conflicts: # vllm/v1/engine/core.py
Member
Thanks for this @kouroshHakha ... I've posted a review on the corresponding PR into my branch here: njhill#6 (review)
This pull request has merge conflicts that must be resolved before it can be |
eicherseiji added a commit to eicherseiji/vllm that referenced this pull request May 15, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
… module Signed-off-by: kouroshhakha <kourosh@anyscale.com>
Collaborator Author
already merged in #17546
Problem
This PR addresses one of the problems with #17546: inconsistent metrics when the number of API servers is greater than 1 in a multi-API-server setup. When running multiple API server instances with the V1 implementation, metrics were inconsistently collected and aggregated. This happened because:
PROMETHEUS_MULTIPROC_DIR environment variable needed for proper multi-process metrics collection.
Solution
This PR ensures consistent metrics handling across multiple API servers in V1 by:
Setting up the PROMETHEUS_MULTIPROC_DIR environment variable in the AsyncLLM initialization, borrowing some of the existing tricks in V0:
lora_request_info, etc.
Result
Comparing -asc=1 and -asc=16 on a counter metric like num_prompt_tokens over time, on a fixed workload that has ~2M input tokens.
Known issues that still need to be addressed (later)
The histogram of vllm:iteration_tokens_total will not be accurate when asc > 1
The current fundamental assumption is that by analyzing all the requests that came back from the engines we can construct IterationStats, which includes num_generation_tokens, num_preempted_tokens, num_prompt_tokens, etc.
This assumption no longer holds with multiple api_servers. With multiple api_server processes, each front-end gets a sub-batch of the requests that came from the same engine step. Therefore the IterationStats constructed from these requests has only a partial view. For example, num_generation_tokens will not be the num_generation_tokens for that iteration; it will be just part of it.
Most of the metrics in IterationStats are fine, because they fall into two categories:
num_generation_tokens is logged in prometheus and is set up as a counter, so it will be summed anyway.
The histogram of vllm:iteration_tokens_total does not fall into either of these categories. Proof:
You can also observe the diff on vllm:iteration_tokens_total on the same workload. Solving this is not straightforward at first glance, as I think it would require making the scheduler logic more complex just to keep track of some of these iteration-level metrics. Since this metric is not that important, it is not urgent to solve this issue right now. The caveat is that if we later add any new histogram over other metrics like num_generation_tokens, it will have the same problem in the asc > 1 case.
NOTE to Reviewer
This PR is built on top of #17546 so that has to be merged first.
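For readers unfamiliar with the PROMETHEUS_MULTIPROC_DIR setup mentioned under Solution, here is a minimal sketch of the general idea, not the code in this PR. The helper name ensure_prometheus_multiproc_dir and the directory prefix are hypothetical. It assumes the variable has to be in place before any metrics are created, since the Prometheus Python client uses it to decide whether to run in multiprocess mode.

```python
import os
import tempfile


def ensure_prometheus_multiproc_dir() -> str:
    """Return the multiprocess metrics directory, creating one if unset.

    Hypothetical helper for illustration; the name and prefix are assumptions.
    """
    existing = os.environ.get("PROMETHEUS_MULTIPROC_DIR")
    if existing:
        # Respect a directory that the user or a parent process already chose.
        return existing
    # Fresh directory so stale metric files from an earlier run are not reused.
    path = tempfile.mkdtemp(prefix="prometheus_multiproc_")
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = path
    return path


# Call this before constructing any metrics so every API server process
# spawned afterwards inherits the same directory.
metrics_dir = ensure_prometheus_multiproc_dir()
```

Each API server process then writes its samples into the shared directory, and the scrape endpoint aggregates across the files instead of reporting whichever process happened to answer.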
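The known issue with vllm:iteration_tokens_total can be illustrated with a small simulation. This is a sketch under stated assumptions, not vLLM code: the bucket bounds, the round-robin assignment of requests to API servers, and the function names are all hypothetical. It shows that a counter stays correct when each front-end sees only part of an engine step, while a per-iteration histogram records different observations.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical histogram upper bounds ("le" buckets); not vLLM's real buckets.
BUCKETS = [1, 8, 16, 32, 64, 128]


def bucket_for(value: int) -> float:
    """Return the smallest bucket bound that is >= value."""
    i = bisect_left(BUCKETS, value)
    return BUCKETS[i] if i < len(BUCKETS) else float("inf")


def observe_step(step_token_counts: list[int], num_api_servers: int):
    """Simulate one engine step whose requests are split across API servers.

    Assumption: requests are spread round-robin, and each server observes
    only the tokens of the requests it owns.
    """
    per_server = [0] * num_api_servers
    for i, tokens in enumerate(step_token_counts):
        per_server[i % num_api_servers] += tokens
    # Counter semantics: partial sums add up to the true total.
    counter_total = sum(per_server)
    # Histogram semantics: each server records its own partial observation.
    histogram = Counter(bucket_for(t) for t in per_server if t > 0)
    return counter_total, histogram


step = [10, 10, 10, 10]  # one engine step producing 40 tokens in total
total_1, hist_1 = observe_step(step, num_api_servers=1)
total_4, hist_4 = observe_step(step, num_api_servers=4)
```

With one API server the step is recorded as a single observation of 40 tokens. With four servers it becomes four observations of 10 tokens each. The counter totals match, but the histogram shapes do not, which is the inaccuracy described under Known issues.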