
[Misc][DP] Fix AsyncLLM metrics for multi-API server deployments#18053

Closed
kouroshHakha wants to merge 47 commits into vllm-project:main from kouroshHakha:kh/fix-a2a-metrics

Conversation


@kouroshHakha kouroshHakha commented May 13, 2025

Problem

This PR addresses one of the problems with #17546: metrics inconsistency when num api_servers > 1 in a multi-API-server setup. When running multiple API server instances with the V1 implementation, metrics were inconsistently collected and aggregated. This happened because:

  1. In V1, AsyncLLM wasn't setting the PROMETHEUS_MULTIPROC_DIR environment variable needed for proper multi-process metrics collection.
  2. The multiprocess modes for the different metric types (gauges, counters, histograms) were incorrectly configured, causing metrics to be double-counted or improperly aggregated.
  3. Process cleanup code was missing, so a process's metrics were never marked as dead when it exited.
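To make the double-counting failure mode concrete, here is a toy illustration (plain Python, not vLLM code) of what happens when every API server process exports its own copy of a gauge and a dashboard naively sums the samples:

```python
# Two API server processes both report the same cache-usage gauge value.
# The pids and the value 8 are made up for illustration.
per_process_gauge = {"pid_1": 8, "pid_2": 8}

# With the default multiprocess gauge aggregation, each process exports
# its own sample; a dashboard that sums them sees N x the true value.
naive_sum = sum(per_process_gauge.values())

# "mostrecent" semantics instead report the latest value written by any
# process, which is the intended reading for this kind of gauge.
most_recent = list(per_process_gauge.values())[-1]

assert naive_sum == 16   # double counted across 2 processes
assert most_recent == 8  # the intended value
```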

Solution

This PR ensures consistent metrics handling across multiple API servers in V1 by:

  1. Setting up the PROMETHEUS_MULTIPROC_DIR environment variable in AsyncLLM initialization, borrowing some of the existing tricks from V0.
  2. Updating the multiprocess_mode settings in the metric loggers to match the V0 implementation for some gauges (e.g. lora_request_info).
  3. Setting "mostrecent" as the default mode for gauges.
  4. Adding proper process cleanup code so that metrics are correctly handled when processes exit.

Result

Comparing -asc=1 and -asc=16 (API server count) on a counter metric like num_prompt_tokens over time, using a fixed workload with ~2M input tokens.

[Screenshot: num_prompt_tokens over time, asc=1 vs asc=16]

Known issues that still need to be addressed (later)

The histogram of vllm:iteration_tokens_total will not be accurate when asc > 1

The current fundamental assumption is that, by analyzing all the requests that come back from the engines, we can construct IterationStats, which includes num_generation_tokens, num_preempted_tokens, num_prompt_tokens, etc.

This assumption no longer holds with multiple API servers. With multiple api_server processes, each front-end receives only a sub-batch of the requests that came from the same engine step, so the IterationStats constructed from those requests has a partial view. For example, num_generation_tokens will not reflect the full iteration's generation tokens; it will be only part of them.

Most of the metrics in IterationStats are fine, because they fall into two categories:

  1. They are invariant to the notion of an actual engine iteration. Things like TTFT won't be affected.
  2. They are counters. If num_generation_tokens is logged to Prometheus as a counter, it will be summed across processes anyway.
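To make the distinction concrete, here is a toy illustration (plain arithmetic, not vLLM code) of why counters survive the split across API servers but a per-iteration histogram does not:

```python
# Suppose one engine step generates 100 tokens, and the requests from
# that step are split across two API server processes, which observe
# partial views of 60 and 40 tokens respectively.
partial_views = [60, 40]

# Counter-style metrics survive the split: the per-process increments
# add up to the true total regardless of how the batch was partitioned.
assert sum(partial_views) == 100

# A per-iteration histogram does not: each process observes its partial
# batch as if it were a full iteration, so the histogram records two
# "iterations" of 60 and 40 tokens instead of one iteration of 100.
observed_iterations = list(partial_views)  # what gets recorded with asc=2
true_iterations = [100]                    # what actually happened
assert observed_iterations != true_iterations
```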

The histogram of vllm:iteration_tokens_total falls into neither of these categories. Proof:

[Screenshot: vllm:iteration_tokens_total histogram, asc=1 vs asc=16]

You can also observe the diff in vllm:iteration_tokens_total on the same workload. Solving this is not straightforward at first glance; I think it would require making the scheduler logic more complex just to keep track of some of these iteration-level metrics. Since this metric is not that important, it is not urgent to solve right now. The caveat is that, if tomorrow we add any new histogram over metrics like num_generation_tokens, it will have the same problem in the asc > 1 case.

NOTE to Reviewer

This PR is built on top of #17546 so that has to be merged first.

njhill added 30 commits April 4, 2025 17:04

njhill commented May 14, 2025

Thanks for this @kouroshHakha ... I've posted a review on the corresponding PR into my branch here: njhill#6 (review)


mergify bot commented May 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kouroshHakha.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 15, 2025
eicherseiji added a commit to eicherseiji/vllm that referenced this pull request May 15, 2025
@kouroshHakha
Collaborator Author

already merged in #17546
