[benchmark] Make request IDs unique across clients by default #27723
njhill merged 3 commits into vllm-project:main
Conversation
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Code Review
This pull request effectively addresses a crash issue caused by duplicate request IDs when running benchmark clients in parallel. The fix, which prepends the process ID to the request ID, ensures uniqueness for clients on a single machine. I've provided one suggestion to make this even more robust by using a random prefix, which would guarantee uniqueness in distributed scenarios across multiple machines.
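For illustration, a minimal sketch contrasting the two approaches under discussion (the helper name `default_request_id_prefix` is hypothetical, not the benchmark's actual code):

```python
import os
import uuid

def default_request_id_prefix(use_random: bool = True) -> str:
    """Hypothetical helper contrasting the two prefix strategies.

    A PID-based prefix is only unique among clients on one machine; a
    random prefix is also unique across machines in distributed runs.
    """
    if use_random:
        return f"bench-{uuid.uuid4().hex[:8]}-"  # random: unique everywhere
    return f"bench-{os.getpid()}-"  # PID: unique per machine only
```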
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Thank you for the PR. Indeed, it looks like this was introduced when we added request IDs to `vllm bench serve`. See this comment in #26929 for where we saw this issue recently in P/D setups. The reporter confirmed on Slack that they were using `vllm bench serve`.
I think the simplest solution here is to just make the default request ID prefix unique per client.
vllm/v1/core/sched/scheduler.py
Outdated
```
def add_request(self, request: Request) -> None:
    request_id = request.request_id
    if request_id in self.requests:
        raise ValueError(f"Request id {request_id} already exists.")
```
This is not safe - it will cause the engine to exit
Ideally we would just return an error for this single request, but that is not a trivial change
Let's remove the scheduler part from this PR and just fix the request IDs used by `vllm bench serve`
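A simplified illustration of why raising here is unsafe (this is not vLLM's actual engine core loop, just a sketch of the failure mode):

```python
import queue

def engine_loop(scheduler, request_queue: queue.Queue) -> None:
    # Sketch only: in a loop like this, a ValueError raised by
    # add_request() for one duplicate request ID escapes the loop and
    # takes down the engine, failing every other in-flight request too.
    while True:
        request = request_queue.get()
        scheduler.add_request(request)
```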
I think either way, the engine will crash due to duplicate IDs. But point taken, removed in favor of backing out/failing the request the right way.
xref #27189
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Thanks for the review @markmc! Updated the PR with the suggestion.
```diff
 type=str,
 required=False,
-default="benchmark-serving",
+default=f"bench-{uuid.uuid4().hex[:8]}-",
```
Could we keep this and simply add the uuid to the prefix?
So `request_id` will be `<prefix>-<uuid>-<cnt>`
Do you mean adding the uuid in `main()` so that we would also add it to any user-supplied prefix?
@kouroshHakha I'm not sure this works, because then users won't have full control over their prefix. I.e. a user must edit the code to get around us adding a uuid to their chosen prefix.
Not a hard limit on this PR, obviously, but I was thinking we would need to guarantee uniqueness of the request IDs even if the end user overrides the prefix. That way the uuid is part of the auto-appended suffix.
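A hypothetical sketch of this alternative (the names are illustrative, not the PR's code): the uuid is generated once per client and auto-appended, so uniqueness holds even with a user-supplied prefix:

```python
import uuid

# Generated once per benchmark client, not per request.
_CLIENT_UUID = uuid.uuid4().hex[:8]

def make_request_id(user_prefix: str, cnt: int) -> str:
    # <prefix>-<uuid>-<cnt>: the user keeps full control of the prefix,
    # while the auto-appended uuid guarantees cross-client uniqueness.
    return f"{user_prefix}-{_CLIENT_UUID}-{cnt}"
```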
…roject#27723) Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Since vllm-project#9550 and vllm-project#10968 we support clients supplying a custom request ID. The motivation for this is that it can be very helpful when you need to correlate vLLM logs with the logs of a related service.

Since the request ID is used ubiquitously across vLLM as a unique key, it is obviously problematic if we ever have multiple in-flight requests using the same client-provided request ID. We saw this happening recently when `vllm bench serve` started including a request ID and the request IDs from multiple concurrent instances caused collisions. See vllm-project#27723.

We currently try to guard against request ID collisions in the frontend, in `OutputProcessor`:

```
def add_request(...):
    if request_id in self.request_states:
        raise ValueError(f"Request id {request_id} already running.")
```

However, this is not always effective:

1) We can have abort race conditions where a request is no longer tracked by the frontend but still not completed in the engine. See vllm-project#15326 for an attempt to fix this.
2) We can have async scheduling race conditions where a request ID is removed from the output processor and being scheduled while the older request with that ID is still being completed by the model runner. See vllm-project#29355.
3) With P/D, a request will continue to be tracked by the prefill engine long after the prefill request has been completed in the frontend, while we wait for the decode side to fetch the KV blocks. See vllm-project#20139.

Let's instead ensure we use a unique request ID internally, even when a client provides a custom request ID. We can do this simply by appending a short random suffix to any request ID provided by the frontend.

A full 32-character random UUID would be overkill as a suffix, so how many random characters would be sufficient? 8 characters give us 32 bits of entropy, or 16^8 possible suffixes. Using the collision probability approximation from https://preshing.com/20110504/hash-collision-probabilities, with N = 16^8 and k the number of generated suffixes, the probability of collision is approximately k^2/(2N). So if a client somehow caused vLLM to hold 10k requests that reuse the same client-provided ID, there would be a 1.16% chance of collision:

```
>>> N = 16**8
>>> k = 10_000
>>> (k**2)/(2*N)
0.011641532182693481
```

That seems [super good enough](https://hownot2.com/products/hownot2-super-good-enough-t-shirt).

The key changes to support this are:

1. `InputProcessor.process_inputs()` - we add some randomness to the request ID just before creating an `EngineCoreRequest`, and store both the random "internal" request ID (as `request_id`) and the supplied "external" request ID (as `external_req_id`) in the `EngineCoreRequest`.
2. `RequestState.make_request_output()` - we ensure that `RequestOutput.request_id` continues to be the external request ID (for backwards compatibility) and add `internal_request_id`.
3. `OutputProcessor.abort_requests()` - we make `OutputProcessor` track a mapping from external request ID to internal request IDs, so `abort_requests()` can abort based on either ID.
4. `AsyncLLM` - we use `RequestOutputCollector` to track the internal request ID, so we can use the internal ID to abort an in-progress request. We also add an `internal` boolean flag to `abort()` so API users can abort based on either ID.
5. `ParentRequest` - in the case of parallel sampling, we need to track both the internal and external ID for the later creation of the `RequestOutput` aggregating the child outputs.
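A minimal sketch of the random-suffix scheme from item 1 above (the helper name is hypothetical; in the PR this happens inside `InputProcessor.process_inputs()`):

```python
import uuid

def to_internal_request_id(external_req_id: str) -> str:
    # Append 8 hex chars (32 bits of randomness) so the engine-internal
    # ID stays unique even when clients reuse an external request ID.
    return f"{external_req_id}-{uuid.uuid4().hex[:8]}"

print(to_internal_request_id("my-trace-id"))  # e.g. my-trace-id-9b1f02cd
print(to_internal_request_id("my-trace-id"))  # e.g. my-trace-id-44e7a310
```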
We need to ensure we track the external->internal request ID mapping because `abort()` will be supplied an external request ID. In the case where an external request ID maps to multiple running requests, we assume the caller requires all of those requests to be aborted. The caller can use `EngineCoreRequest.request_id` as the request ID if they want to be more specific.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
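A sketch of the mapping the paragraph above implies (illustrative only; the real tracking lives in vLLM's `OutputProcessor`): one external ID can map to several internal IDs, and aborting by external ID aborts them all:

```python
from collections import defaultdict

ext_to_int: defaultdict[str, set[str]] = defaultdict(set)

def track(external_id: str, internal_id: str) -> None:
    ext_to_int[external_id].add(internal_id)

def abort(external_id: str) -> set[str]:
    # The caller wants every running request under this external ID gone.
    return ext_to_int.pop(external_id, set())

track("my-trace-id", "my-trace-id-9b1f02cd")
track("my-trace-id", "my-trace-id-44e7a310")
print(abort("my-trace-id"))  # aborts both internal IDs
```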
Purpose
- Running multiple `vllm bench serve` clients in parallel (#27711) can produce duplicate request IDs that slip past the `request_states` check, leading to the engine receiving non-unique request IDs (and bad downstream effects) instead of raising the `ValueError` that a given request ID is already running. I think the standard `ValueError` is not seen with `--api-server-count 1` due to queuing.
- This PR makes `vllm bench serve` send unique request IDs when clients run in parallel.

Test Plan
Executed twice, simultaneously:
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.