[Feature] Add --max-unfinished-requests apiserver parameter#39492

Open
chaunceyjiang wants to merge 2 commits into vllm-project:main from chaunceyjiang:feature/max-unfinished-requests

Conversation

Collaborator

@chaunceyjiang chaunceyjiang commented Apr 10, 2026

Purpose

Implements the feature requested in #18826.

Closes #18826

Closes #21352

Add new CLI argument --max-unfinished-requests to limit concurrent unfinished requests across all API servers. When the limit is exceeded, new requests are rejected with 503 Service Unavailable.

Test Plan

vllm serve /mnt/data3/models/MiniMax/MiniMax-M2.5 -tp 4 --tool-call-parser minimax_m2 --enable-auto-tool-choice --reasoning-parser minimax_m2 --trust-remote-code --max-unfinished-requests 10 --api-server-count 4

Test Result

(ApiServer_1 pid=3380147) INFO:     127.0.0.1:52948 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_1 pid=3380147) INFO:     127.0.0.1:53006 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_1 pid=3380147) INFO:     127.0.0.1:53012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_1 pid=3380147) INFO:     127.0.0.1:53012 - "POST /v1/chat/completions HTTP/1.1" 503 Service Unavailable
(ApiServer_1 pid=3380147) INFO:     127.0.0.1:53006 - "POST /v1/chat/completions HTTP/1.1" 503 Service Unavailable
(ApiServer_2 pid=3380148) INFO:     127.0.0.1:52970 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_3 pid=3380149) INFO:     127.0.0.1:52964 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_3 pid=3380149) INFO:     127.0.0.1:52990 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(ApiServer_2 pid=3380148) INFO:     127.0.0.1:53022 - "POST /v1/chat/completions HTTP/1.1" 200 OK


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Add new CLI argument --max-unfinished-requests to limit concurrent
unfinished requests across all API servers. When the limit is exceeded,
new requests are rejected with 503 Service Unavailable.

Features:
- Single server mode: checks local server_load_metrics directly
- Multi-server mode: uses shared multiprocessing.Array to aggregate
  counts from all API servers and check the total
- Auto-enables --enable-server-load-tracking when this option is set
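The multi-server counting scheme described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: names like `unfinished_counts`, `try_admit_request`, and `server_index` are assumptions. The key idea is one shared slot per API server process, with the check and the increment done under the array's built-in lock.

```python
# Hypothetical sketch of the shared-counter approach described above.
# Names (unfinished_counts, try_admit_request, finish_request) are
# illustrative assumptions, not vLLM's actual identifiers.
import multiprocessing

API_SERVER_COUNT = 4
MAX_UNFINISHED_REQUESTS = 10

# One slot per API server process; created before forking so every
# worker shares the same memory.
unfinished_counts = multiprocessing.Array("i", API_SERVER_COUNT)

def try_admit_request(server_index: int) -> bool:
    """Admit a request only if the global unfinished total is below the cap."""
    with unfinished_counts.get_lock():
        total = sum(unfinished_counts)
        if total >= MAX_UNFINISHED_REQUESTS:
            return False  # caller should respond with 503
        unfinished_counts[server_index] += 1
        return True

def finish_request(server_index: int) -> None:
    """Decrement this server's slot when a request completes or errors."""
    with unfinished_counts.get_lock():
        unfinished_counts[server_index] -= 1
```

In this sketch the aggregation is a plain `sum()` over the shared array, which only stays correct because the decrement in `finish_request` runs on every completion path, including errors and client disconnects.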

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements a global request limit across multiple API servers using shared memory. It introduces the --max-unfinished-requests parameter and updates the load_aware_call decorator to reject requests with a 503 error when the limit is reached. Review feedback identifies several critical issues: the shared memory array is not updated during request completion, leading to stale data; a race condition exists in the limit-checking logic; and the shared array should use a lock to ensure consistent reads during summation.
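The race condition flagged in the review can be made concrete with a small sketch (illustrative only, not the PR's code; `racy_admit`/`safe_admit` and `LIMIT` are assumed names). With a non-atomic check-then-increment, two processes can both read a total below the limit and both admit a request; holding the array's lock across the check and the update closes the window.

```python
# Illustration of the review's race-condition concern (not vLLM code).
import multiprocessing

counts = multiprocessing.Array("i", 2)
LIMIT = 1

def racy_admit(idx: int) -> bool:
    # BUG: the sum and the increment are separate atomic steps, so two
    # processes can both observe total < LIMIT and both be admitted.
    if sum(counts) >= LIMIT:
        return False
    counts[idx] += 1
    return True

def safe_admit(idx: int) -> bool:
    # Fix: hold the array's lock across both the check and the update,
    # which also gives a consistent snapshot for the summation.
    with counts.get_lock():
        if sum(counts) >= LIMIT:
            return False
        counts[idx] += 1
        return True
```

The same lock also addresses the review's point about consistent reads: summing the array without it can interleave with increments from other servers.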

Comment threads:
- vllm/entrypoints/utils.py
- vllm/entrypoints/utils.py (outdated)
- vllm/entrypoints/cli/serve.py
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang force-pushed the feature/max-unfinished-requests branch from 9990397 to bc30971 Compare April 10, 2026 10:30
@chaunceyjiang chaunceyjiang requested a review from orozery April 10, 2026 10:32
@chaunceyjiang
Collaborator Author

#27064 (comment)

Hi @orozery PTAL.

@chaunceyjiang
Collaborator Author

/cc @DarkLight1337 PTAL.



Development

Successfully merging this pull request may close these issues.

[RFC]: Controlling the maximum length of the waiting queue