[P/D] Provide bucket algorithm rate limiter for proxy_server #22643
Conversation
…xy to handle concurrent requests and prevent the prefill or decode service from crashing or hanging (vllm-project#22575) Signed-off-by: frankie-ys <[email protected]>
Code Review
This pull request introduces a rate limiter and a request queue to the disaggregation proxy server to prevent crashes under high concurrency. The implementation uses a token bucket algorithm for rate limiting and a semaphore for controlling concurrent requests to the backend. While this is a good approach to solve the stability issue, there is a critical flaw in the RateLimiter.acquire method where an asyncio lock is held during an await asyncio.sleep(). This will serialize all requests and severely impact performance, defeating the purpose of using an asynchronous framework. I've provided a suggestion to fix this issue.
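For context, a token-bucket acquire that avoids this pitfall computes the wait time while holding the lock and sleeps only after releasing it. A minimal sketch under those assumptions (illustrative names, not the PR's exact code):

import asyncio
import time

class RateLimiter:
    """Token bucket: admits roughly rate_limit requests per second."""

    def __init__(self, rate_limit: int):
        self.rate_limit = rate_limit          # tokens added per second
        self.tokens = float(rate_limit)       # start with a full bucket
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        while True:
            async with self._lock:
                # Refill tokens based on elapsed time, capped at the bucket size.
                now = time.monotonic()
                self.tokens = min(
                    self.rate_limit,
                    self.tokens + (now - self.last_refill) * self.rate_limit,
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Not enough tokens: compute how long until one is available.
                wait_time = (1 - self.tokens) / self.rate_limit
            # Sleep *outside* the lock so other coroutines are not serialized.
            await asyncio.sleep(wait_time)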
Signed-off-by: frankie-ys <[email protected]>
Yes, I forgot about this issue; I have recommitted the file.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; instead, it would only run … Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Retrying
Signed-off-by: frankie-ys <[email protected]>
Thanks, after merging from the main branch, it works well.
Signed-off-by: frankie-ys <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: frankie <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: frankie <[email protected]>
Modify the variables and abstract the request queue into a separate file (vllm-project#22643) Signed-off-by: frankie-ys <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
KuntaiDu
left a comment
The other parts LGTM.
AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=300)
# Maximum concurrent requests to backend services
MAX_CONCURRENT_REQUESTS = 100
REQUEST_QUEUE_SIZE = 500  # Maximum number of requests in the queue
RATE_LIMIT = 40  # Maximum requests per second (rate limiting)
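For orientation, constants like these are typically turned into runtime objects roughly as in the sketch below; the names here (and the use of the RateLimiter sketch from the review discussion above) are assumptions, not necessarily the PR's actual wiring:

import asyncio

# Bound in-flight backend calls, buffer waiting requests, and smooth the arrival rate.
request_semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
request_queue: asyncio.Queue = asyncio.Queue(maxsize=REQUEST_QUEUE_SIZE)
rate_limiter = RateLimiter(RATE_LIMIT)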
It would be nice if we could move these to CLI args.
Yeah, I have moved the variables to CLI args.
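For illustration, exposing these knobs as CLI args usually looks roughly like the argparse sketch below; the flag names and defaults are assumptions, not necessarily what the PR ended up with:

import argparse

parser = argparse.ArgumentParser(description="P/D disaggregation proxy server")
parser.add_argument("--max-concurrent-requests", type=int, default=100,
                    help="Maximum concurrent requests forwarded to backend services")
parser.add_argument("--request-queue-size", type=int, default=500,
                    help="Maximum number of requests waiting in the queue")
parser.add_argument("--rate-limit", type=int, default=40,
                    help="Maximum requests per second admitted by the proxy")
args = parser.parse_args()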
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie-ys <[email protected]>
KuntaiDu
left a comment
The code looks much much cleaner. LGTM!
But since people are currently fixing CI, I will postpone the merge until CI is green.
The CI should be green now apart from the nightly tests, so feel free to merge.
Can you merge from the main branch?
Sure! Thanks for letting me know!
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]> Signed-off-by: Duncan Moss <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
…ject#22643) Signed-off-by: frankie-ys <[email protected]> Signed-off-by: frankie <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Kuntai Du <[email protected]>

Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Purpose
I found that when running vLLM with the 1P1D disaggregation example, the proxy server has no rate limiter. When request concurrency exceeds 20, the prefill or decode instance hangs or crashes. After adding a rate limiter to the proxy server, it works smoothly. This also solves the problem in https://github.com/vllm-project/vllm/issues/11247, which is caused by high request concurrency.
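As a rough, hypothetical illustration of this flow (not the PR's exact proxy code): each incoming request first passes the token-bucket limiter and then a semaphore bounding concurrent backend calls, so bursts are smoothed before they reach the prefill/decode instances. The sketch assumes the rate_limiter and request_semaphore objects from the sketches above.

import aiohttp

AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=300)

async def forward_request(url: str, payload: dict) -> dict:
    # Smooth the arrival rate, then bound the number of in-flight backend calls.
    await rate_limiter.acquire()
    async with request_semaphore:
        async with aiohttp.ClientSession(timeout=AIOHTTP_TIMEOUT) as session:
            async with session.post(url, json=payload) as resp:
                resp.raise_for_status()
                return await resp.json()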
What's more, you can contact me via email if you have any questions.
Test Plan
No need to add new tests.
Test Result
(Optional) Documentation Update