
Conversation

@frankie-ys
Contributor

@frankie-ys frankie-ys commented Aug 11, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

I found that when running vLLM with the 1P1D disaggregation example, the proxy server has no rate limiter. When request concurrency exceeds 20, the prefill or decode instance hangs or crashes. After adding a rate limiter to the proxy server, it works smoothly. This also resolves the problem in https://github.com/vllm-project/vllm/issues/11247, which is caused by high request concurrency.
Also, feel free to contact me via email if you have any questions.
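The fix pairs a token-bucket rate limiter with a semaphore that caps concurrent backend requests. Below is a minimal sketch of that combination; the class, parameter, and function names are illustrative, not the merged proxy_server code:

```python
import asyncio
import time


class RateLimiter:
    """Token-bucket limiter: roughly `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = float(burst)  # maximum bucket size
        self.tokens = float(burst)    # start with a full bucket
        self.last = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        while True:
            async with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Not enough tokens: compute the wait, then sleep OUTSIDE
                # the lock so other requests are not serialized.
                wait = (1 - self.tokens) / self.rate
            await asyncio.sleep(wait)


async def handle_request(limiter: RateLimiter,
                         backend_slots: asyncio.Semaphore,
                         forward):
    """Gate one proxied request: rate-limit first, then cap backend concurrency."""
    await limiter.acquire()
    async with backend_slots:
        return await forward()
```

For example, with `RateLimiter(rate=40, burst=40)` and `asyncio.Semaphore(100)` created inside the running event loop, each incoming request would pass through `handle_request` before being forwarded to the prefill or decode instance.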

Test Plan

No new tests are needed.

Test Result

(Optional) Documentation Update

…xy to handle concurrent requests and prevent the prefill or decode service crashed or hangs (vllm-project#22575)

Signed-off-by: frankie-ys <[email protected]>
@mergify mergify bot added the performance Performance-related issues label Aug 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a rate limiter and a request queue to the disaggregation proxy server to prevent crashes under high concurrency. The implementation uses a token bucket algorithm for rate limiting and a semaphore for controlling concurrent requests to the backend. While this is a good approach to solve the stability issue, there is a critical flaw in the RateLimiter.acquire method where an asyncio lock is held during an await asyncio.sleep(). This will serialize all requests and severely impact performance, defeating the purpose of using an asynchronous framework. I've provided a suggestion to fix this issue.

@frankie-ys
Contributor Author

Yes, I overlooked this issue; I have re-committed the file.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small but essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@frankie-ys
Contributor Author

frankie-ys commented Aug 12, 2025

I found that an OOM error happened on buildkite/fastcheck/pr, and this error is unrelated to my PR. Can this check be triggered manually?

@DarkLight1337
Member

Retrying

@frankie-ys
Contributor Author

frankie-ys commented Aug 12, 2025

> Retrying

Thanks, after merging from the main branch, it works well.

@frankie-ys frankie-ys changed the title [BugFix] Provide bucket algorithm rate limiter and queue for 1P1D pro… [BugFix] Provide bucket algorithm rate limiter and queue for proxy_server Aug 13, 2025
@frankie-ys frankie-ys changed the title [BugFix] Provide bucket algorithm rate limiter and queue for proxy_server Provide bucket algorithm rate limiter and queue for proxy_server Aug 13, 2025
@DarkLight1337 DarkLight1337 requested a review from KuntaiDu August 13, 2025 11:47
@frankie-ys frankie-ys changed the title Provide bucket algorithm rate limiter and queue for proxy_server [P/D]Provide bucket algorithm rate limiter for proxy_server Aug 13, 2025
Collaborator

@KuntaiDu KuntaiDu left a comment


Other part LGTM.

Comment on lines 18 to 22
AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(total=300)
# Maximum concurrent requests to backend services
MAX_CONCURRENT_REQUESTS = 100
REQUEST_QUEUE_SIZE = 500 # Maximum number of requests in the queue
RATE_LIMIT = 40 # Maximum requests per second (rate limiting)
Collaborator


It would be nice if we could move these to CLI args.
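A sketch of what moving these constants to CLI args could look like with argparse; the flag names below are assumptions, not necessarily what the PR adopted:

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical flag names mirroring the module-level constants above.
    parser = argparse.ArgumentParser(description="Disaggregation proxy server")
    parser.add_argument("--max-concurrent-requests", type=int, default=100,
                        help="Maximum concurrent requests to backend services")
    parser.add_argument("--request-queue-size", type=int, default=500,
                        help="Maximum number of requests in the queue")
    parser.add_argument("--rate-limit", type=int, default=40,
                        help="Maximum requests per second (rate limiting)")
    return parser.parse_args(argv)
```

This keeps the old values as defaults, so existing deployments behave the same while allowing per-deployment tuning.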

Contributor Author


Yeah, I have moved these variables to CLI args.

Collaborator

@KuntaiDu KuntaiDu left a comment


The code looks much cleaner now. LGTM!
But since people are currently fixing CI, I will postpone the merge until the CI is green.

@DarkLight1337
Member

DarkLight1337 commented Aug 15, 2025

The CI should be green now apart from the nightly tests, so feel free to merge.

@DarkLight1337
Member

Can you merge from main branch?

@KuntaiDu
Collaborator

Sure! Thanks for letting me know!

@KuntaiDu KuntaiDu enabled auto-merge (squash) August 15, 2025 05:14
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 15, 2025
@KuntaiDu KuntaiDu merged commit b2c0650 into vllm-project:main Aug 15, 2025
26 of 28 checks passed
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025
…ject#22643)

Signed-off-by: frankie-ys <[email protected]>
Signed-off-by: frankie <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Kuntai Du <[email protected]>
divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025
djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

Labels

performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed
