[benchmark] Port benchmark request sent optimization to benchmark_serving #21209
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run the full CI to test the changes comprehensively before merging, for example by adding the ready (🚀) label to the PR.
Code Review
This pull request refactors benchmarks/benchmark_serving.py to reuse the request generation logic from vllm/benchmarks/serve.py, effectively reducing code duplication. The change also includes a micro-optimization in vllm/benchmarks/serve.py to avoid unnecessary time.time() calls when the request rate is unlimited. The changes are well-reasoned and the provided test results demonstrate a positive performance impact. The code appears correct and improves maintainability.
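To illustrate the optimization under review, here is a minimal sketch of the fast-path idea, assuming a get_request-style pacing generator; the names and the gamma-distributed pacing are modeled on the benchmark scripts, and the exact serve.py code may differ:

```python
import asyncio
import random
import time


async def get_request(input_requests, request_rate: float, burstiness: float = 1.0):
    """Yield requests, paced according to request_rate."""
    if request_rate == float("inf"):
        # Fast path: an unlimited rate means no pacing, so skip the
        # per-request time.time() calls and sleeps entirely.
        for request in input_requests:
            yield request
        return

    theta = 1.0 / (request_rate * burstiness)
    next_send = time.time()
    for request in input_requests:
        yield request
        # Gamma-distributed inter-arrival gaps (burstiness == 1.0 is a
        # Poisson process); anchoring to wall-clock time avoids drift.
        next_send += random.gammavariate(burstiness, theta)
        sleep_s = next_send - time.time()
        if sleep_s > 0:
            await asyncio.sleep(sleep_s)
```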
yeqcharlotte left a comment
awesome thanks!
houseroad left a comment
Looks good, thanks!
Signed-off-by: Jialin Ouyang <[email protected]>
Head branch was pushed to by a user without write access
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: qizixi <[email protected]>
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: x22x22 <[email protected]>
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]>
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Paul Pak <[email protected]>
…ving (vllm-project#21209) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
Essential Elements of an Effective PR Description Checklist
The purpose of the PR; the test plan; the test results; (optional) documentation updates, such as supported_models.md and examples for a new model.
Purpose
Port the changes in #21108 to the benchmark_serving script as well. However, instead of copy-pasting the code changes, we let the benchmark_serving script depend on serve.py functions directly to avoid code duplication. A rough sketch of this pattern follows below.
In the long run, we plan to get rid of benchmark_serving entirely (#21206). However, that is out of scope for this PR, which focuses on closing the gap between these two benchmark scripts.
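As a rough sketch of the deduplication pattern (the exact function names and signatures in vllm/benchmarks/serve.py may differ), benchmark_serving.py imports the shared helper instead of keeping its own copy:

```python
import asyncio

# Delegate to the implementation shipped in the vllm package rather
# than keeping a local copy in benchmarks/benchmark_serving.py.
from vllm.benchmarks.serve import get_request


async def send_request(request):
    """Stand-in for the per-backend request function (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for the actual HTTP call


async def benchmark(input_requests, request_rate: float, burstiness: float = 1.0):
    tasks = []
    async for request in get_request(input_requests, request_rate, burstiness):
        tasks.append(asyncio.create_task(send_request(request)))
    await asyncio.gather(*tasks)
```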
Test Plan
Check request throughput in the following scenarios: fixed request rate (200 and 1000) and unlimited request rate.
Test Result
Fixed request rate benchmark run
Before (Request Rate 200)




After (Request Rate 200)
Before (Request Rate 1000)
After (Request Rate 1000)
Unlimited request rate
Slight request throughput improvement: 418.24 -> 433.88 req/s.
Additional observations: it is UNEXPECTED that server request throughput with an unlimited request rate is worse than with request rate 1000, which indicates a weakness in the benchmark script or the server under request churn. We will follow up on the investigation as a separate issue.
Before


After
(Optional) Documentation Update