[Benchmark] Convenience script for multiple parameter combinations#27085
vllm-bot merged 66 commits into vllm-project:main
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Force-pushed from 8b69ea8 to 978c626
Documentation preview: https://vllm--27085.org.readthedocs.build/en/27085/
ProExpertProg left a comment:
This is great, I can't wait to abandon my numerous combinations of bash scripts. One Q: why not just start the server once for all different bench params?
It would be amazing if we could add an lm_eval run at the start as well (perhaps optional, running only if an lm_eval command is specified). Happy to try to add that in a follow-up if you don't get to it.
/gemini review
Code Review
This pull request introduces a convenient benchmarking script, serve_multi.py, to automate running benchmarks with multiple parameter combinations, which is a great addition for performance testing. It also includes a fix for a download link and adds a new /reset_mm_cache endpoint for multi-modal models. The new script is well-documented and feature-rich, supporting different modes like batch and SLA. My main feedback is on improving the robustness of server process termination within the new script.
```python
# In case only some processes have been terminated
with contextlib.suppress(ProcessLookupError):
    # We need to kill both API Server and Engine processes
    os.killpg(os.getpgid(server_process.pid), signal.SIGKILL)
```
Using signal.SIGKILL to terminate the server process is quite forceful and prevents any graceful shutdown or cleanup procedures that the server might have. This can lead to orphaned resources or an inconsistent state, which is particularly problematic for a benchmarking script that relies on a clean environment for each run.
I recommend using signal.SIGTERM first to allow for a graceful shutdown. You can then wait for a short period and follow up with signal.SIGKILL if the process has not terminated. This two-step approach is more robust.
```python
# We need to kill both API Server and Engine processes
pgid = os.getpgid(server_process.pid)
os.killpg(pgid, signal.SIGTERM)
try:
    server_process.wait(timeout=10)
except subprocess.TimeoutExpired:
    print("Server did not terminate gracefully, sending SIGKILL.")
    os.killpg(pgid, signal.SIGKILL)
```
@lengrongfu I can share the script I use for parsing the results; I'll probably have to adapt it to read from the JSON outputs. I am not sure what the best way to integrate it would be, but if you have an idea feel free to make a PR, and perhaps my script can be used as a starting point.
I have added plotting in #27168, check it out.
Purpose
Avoid having to manually start up the server and run the benchmark separately for every parameter combination.
FIX #27084
cc @lengrongfu @noooop
Also:

- Add `/reset_mm_cache` endpoint for MM benchmarks

Example usage:
1. Download the ShareGPT dataset to `benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json`.
2. Save the following to `benchmarks/hparams.json`:

```json
[
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 2, "max_num_batched_tokens": 2048 },
  { "api_server_count": 4, "max_num_batched_tokens": 2048 },
  { "api_server_count": 1, "max_num_batched_tokens": 1024 },
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 1, "max_num_batched_tokens": 4096 },
  { "api_server_count": 1, "max_num_batched_tokens": 2048, "dtype": "bfloat16" }
]
```

3. Run:

```bash
python vllm/benchmarks/serve_multi.py \
  --serve-cmd 'vllm serve BAAI/bge-base-en-v1.5 --dtype float32 --runner pooling' \
  --bench-cmd 'vllm bench serve --model BAAI/bge-base-en-v1.5 --backend openai-embeddings --endpoint /v1/embeddings --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
  --serve-params benchmarks/hparams.json \
  -o benchmarks/results
```
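As a rough illustration of what the sweep does, the sketch below shows how each entry in `hparams.json` could be turned into extra CLI flags appended to the serve command. The `params_to_flags` helper is hypothetical, for illustration only, and is not the actual `serve_multi.py` implementation:

```python
import json

def params_to_flags(params: dict) -> list[str]:
    # Hypothetical helper: turn {"api_server_count": 2} into
    # ["--api-server-count", "2"] style CLI flags.
    flags = []
    for key, value in params.items():
        flags.append("--" + key.replace("_", "-"))
        flags.append(str(value))
    return flags

hparams = json.loads("""
[
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 2, "max_num_batched_tokens": 2048 }
]
""")

# One serve command per parameter combination.
for combo in hparams:
    cmd = ["vllm", "serve", "BAAI/bge-base-en-v1.5"] + params_to_flags(combo)
    print(" ".join(cmd))
```

The real script additionally starts and stops the server around each combination and runs the bench command against it.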
Tips:

- Use `--dry-run` to preview the commands to run first.
- Use `--num-runs` to increase the reliability of the results.
- Use `--resume` to resume a previous run based on timestamp.
- Use `--sla-params` to iterate through request rate or allowed concurrency to find the maximum value that supports the SLA.
- Use `--sla-variable` to choose between determining request rate or max allowed concurrency.

Test Plan
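The SLA mode above can be pictured as a search over candidate values of the SLA variable. The sketch below is illustrative only: the `measure_latency` callback and the toy latency numbers are made up, and this linear scan is not necessarily the search strategy `serve_multi.py` uses:

```python
def max_rate_meeting_sla(candidates, measure_latency, sla_ms):
    # Walk candidate request rates in increasing order and keep the
    # highest one whose measured latency still meets the SLA target.
    best = None
    for rate in sorted(candidates):
        if measure_latency(rate) <= sla_ms:
            best = rate
        else:
            break  # latency only grows with load, so stop here
    return best

# Toy latency model for illustration: latency grows with request rate.
latencies = {1: 50, 2: 80, 4: 120, 8: 400}
print(max_rate_meeting_sla(latencies, lambda r: latencies[r], sla_ms=150))  # → 4
```

In the real script, each "measurement" is a full benchmark run against the freshly started server.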
Test Result
Essential Elements of an Effective PR Description Checklist
- (If applicable) Documentation update for `supported_models.md` and `examples` for a new model.