
[Benchmark] Convenience script for multiple parameter combinations#27085

Merged
vllm-bot merged 66 commits into vllm-project:main from DarkLight1337:bench-serve-multi
Oct 19, 2025

Conversation

@DarkLight1337 (Member) commented Oct 17, 2025

Purpose

Avoid having to manually start up the server and run the benchmark separately for every parameter combination.

FIX #27084

cc @lengrongfu @noooop

Also:

  • Fix the wrong ShareGPT4V download link
  • Add a /reset_mm_cache endpoint for MM benchmarks

Example usage:

  1. Download benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json
  2. Create benchmarks/hparams.json:
[
    {
        "api_server_count": 1,
        "max_num_batched_tokens": 2048
    },
    {
        "api_server_count": 2,
        "max_num_batched_tokens": 2048
    },
    {
        "api_server_count": 4,
        "max_num_batched_tokens": 2048
    },
    {
        "api_server_count": 1,
        "max_num_batched_tokens": 1024
    },
    {
        "api_server_count": 1,
        "max_num_batched_tokens": 2048
    },
    {
        "api_server_count": 1,
        "max_num_batched_tokens": 4096
    },
    {
        "api_server_count": 1,
        "max_num_batched_tokens": 2048,
        "dtype": "bfloat16"
    }
]
  3. Run the benchmarks:
python vllm/benchmarks/serve_multi.py \
    --serve-cmd 'vllm serve BAAI/bge-base-en-v1.5 --dtype float32 --runner pooling' \
    --bench-cmd 'vllm bench serve --model BAAI/bge-base-en-v1.5 --backend openai-embeddings --endpoint /v1/embeddings --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
    --serve-params benchmarks/hparams.json \
    -o benchmarks/results
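Conceptually, the sweep expands each hparams.json entry into extra CLI flags appended to the serve command. A minimal sketch of that expansion; the snake_case-to-flag mapping and the helper name build_serve_cmd are assumptions for illustration, not the script's exact implementation:

```python
import shlex


def build_serve_cmd(base_cmd: str, params: dict) -> list:
    """Expand one hparams.json entry into extra CLI flags.

    Assumes each JSON key maps to a `--kebab-case` server flag
    (e.g. "api_server_count" -> "--api-server-count").
    """
    cmd = shlex.split(base_cmd)
    for key, value in params.items():
        cmd.append("--" + key.replace("_", "-"))
        cmd.append(str(value))
    return cmd


cmd = build_serve_cmd(
    "vllm serve BAAI/bge-base-en-v1.5 --dtype float32 --runner pooling",
    {"api_server_count": 2, "max_num_batched_tokens": 2048},
)
```

The script then starts the server with each expanded command, runs the bench command against it, and tears the server down before the next combination.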

Tips:

  • Use --dry-run to preview the commands before running them.
  • Set --num-runs to repeat each benchmark and increase the reliability of the results.
  • Set --resume to resume a previous run based on its timestamp.
  • Set --sla-params to iterate through request rates or allowed concurrency levels to find the maximum value that satisfies the SLA.
    • Set --sla-variable to choose whether request rate or max allowed concurrency is tuned.
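The SLA search in the last two tips amounts to a monotone search over the load variable: if the SLA fails at some request rate, it fails at all higher rates. A toy illustration of that idea using bisection (this is not necessarily the script's actual search strategy):

```python
def find_max_request_rate(meets_sla, low=1.0, high=256.0, tol=1.0):
    """Binary-search the highest request rate that still meets the SLA.

    meets_sla(rate) -> bool is assumed monotone: once it fails at
    some rate, it fails at every higher rate. Returns None if the
    SLA is not met even at the lowest rate.
    """
    if not meets_sla(low):
        return None
    while high - low > tol:
        mid = (low + high) / 2
        if meets_sla(mid):
            low = mid  # SLA still holds, push higher
        else:
            high = mid  # SLA violated, back off
    return low


# Toy SLA: pretend the latency budget holds up to 37 QPS
best = find_max_request_rate(lambda rate: rate <= 37.0)
```

In the real script, meets_sla would be a full benchmark run checked against the latency target, so each probe is expensive and the logarithmic number of probes matters.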

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@mergify mergify bot added the performance Performance-related issues label Oct 17, 2025
mergify bot (Contributor) commented Oct 17, 2025

Documentation preview: https://vllm--27085.org.readthedocs.build/en/27085/

@mergify mergify bot added the documentation Improvements or additions to documentation label Oct 17, 2025
@ProExpertProg (Collaborator) left a comment

This is great; I can't wait to abandon my numerous combinations of bash scripts. One question: why not just start the server once for all the different bench params?

It would be amazing if we could add an lm_eval run at the start as well (maybe optional if lm_eval command is specified)

  • happy to try to add that in a follow-up if you don't get to it

@mergify mergify bot added the frontend label Oct 18, 2025
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 19, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 19, 2025 03:46
@DarkLight1337 (Member, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a convenient benchmarking script, serve_multi.py, to automate running benchmarks with multiple parameter combinations, which is a great addition for performance testing. It also includes a fix for a download link and adds a new /reset_mm_cache endpoint for multi-modal models. The new script is well-documented and feature-rich, supporting different modes like batch and SLA. My main feedback is on improving the robustness of server process termination within the new script.

# In case only some processes have been terminated
with contextlib.suppress(ProcessLookupError):
    # We need to kill both API Server and Engine processes
    os.killpg(os.getpgid(server_process.pid), signal.SIGKILL)
Severity: high

Using signal.SIGKILL to terminate the server process is quite forceful and prevents any graceful shutdown or cleanup procedures that the server might have. This can lead to orphaned resources or an inconsistent state, which is particularly problematic for a benchmarking script that relies on a clean environment for each run.

I recommend using signal.SIGTERM first to allow for a graceful shutdown. You can then wait for a short period and follow up with signal.SIGKILL if the process has not terminated. This two-step approach is more robust.

# We need to kill both API Server and Engine processes
pgid = os.getpgid(server_process.pid)
os.killpg(pgid, signal.SIGTERM)
try:
    server_process.wait(timeout=10)
except subprocess.TimeoutExpired:
    print("Server did not terminate gracefully, sending SIGKILL.")
    os.killpg(pgid, signal.SIGKILL)
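The terminate-then-kill pattern suggested here can be packaged as a small helper. A runnable sketch, assuming the server was launched with start_new_session=True so it owns its process group; the helper name stop_server is hypothetical:

```python
import os
import signal
import subprocess
import sys


def stop_server(server_process, timeout=10.0):
    """Gracefully stop a server that owns its own process group.

    Sends SIGTERM to the whole group so child engine processes also
    receive it, then escalates to SIGKILL if the server has not
    exited within `timeout` seconds.
    """
    pgid = os.getpgid(server_process.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        server_process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        server_process.wait()


# Demo: a stand-in child process that exits promptly on SIGTERM
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,  # new process group, so killpg hits only it
)
stop_server(proc, timeout=5.0)
```

On POSIX, a child killed by SIGTERM reports a negative return code, which is a convenient way to confirm the graceful path was taken rather than the SIGKILL fallback.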

@vllm-bot vllm-bot merged commit b3aba04 into vllm-project:main Oct 19, 2025
45 of 49 checks passed
@DarkLight1337 DarkLight1337 deleted the bench-serve-multi branch October 19, 2025 06:57
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
@ProExpertProg (Collaborator) commented:

The second idea is how we can plot the full curves of multiple relationships at the same time; for example, TTFT vs QPS, TPOT vs QPS, and ITL vs QPS, similar to the plotting provided in PR #27080.

Maybe we can leave these issues as TODO and wait for community feedback before deciding whether to go further and implement it.

@lengrongfu I can share the script I use for parsing the results; I'll probably have to adapt it to read from the JSON outputs. I am not sure what the best way to integrate it would be, but if you have an idea feel free to make a PR, and perhaps my script can be used as a starting point.
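A possible starting point for such a parsing script is to walk the results directory and collect (request rate, TTFT) pairs from each JSON file. The field names request_rate and mean_ttft_ms follow my understanding of the `vllm bench serve` JSON output, but treat them as assumptions; the demo below uses synthetic result files:

```python
import json
import tempfile
from pathlib import Path


def collect_ttft_vs_qps(results_dir):
    """Gather (request_rate, mean TTFT in ms) pairs from result files.

    Assumes each JSON file contains "request_rate" and "mean_ttft_ms"
    keys; adjust the keys to match the actual benchmark output.
    """
    points = []
    for path in sorted(Path(results_dir).glob("*.json")):
        data = json.loads(path.read_text())
        points.append((float(data["request_rate"]), float(data["mean_ttft_ms"])))
    return sorted(points)  # sort by QPS for plotting


# Demo with synthetic result files
tmp = tempfile.mkdtemp()
for i, (qps, ttft) in enumerate([(1.0, 12.5), (4.0, 18.0), (16.0, 55.0)]):
    Path(tmp, f"run{i}.json").write_text(
        json.dumps({"request_rate": qps, "mean_ttft_ms": ttft})
    )
curve = collect_ttft_vs_qps(tmp)
```

The resulting list of pairs can be fed directly to any plotting library to draw the TTFT-vs-QPS curve, and the same loop generalizes to TPOT and ITL by swapping the metric key.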

@DarkLight1337 (Member, Author) commented Oct 20, 2025

I have added plotting in #27168; check it out.

adabeyta pushed a commit to adabeyta/vllm that referenced this pull request Oct 20, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…llm-project#27085)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

  • documentation: Improvements or additions to documentation
  • frontend
  • performance: Performance-related issues
  • ready: ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC]: Add vllm benchs subcommand to benchmark test multiple rounds

6 participants