[Benchmark] Convenience script for multiple parameter combinations#27085
vllm-bot merged 66 commits into vllm-project:main
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Force-pushed from 8b69ea8 to 978c626
Documentation preview: https://vllm--27085.org.readthedocs.build/en/27085/
ProExpertProg left a comment:
This is great, I can't wait to abandon my numerous combinations of bash scripts. One Q: why not just start the server once for all different bench params?
It would be amazing if we could add an lm_eval run at the start as well (perhaps optional, running only if an lm_eval command is specified). Happy to try to add that in a follow-up if you don't get to it.
/gemini review
Code Review
This pull request introduces a convenient benchmarking script, serve_multi.py, to automate running benchmarks with multiple parameter combinations, which is a great addition for performance testing. It also includes a fix for a download link and adds a new /reset_mm_cache endpoint for multi-modal models. The new script is well-documented and feature-rich, supporting different modes like batch and SLA. My main feedback is on improving the robustness of server process termination within the new script.
```python
# In case only some processes have been terminated
with contextlib.suppress(ProcessLookupError):
    # We need to kill both API Server and Engine processes
    os.killpg(os.getpgid(server_process.pid), signal.SIGKILL)
```
Using signal.SIGKILL to terminate the server process is quite forceful and prevents any graceful shutdown or cleanup procedures that the server might have. This can lead to orphaned resources or an inconsistent state, which is particularly problematic for a benchmarking script that relies on a clean environment for each run.
I recommend using signal.SIGTERM first to allow for a graceful shutdown. You can then wait for a short period and follow up with signal.SIGKILL if the process has not terminated. This two-step approach is more robust.
```python
# We need to kill both API Server and Engine processes
pgid = os.getpgid(server_process.pid)
os.killpg(pgid, signal.SIGTERM)
try:
    server_process.wait(timeout=10)
except subprocess.TimeoutExpired:
    print("Server did not terminate gracefully, sending SIGKILL.")
    os.killpg(pgid, signal.SIGKILL)
```
@lengrongfu I can share the script I use for parsing the results; I'll probably have to adapt it to read from the JSON outputs. I am not sure what the best way to integrate it would be, but if you have an idea feel free to make a PR, and perhaps my script can be used as a starting point.
I have added plotting in #27168, check it out.
Purpose
Avoid having to manually start up the server and run the benchmark separately for every parameter combination.
FIX #27084
cc @lengrongfu @noooop
Also:

- Add `/reset_mm_cache` endpoint for MM benchmarks

Example usage:
1. Download the ShareGPT dataset to `benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json`.
2. Save the following to `benchmarks/hparams.json`:

```json
[
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 2, "max_num_batched_tokens": 2048 },
  { "api_server_count": 4, "max_num_batched_tokens": 2048 },
  { "api_server_count": 1, "max_num_batched_tokens": 1024 },
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 1, "max_num_batched_tokens": 4096 },
  { "api_server_count": 1, "max_num_batched_tokens": 2048, "dtype": "bfloat16" }
]
```

3. Run:

```bash
python vllm/benchmarks/serve_multi.py \
  --serve-cmd 'vllm serve BAAI/bge-base-en-v1.5 --dtype float32 --runner pooling' \
  --bench-cmd 'vllm bench serve --model BAAI/bge-base-en-v1.5 --backend openai-embeddings --endpoint /v1/embeddings --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
  --serve-params benchmarks/hparams.json \
  -o benchmarks/results
```
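As a rough illustration of what the sweep does, the sketch below shows how each entry in `hparams.json` could be turned into extra CLI flags appended to the serve command. The `params_to_flags` helper is hypothetical, for illustration only, and is not the actual `serve_multi.py` implementation:

```python
import json

def params_to_flags(params: dict) -> list[str]:
    # Hypothetical helper: turn {"api_server_count": 2} into
    # ["--api-server-count", "2"] style CLI flags.
    flags = []
    for key, value in params.items():
        flags.append("--" + key.replace("_", "-"))
        flags.append(str(value))
    return flags

hparams = json.loads("""
[
  { "api_server_count": 1, "max_num_batched_tokens": 2048 },
  { "api_server_count": 2, "max_num_batched_tokens": 2048 }
]
""")

# One serve command per parameter combination.
for combo in hparams:
    cmd = ["vllm", "serve", "BAAI/bge-base-en-v1.5"] + params_to_flags(combo)
    print(" ".join(cmd))
```

The real script additionally starts and stops the server around each combination and runs the bench command against it.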
Tips:

- Use `--dry-run` to preview the commands to run first.
- Use `--num-runs` to increase the reliability of the results.
- Use `--resume` to resume a previous run based on timestamp.
- Use `--sla-params` to iterate through request rate or allowed concurrency to find the maximum value that supports the SLA.
- Use `--sla-variable` to choose between determining request rate or max allowed concurrency.

Test Plan
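The SLA mode above can be pictured as a search over candidate values of the SLA variable. The sketch below is illustrative only: the `measure_latency` callback and the toy latency numbers are made up, and this linear scan is not necessarily the search strategy `serve_multi.py` uses:

```python
def max_rate_meeting_sla(candidates, measure_latency, sla_ms):
    # Walk candidate request rates in increasing order and keep the
    # highest one whose measured latency still meets the SLA target.
    best = None
    for rate in sorted(candidates):
        if measure_latency(rate) <= sla_ms:
            best = rate
        else:
            break  # latency only grows with load, so stop here
    return best

# Toy latency model for illustration: latency grows with request rate.
latencies = {1: 50, 2: 80, 4: 120, 8: 400}
print(max_rate_meeting_sla(latencies, lambda r: latencies[r], sla_ms=150))  # → 4
```

In the real script, each "measurement" is a full benchmark run against the freshly started server.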
Test Result
Essential Elements of an Effective PR Description Checklist
- (If applicable) Documentation update for `supported_models.md` and `examples` for a new model.