[Bugfix]: fixed ServerDisconnectedError in benchmark test (reapply #1683, fixes #1374) #1841
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e902af223
Force-pushed from f33fd41 to 4083e17.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10997e6453
@amy-why-3459 PTAL
LGTM
hsliuustc0106 left a comment:
Review Blocked - Gate Failure
pre-commit: FAILED ❌
Please fix the pre-commit issues before review:
`pre-commit run --all-files`
Once pre-commit passes, I'll review the ServerDisconnectedError retry logic.
Force-pushed from e928439 to f6aeb1d.
@codex review
Force-pushed from 6a0b521 to 2498f04.
Force-pushed from f6aeb1d to fddc222.
💡 Codex Review
This commit rewrites vllm_omni/benchmarks/patch/patch.py as UTF-16 (starts with BOM 0xFF 0xFE and embedded NUL bytes), which makes the module unloadable by Python (ValueError: source code string cannot contain null bytes when compiling/importing), so any benchmark path importing this patch will fail before execution.
vllm-omni/tests/perf/tests/test.json, line 1 (at fddc222):
tests/perf/tests/test.json is also converted to UTF-16 in this commit, but the perf harness reads it with open(..., encoding="utf-8") in tests/perf/scripts/run_benchmark.py, so loading the config now raises UnicodeDecodeError and prevents the nightly perf suite from starting.
Force-pushed from fddc222 to d3a4941.
@amy-why-3459 @R2-Y PTAL
LGTM, thanks
@hsliuustc0106 @Gaohan123 @david6666666 Please add a ready label.
Head branch was pushed to by a user without write access
Purpose
This PR fixes #1374, where `vllm bench serve --omni` occasionally fails with `aiohttp.client_exceptions.ServerDisconnectedError` under high concurrency (e.g. `max-concurrency=10`), even though the Omni server is still healthy and continues to return 200 OK.
Root cause: during long, high-concurrency runs the server side can close an idle keep-alive connection between requests; when the aiohttp client reuses that pooled connection, the request fails with ServerDisconnectedError even though the server is still healthy.
Fixes in this PR:
- Client-side retry for transient HTTP transport errors: in `async_request_openai_chat_omni_completions` (`vllm_omni/benchmarks/patch/patch.py`), the HTTP request is wrapped in a bounded retry loop, so a dropped connection is retried a limited number of times instead of failing the benchmark request outright (a sketch of the idea follows this list).
- In `tests/perf/tests/test.json`, restore the perf matrix that was previously reduced in eeb393f (which temporarily lowered `max_concurrency` and `num_prompts` to avoid #1374, "[Bug]: [Benchmark] Server disconnected error"):
  - `num_prompts`: [10, 40, 100]
  - `max_concurrency`: [1, 4, 10]

  This effectively recovers the original high-concurrency perf cases for Qwen3-Omni (both normal and async-chunk).
- In `.buildkite/test-nightly.yml`, for the 🌕 Omni Model Perf Test & Test Case Statistics step, set:

  This change is scoped to the nightly perf Buildkite job only, to reduce idle-timeout-induced disconnects during long-running, high-concurrency benchmarks, without changing the default keep-alive behavior for normal users.
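The sketch below shows the general shape of such a bounded retry around an aiohttp request. It is illustrative only: the helper name `post_with_retry`, the constants, and the backoff policy are placeholders, not the actual code in `patch.py`; the real change lives inside `async_request_openai_chat_omni_completions` and preserves its existing streaming and response handling.

```python
import asyncio

import aiohttp

# Illustrative limits only; the real patch may use different names and values.
MAX_TRANSPORT_RETRIES = 3
RETRY_BACKOFF_SECONDS = 0.5


async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict) -> dict:
    """POST with a bounded retry on transient transport errors (e.g. the
    server dropping an idle keep-alive connection between requests)."""
    last_exc = None
    for attempt in range(1, MAX_TRANSPORT_RETRIES + 1):
        try:
            async with session.post(url, json=payload) as resp:
                resp.raise_for_status()
                return await resp.json()
        except (aiohttp.ServerDisconnectedError, aiohttp.ClientOSError) as exc:
            # Transient transport failure: the server may still be healthy,
            # so back off briefly and retry instead of failing the request.
            last_exc = exc
            if attempt < MAX_TRANSPORT_RETRIES:
                await asyncio.sleep(RETRY_BACKOFF_SECONDS * attempt)
    # Retries exhausted: surface the last transport error to the caller.
    raise last_exc
```

Only connection-level errors are retried in this sketch; non-2xx responses propagate immediately, which matches the symptom in #1374 where the server stays healthy and keeps returning 200 OK.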
Test Plan
Run the benchmark with `max-concurrency=50` and `num-prompts=10000`.
Test Result
Passed: all 10000 requests received successful responses.