
[Bugfix]: fixed ServerDisconnectedError in benchmark test (reapply #1683, fixes #1374)#1841

Merged
Gaohan123 merged 13 commits into vllm-project:main from NumberWan:fix_server_disconnected_benchmark on Mar 21, 2026
Conversation

@NumberWan
Contributor

@NumberWan NumberWan commented Mar 12, 2026


Purpose

This PR fixes #1374, where vllm bench serve --omni occasionally fails with aiohttp.client_exceptions.ServerDisconnectedError under high concurrency (e.g. max-concurrency=10), even though the Omni server is still healthy and continues to return 200 OK.

Root cause:

  • The Omni benchmark client uses aiohttp.ClientSession with keep‑alive connections.
  • Under short-request, high‑concurrency load, some idle connections are closed by the server / proxies between requests (e.g. keep‑alive idle timeout).
  • When the client reuses such half‑closed connections, ServerDisconnectedError is raised, and the current benchmark path treats this as a fatal error without retry.

Fixes in this PR:

  1. Client-side retry for transient HTTP transport errors
    In async_request_openai_chat_omni_completions (vllm_omni/benchmarks/patch/patch.py), wrap the HTTP request in a bounded retry loop (a retry sketch is shown after this list).

  2. In tests/perf/tests/test.json, restore the perf matrix that was previously reduced in eeb393f (which temporarily lowered max_concurrency and num_prompts to avoid [Bug]: [Benchmark] Server disconnected error #1374):

  • num_prompts: [10, 40, 100]

  • max_concurrency: [1, 4, 10]

  • This effectively recovers the original high‑concurrency perf cases for Qwen3‑Omni (both normal and async‑chunk).

  3. CI-only keep‑alive tuning for nightly perf

In .buildkite/test-nightly.yml, for the 🌕 Omni Model Perf Test & Test Case Statistics step, set:

  • VLLM_HTTP_TIMEOUT_KEEP_ALIVE=60

This change is scoped only to the nightly perf Buildkite job, to reduce idle‑timeout‑induced disconnects during long‑running, high‑concurrency benchmarks, without changing the default keep‑alive behavior for normal users.
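
For illustration, here is a minimal sketch of the kind of bounded retry described in item 1 above. The helper name and the constants MAX_RETRIES and RETRY_BACKOFF_S are illustrative, not taken from this PR; the actual logic lives in async_request_openai_chat_omni_completions in vllm_omni/benchmarks/patch/patch.py.

import asyncio
import aiohttp

MAX_RETRIES = 3        # illustrative retry budget, not the PR's actual value
RETRY_BACKOFF_S = 0.5  # illustrative pause between attempts, in seconds

async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict) -> str:
    # Hypothetical helper: bounded retry around a single aiohttp POST.
    last_exc = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            async with session.post(url, json=payload) as resp:
                resp.raise_for_status()
                return await resp.text()
        except (aiohttp.ServerDisconnectedError, aiohttp.ClientConnectorError) as exc:
            # Transient transport failure, e.g. reuse of a keep-alive connection
            # the server already closed; back off briefly and try again.
            last_exc = exc
            if attempt < MAX_RETRIES:
                await asyncio.sleep(RETRY_BACKOFF_S)
    # Retry budget exhausted: surface the last transport error to the caller.
    raise last_exc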

Test Plan

Serve with VLLM_HTTP_TIMEOUT_KEEP_ALIVE=120, then run the benchmark with max-concurrency 50 and num-prompts 10000:

export VLLM_HTTP_TIMEOUT_KEEP_ALIVE=120
vllm serve /home/models/Qwen/Qwen3-Omni-30B-A3B-Instruct --omni --port 28985 


vllm bench serve \
    --omni \
    --dataset-name random \
    --port 28985 \
    --max-concurrency 50 \
    --model /home/models/Qwen/Qwen3-Omni-30B-A3B-Instruct \
    --endpoint /v1/chat/completions \
    --backend openai-chat-omni \
    --num-prompts 10000 \
    --random-input-len 10 \
    --ignore-eos \
    --percentile-metrics ttft,tpot,itl,e2el,audio_ttfp,audio_rtf \
    --random-output-len 10 \
    --extra_body '{"modalities": ["text", "audio"]}'

Test Result

Passed. All 10000 requests received successful responses.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@NumberWan NumberWan marked this pull request as ready for review March 12, 2026 07:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e902af223

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread vllm_omni/benchmarks/patch/patch.py
@NumberWan NumberWan force-pushed the fix_server_disconnected_benchmark branch 2 times, most recently from f33fd41 to 4083e17 Compare March 12, 2026 08:23
@NumberWan NumberWan marked this pull request as draft March 12, 2026 08:30
@NumberWan NumberWan marked this pull request as ready for review March 12, 2026 08:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10997e6453


Comment thread vllm_omni/benchmarks/patch/patch.py
@Gaohan123
Collaborator

@amy-why-3459 PTAL

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 12, 2026
@amy-why-3459
Contributor

LGTM

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


Review Blocked - Gate Failure

pre-commit: FAILED

Please fix the pre-commit issues before review:

pre-commit run --all-files

Once pre-commit passes, I'll review the ServerDisconnectedError retry logic.

@NumberWan
Contributor Author

@codex review

@NumberWan NumberWan force-pushed the fix_server_disconnected_benchmark branch 2 times, most recently from 6a0b521 to 2498f04 Compare March 13, 2026 08:54
@NumberWan NumberWan closed this Mar 13, 2026
@NumberWan NumberWan reopened this Mar 13, 2026
@NumberWan NumberWan marked this pull request as draft March 13, 2026 09:02
@NumberWan NumberWan force-pushed the fix_server_disconnected_benchmark branch 3 times, most recently from f6aeb1d to fddc222 Compare March 13, 2026 09:28
@NumberWan NumberWan marked this pull request as ready for review March 13, 2026 09:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review


P0: Re-encode benchmark patch module as UTF-8 text

This commit rewrites vllm_omni/benchmarks/patch/patch.py as UTF-16 (starts with BOM 0xFF 0xFE and embedded NUL bytes), which makes the module unloadable by Python (ValueError: source code string cannot contain null bytes when compiling/importing), so any benchmark path importing this patch will fail before execution.



P1: Keep perf test matrix JSON in UTF-8 encoding

tests/perf/tests/test.json is also converted to UTF-16 in this commit, but the perf harness reads it with open(..., encoding="utf-8") in tests/perf/scripts/run_benchmark.py, so loading the config now raises UnicodeDecodeError and prevents the nightly perf suite from starting.
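
As a rough illustration of the two findings above (not code from this PR): a UTF-16 file starts with a byte-order mark, and a hypothetical one-off re-encode back to UTF-8 could look like the following.

from pathlib import Path

# Hypothetical cleanup snippet, not part of this PR: re-encode a source file
# that was accidentally saved as UTF-16 back to UTF-8.
path = Path("vllm_omni/benchmarks/patch/patch.py")
raw = path.read_bytes()
if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
    # "utf-16" honors the BOM and selects the correct byte order.
    path.write_text(raw.decode("utf-16"), encoding="utf-8")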


@NumberWan NumberWan force-pushed the fix_server_disconnected_benchmark branch from fddc222 to d3a4941 Compare March 13, 2026 10:03
@NumberWan NumberWan requested a review from hsliuustc0106 March 17, 2026 06:16
Collaborator

@congw729 congw729 left a comment


LGTM

@Gaohan123
Collaborator

@amy-why-3459 @R2-Y PTAL

@R2-Y
Contributor

R2-Y commented Mar 19, 2026

LGTM, thanks

@congw729
Collaborator

@hsliuustc0106 @Gaohan123 @david6666666 Please add a ready label.

@david6666666 david6666666 added the ready label to trigger buildkite CI label Mar 19, 2026
@NumberWan NumberWan changed the title fix: make omni benchmark resilient to ServerDisconnectedError (reapply #1683, fixes #1374) [Bugfix]: make omni benchmark resilient to ServerDisconnectedError (reapply #1683, fixes #1374) Mar 20, 2026
@NumberWan NumberWan changed the title [Bugfix]: make omni benchmark resilient to ServerDisconnectedError (reapply #1683, fixes #1374) [Bugfix]: fixed ServerDisconnectedError in benchmark test (reapply #1683, fixes #1374) Mar 20, 2026
@Gaohan123 Gaohan123 enabled auto-merge (squash) March 20, 2026 09:10
auto-merge was automatically disabled March 20, 2026 09:32

Head branch was pushed to by a user without write access

Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks

@Gaohan123 Gaohan123 merged commit 7007217 into vllm-project:main Mar 21, 2026
8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[Bug]: [Benchmark] Server disconnected error

7 participants