
[perf] Qwen3-Omni performance optimization #3203

Merged
hsliuustc0106 merged 7 commits into vllm-project:main from amy-why-3459:perf
May 1, 2026

Conversation

@amy-why-3459
Contributor

@amy-why-3459 amy-why-3459 commented Apr 28, 2026


Purpose

#3164

Test Plan

Test Result



@hsliuustc0106 hsliuustc0106 added the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels Apr 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd9b3f4ffd


Comment thread tests/dfx/conftest.py
Comment on lines +114 to +116
extra_cli_args,
use_omni,
)

P1: Keep reliability tuple unpacking compatible

Adding use_omni to server_param makes create_unique_server_params emit 6-tuples, but create_reliability_omni_server_params in the same file still unpacks each entry as 5 values. When reliability tests initialize QWEN_PARAMS/WAN_PARAMS, this now raises ValueError: too many values to unpack, so those suites fail before running any test logic.
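
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual conftest.py code: only extra_cli_args and use_omni appear in the diff, so the other tuple fields, the function names, and the default for use_omni are assumptions made for illustration. It also shows one possible way to keep a consumer compatible with both tuple widths.

```python
# Hypothetical 6-field server parameter tuple. Only `extra_cli_args` and
# `use_omni` come from the diff; the other fields are placeholders.
six_tuple = ("model", "stage_config", 1, "gpu", ["--flag"], True)


def unpack_five(entry):
    # Mirrors a consumer that still expects the old 5-field layout.
    a, b, c, d, extra_cli_args = entry
    return extra_cli_args


def unpack_tolerant(entry):
    # One possible fix: collect trailing fields with a starred target so
    # both 5-field and 6-field entries are accepted.
    a, b, c, d, extra_cli_args, *rest = entry
    use_omni = rest[0] if rest else False  # default value is an assumption
    return extra_cli_args, use_omni


try:
    unpack_five(six_tuple)
    error_message = None
except ValueError as exc:
    # Python reports "too many values to unpack" for the 6-field entry.
    error_message = str(exc)
```

Updating every consumer to unpack six values explicitly would work equally well; the starred form is only convenient when both layouts have to coexist for a while.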


@hsliuustc0106
Collaborator

Many CI checks failed.

Comment thread vllm_omni/worker/gpu_ar_model_runner.py Outdated
Collaborator

@gcanlin gcanlin left a comment


LGTM

@amy-why-3459
Contributor Author

In the main branch, this test case also appears to fail.

Comment thread vllm_omni/worker/gpu_ar_model_runner.py Outdated
def _warmup_single_request_prefill_compile_path(self) -> None:
    # Warm up with a single long prefill request to avoid first-token
    # compile spikes from wide prefill shapes.
    warmup_tokens = min(int(self.max_num_tokens), 3072)
Collaborator


Why 3072? Is it intermediate_size of qwen omni?

Contributor Author


Why 3072? Is it intermediate_size of qwen omni?

Fixed
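
How the author resolved this is not shown in the thread. As a rough illustration of the concern, the sketch below replaces the magic number with a named, overridable cap. The constant name, the function name, and the warmup_token_cap parameter are all hypothetical; only the value 3072 and the min(...) clamp come from the reviewed diff.

```python
# Value taken from the reviewed diff; the constant name is hypothetical.
DEFAULT_WARMUP_TOKEN_CAP = 3072


def compute_warmup_tokens(max_num_tokens, warmup_token_cap=None):
    """Return the prefill warmup length, clamped to a configurable cap."""
    # Fall back to the default cap when the caller does not override it.
    if warmup_token_cap is None:
        cap = DEFAULT_WARMUP_TOKEN_CAP
    else:
        cap = int(warmup_token_cap)
    # Never warm up with more tokens than the runner can schedule at once.
    return min(int(max_num_tokens), cap)
```

For example, compute_warmup_tokens(8192) yields 3072, while compute_warmup_tokens(1024) yields 1024 because the runner limit is the smaller bound.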

Comment thread vllm_omni/worker/gpu_ar_model_runner.py Outdated

def _maybe_prune_downstream_payload_cache(self) -> None:
    # Keep cache size bounded under long-lived serving workloads.
    if len(self._downstream_payload_cache) <= max(4096, len(self.requests) * 2):
Collaborator


hardcoded 4096

Contributor Author


hardcoded 4096

Fixed
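
The thread does not show the final fix. The sketch below illustrates one way the size floor could be made a parameter instead of a literal. The function name, the min_entries parameter, and the eviction policy (dropping entries whose request is no longer live) are assumptions for illustration; only the max(4096, len(requests) * 2) threshold comes from the reviewed diff.

```python
def prune_payload_cache(cache, live_request_ids, min_entries=4096):
    """Drop cache entries for finished requests once the cache grows too large.

    Returns the number of entries removed.
    """
    # Same threshold shape as the reviewed line, with a configurable floor.
    limit = max(min_entries, len(live_request_ids) * 2)
    if len(cache) <= limit:
        return 0
    # Assumed policy: evict every entry whose request is no longer live.
    stale_keys = [key for key in cache if key not in live_request_ids]
    for key in stale_keys:
        del cache[key]
    return len(stale_keys)
```

With a floor of 4, a cache of ten entries, and two live requests, the call removes the eight stale entries; with the default floor of 4096 the same cache is left untouched.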

@amy-why-3459
Contributor Author

amy-why-3459 commented Apr 28, 2026

test_bagel

@amy-why-3459 amy-why-3459 force-pushed the perf branch 3 times, most recently from f1acfe7 to 04f92d8 Compare April 28, 2026 11:26
@hsliuustc0106 hsliuustc0106 removed the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels Apr 29, 2026
@Gaohan123 Gaohan123 added this to the v0.20.0 milestone Apr 29, 2026
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
@hsliuustc0106 hsliuustc0106 added the ready (label to trigger buildkite CI) and omni-test (label to trigger buildkite omni model test in nightly CI) labels Apr 30, 2026
@hsliuustc0106
Collaborator

Retest it now under vLLM 0.20.0.

@hsliuustc0106 hsliuustc0106 removed the omni-test (label to trigger buildkite omni model test in nightly CI) label Apr 30, 2026
@Gaohan123 Gaohan123 added the high priority (high priority issue, needs to be done asap) label Apr 30, 2026
@hsliuustc0106 hsliuustc0106 added the omni-test (label to trigger buildkite omni model test in nightly CI) label Apr 30, 2026
@hsliuustc0106 hsliuustc0106 merged commit 01f500a into vllm-project:main May 1, 2026
7 of 8 checks passed
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
sphinxkkkbc pushed a commit to sphinxkkkbc/vllm-omni that referenced this pull request May 4, 2026
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

Labels

high priority (high priority issue, needs to be done asap), omni-test (label to trigger buildkite omni model test in nightly CI), ready (label to trigger buildkite CI)


5 participants