[perf] Qwen3-Omni performance optimization #3203
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd9b3f4ffd
    extra_cli_args,
    use_omni,
)
Keep reliability tuple unpacking compatible
Adding use_omni to server_param makes create_unique_server_params emit 6-tuples, but create_reliability_omni_server_params in the same file still unpacks each entry as 5 values. When reliability tests initialize QWEN_PARAMS/WAN_PARAMS, this now raises ValueError: too many values to unpack, so those suites fail before running any test logic.
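The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: the tuple contents and every field name other than extra_cli_args and use_omni are hypothetical, and the tolerant variant is only one possible way to keep old and new layouts compatible.

```python
# Hypothetical stand-in for one entry emitted by create_unique_server_params
# after use_omni was appended; the first four field names are assumptions.
six_tuple = ("model-a", 8000, "stage.yaml", 1, ["--flag"], True)


def unpack_strict(entry):
    # Mirrors a caller that still expects the old 5-field layout.
    model, port, config, devices, extra_cli_args = entry
    return extra_cli_args


def unpack_tolerant(entry):
    # Star-unpacking absorbs trailing fields, so 5- and 6-tuples both work.
    model, port, config, devices, extra_cli_args, *rest = entry
    use_omni = rest[0] if rest else False  # assumed default for old entries
    return extra_cli_args, use_omni


try:
    unpack_strict(six_tuple)
    strict_error = None
except ValueError as exc:
    # Python reports "too many values to unpack (expected 5)" here.
    strict_error = str(exc)

print(strict_error)
print(unpack_tolerant(six_tuple))
```

Updating the reliability helper to unpack six values explicitly is the more direct fix; the starred form is useful mainly when several callers share the tuple and cannot all be changed at once.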
Many CI checks failed.
    def _warmup_single_request_prefill_compile_path(self) -> None:
        # Warm up with a single long prefill request to avoid first-token
        # compile spikes from wide prefill shapes.
        warmup_tokens = min(int(self.max_num_tokens), 3072)
Why 3072? Is it intermediate_size of qwen omni?
Fixed
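The thread does not show what the fix looked like. One common way to address a "why this number?" comment is to lift the literal into a named, documented setting. A minimal sketch under that assumption follows; WarmupConfig, warmup_prefill_token_cap, and pick_warmup_tokens are hypothetical names and are not taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class WarmupConfig:
    # Hypothetical config holder; only max_num_tokens appears in the diff.
    max_num_tokens: int
    # Named cap replacing the inline 3072 literal (assumed default value).
    warmup_prefill_token_cap: int = 3072


def pick_warmup_tokens(cfg: WarmupConfig) -> int:
    # Never exceed the scheduler budget, and never warm up with zero tokens.
    capped = min(int(cfg.max_num_tokens), int(cfg.warmup_prefill_token_cap))
    return max(1, capped)


print(pick_warmup_tokens(WarmupConfig(max_num_tokens=8192)))
print(pick_warmup_tokens(WarmupConfig(max_num_tokens=1024)))
```

With the cap exposed as a field, a reviewer can see where the value comes from and a deployment can override it without editing the warmup routine.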
    def _maybe_prune_downstream_payload_cache(self) -> None:
        # Keep cache size bounded under long-lived serving workloads.
        if len(self._downstream_payload_cache) <= max(4096, len(self.requests) * 2):
4096 is hardcoded here.
Fixed
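The snippet only shows the size check, not the eviction that follows or the eventual fix. As a rough sketch of the same bounding rule with the floor made configurable: prune_payload_cache, min_entries, and per_request_factor are hypothetical names, and oldest-first eviction is an assumption rather than the PR's confirmed policy.

```python
from collections import OrderedDict


def prune_payload_cache(cache, num_requests, min_entries=4096, per_request_factor=2):
    """Evict oldest entries until the cache fits the limit; return the eviction count."""
    # Same bound as the reviewed check: a fixed floor or a multiple of live requests.
    limit = max(min_entries, num_requests * per_request_factor)
    evicted = 0
    while len(cache) > limit:
        cache.popitem(last=False)  # assumed policy: drop the oldest insertion first
        evicted += 1
    return evicted


cache = OrderedDict((i, f"payload-{i}") for i in range(10))
print(prune_payload_cache(cache, num_requests=2, min_entries=3))
print(list(cache))
```

Passing the floor as a parameter keeps the default behavior while letting tests and small deployments exercise the pruning path without filling thousands of entries.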
f1acfe7 to 04f92d8
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Retest it now under vLLM 0.20.0.
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>


Purpose
#3164
Test Plan
Test Result