[BugFix] Fix Whitelist optimization CI failure#3290
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Reapply the deploy override field derivation that was reverted in vllm-project#3287 and make prefix-cache behavior explicit in deploy configs. This preserves the config refactor while restoring the previous Omni behavior where deploy stages do not accidentally fall through to vLLM's model-dependent prefix-cache default. Signed-off-by: xiaohajiayou <923390377@qq.com>
230ece7 to
6478b2c
Compare
|
Could you help run the full CI checks to make sure there are no other issues? @lishunyang12 @Gaohan123 @hsliuustc0106 |
Signed-off-by: xiaohajiayou <923390377@qq.com>
8a461ab to
da464af
Compare
2e08140 to
235a452
Compare
|
Updated according to your comments:
Could you please take another look when you have time and let me know if there are any remaining issues? |
Signed-off-by: xiaohajiayou <923390377@qq.com>
d73a938 to
3675d56
Compare
hsliuustc0106
left a comment
There was a problem hiding this comment.
Review Summary
What I validated:
- DCO, pre-commit, mergeability all passing
- Fresh unit tests cover the core config nullification behavior (
test_default_stage_config_ignores_none_deploy_overrides,test_to_omegaconf_omits_none_deploy_overrides_for_engine_args,test_deploy_override_fields_include_deploy_schema_fields) deploy_override_field_names()unification intostage_config.pyis clean and removes the duplicated allowlist fromarg_utils.pyenable_prefix_caching: falseis now explicit on all deploy YAML stages — this is the right fix for the CI regression
What must change before approval:
-
cosyvoice3.yaml:disable_hybrid_kv_cache_manager: truewas silently removed and replaced withenable_prefix_caching: false. These are different settings (hybrid KV cache manager vs prefix caching). If this was intentional, please explain why disabling the hybrid KV cache manager is no longer needed for cosyvoice3. If unintentional, restore it alongside the newenable_prefix_cachingline. -
Removal of
engine_args.setdefault("max_num_seqs", 1)without ensuring all deploy YAMLs setmax_num_seqs. Deploy configs likeqwen3_omni_moe.yaml(all 3 stages) andcosyvoice3.yaml(both stages) don't setmax_num_seqs. With the oldsetdefault, they gotmax_num_seqs=1. Now they will fall through to vLLM EngineArgs default of 256. This could change scheduling behavior and memory allocation. Either restore thesetdefaultor add explicitmax_num_seqsvalues to every stage in every deploy YAML that currently omits it. -
Missing test evidence. The PR body's test plan and test results sections are empty. This is a >10 file change that previously caused CI failures (#3287). Please run L3 tests locally and paste the results.
Non-blocking:
- buildkite CI is still pending — wait for it to complete before merging
Reviewed by Claude Code
|
Previously, the fix added several missing fields in However, during CI fixes, two different issues got mixed together:
To keep this PR focused and easier to reason about, this PR only addresses the first issue (i.e., preserving previous behavior for migrated YAMLs). Based on this, this PR updates the migrated deploy YAMLs to explicitly restore only those defaults that differ between old Omni behavior and vLLM defaults, as summarized below:
The goal here is conservative compatibility: keep migrated deploy YAML behavior aligned with pre-refactor Omni defaults, instead of silently falling through to vLLM defaults. Follow-up todo issue: |
a3117dd to
074b866
Compare
Signed-off-by: xiaohajiayou <923390377@qq.com>
074b866 to
5604953
Compare
| mm_processor_cache_gb: float | None = None | ||
|
|
||
| # Profiling, tokenizer/config parsing, and model-loading behavior. | ||
| profiler_config: dict[str, Any] | None = None |
… own group Move devices and tensor_parallel_size into a dedicated "GPU resources and parallelism" section, leaving stage_id alone as stage identity. Change devices default from "0" to None, and tighten the None check in merge_pipeline_deploy to avoid writing a spurious "devices" key. Signed-off-by: xiaohajiayou <923390377@qq.com>
|
update for ming as well after #3154 merged |
Signed-off-by: xiaohajiayou <923390377@qq.com>
|
It seems most of the discussion is now around whether these defaults should be fully removed from
Based on these, in this PR , I removed the implicit reliance on defaults and explicitly materialized them in the YAMLs. My plan is to continue the discussion in a follow-up issue (#3313). This way, even if we later remove vLLM fields from |
|
Known good commit: 9d9b720. CI was still passing there. After that, this PR only had one real own change: ad83e5f, which is mainly YAML field ordering/schema cleanup and does not touch Qwen3-TTS online serving, runtime helpers, or the assertion logic. The rest of the changes were brought in by updating/merging main. Between
I also checked the CI results for those individual PRs, and they all passed, which makes this a bit strange. |
|
If possible, could you add an omni-test label to check if the changes in this PR have any impact on performance? @gcanlin @lishunyang12 |
Signed-off-by: xiaohajiayou <923390377@qq.com>
31dcc5d to
065034f
Compare
|
I ran this Qwen3-TTS CI test locally and it passes on my side. Here are the logs:
Click to expand full logs |
|
please fix ci |
|
|
@yenuo26 @hsliuustc0106 i think this pr is good to merge. please take another look. |
|
@amy-why-3459 https://buildkite.com/vllm/vllm-omni/builds/8939/canvas?sid=019df8f9-159d-43d0-8b59-530fe5f1dcdc&tab=output, please check this performance. It seems that it doesn't happen regression. |
I will merge it after CI |
LGTM |
Signed-off-by: xiaohajiayou <923390377@qq.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Reapply the deploy override field derivation change that was reverted by #3287, and explicitly restore the previous deploy behavior for prefix caching.
The previous attempt allowed omitted deploy fields to fall through to vLLM defaults. For
enable_prefix_caching, that changed behavior from Omni's previous defaultFalseto vLLM's model-dependent fallback. For decoder/generative stages, vLLM usually resolves this toTrue, which exposed unsupported Omni multi-stage prefix-cache paths and caused L3/L4 CI failures.This PR keeps the config refactor, but makes the old behavior explicit by setting
enable_prefix_caching: falseon all deploy stages.Changes
enable_prefix_caching: falseto every stage invllm_omni/deploy/*.yaml.Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)