Skip to content

[BugFix] Normalize diffusion parallel CLI overrides into stage parallel_config#3478

Closed
xiaohajiayou wants to merge 6 commits into
vllm-project:mainfrom
xiaohajiayou:fix/diffusion-parallel-overrides
Closed

[BugFix] Normalize diffusion parallel CLI overrides into stage parallel_config#3478
xiaohajiayou wants to merge 6 commits into
vllm-project:mainfrom
xiaohajiayou:fix/diffusion-parallel-overrides

Conversation

@xiaohajiayou
Copy link
Copy Markdown
Contributor

@xiaohajiayou xiaohajiayou commented May 10, 2026

Purpose

Diffusion stages in omni multi-stage configs consume parallel settings from engine_args.parallel_config, but CLI overrides were being applied as flat top-level engine_args fields.

This could produce a resolved stage config where:

  • diffusion parallel_config still kept the old values
  • CLI overrides appeared only at the top level
  • LLM and diffusion stages did not handle the same override fields consistently

This change fixes that at stage-config materialization time:

  • for diffusion stages, fields defined by DiffusionParallelConfig are normalized into engine_args.parallel_config
  • those fields are no longer duplicated at the top level for diffusion stages
  • for LLM stages, the existing top-level override behavior is preserved

Test Plan

  • Add a unit test covering diffusion stages with an existing parallel_config, verifying that CLI overrides replace the nested values
  • Add a unit test covering diffusion stages without an existing parallel_config, verifying that the nested config is created from CLI overrides
  • Add a unit test covering LLM stages, verifying that shared parallel fields remain top-level and do not create parallel_config
  • cmd
pytest -q tests/test_config_factory.py -k "diffusion_parallel_overrides or llm_parallel_overrides or deploy_override_fields_include_deploy_schema_fields"
pytest -q tests/test_arg_utils.py -k "nullify_stage_engine_defaults_resets_inherited_defaults or non_override_flags_keep_real_defaults_after_nullify"

Test Result

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
4 passed, 104 deselected, 18 warnings in 1.57s



-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
2 passed, 22 deselected, 18 warnings in 1.63s
(vllm-omni) root@autodl-container-wpn6tqcvq6-04abd7f3:~/vllm-omni# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Co-authored-by: zzhuoxin1508 <234137171+zzhuoxin1508@users.noreply.github.com>
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the fix/diffusion-parallel-overrides branch from 3ecb6ed to c6da68a Compare May 10, 2026 13:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ecb6edb3d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread vllm_omni/config/stage_config.py Outdated

parallel_fields = frozenset(f.name for f in fields(DiffusionParallelConfig))
parallel_config = engine_args.get("parallel_config")
parallel_config_dict = to_dict(parallel_config) if parallel_config is not None else None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve plain dict parallel_config values

When a diffusion stage already has engine_args.parallel_config as a normal dict (the common result of parsed YAML/registry deploy config, and also what the new tests construct), this unconditional to_dict() call goes through the OmegaConf wrapper, which only accepts OmegaConf containers and raises for plain Python dicts. That means any diffusion deploy YAML that includes a nested parallel_config now fails during StageConfig.to_omegaconf() before overrides can be applied; handle mappings/dataclass instances directly or only call to_dict() for OmegaConf objects.

Useful? React with 👍 / 👎.

Comment on lines +136 to +138
value = runtime_overrides.get(key)
if value is None or key not in parallel_fields:
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not treat parser defaults as diffusion overrides

This moves every non-None DiffusionParallelConfig field from runtime_overrides into the nested config, but the CLI nullifier only covers deploy_override_field_names() and does not include several diffusion-only fields; for example vllm_omni/entrypoints/cli/serve.py still defaults --cfg-parallel-size and --vae-patch-parallel-size to 1 and --ulysses-mode to "strict". Starting serve without typing those flags will therefore overwrite any deploy YAML parallel_config values for those fields with parser defaults, reintroducing the deploy-precedence bug this change is trying to avoid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This was a real issue.

I previously did not account for the omni deploy-YAML path where one of the stages can be a diffusion stage, so these diffusion-only parallel fields also need to be included in the deploy override whitelist.

This is now addressed by adding the diffusion parallel_config override fields to StageDeployConfig, so they are included in deploy_override_field_names() and go through the existing nullifier flow. As a result, untyped parser defaults for these diffusion-only fields no longer override deploy YAML values, while explicit CLI values still do.

Also added/updated tests around the deploy-override field set and nullifier behavior.

Signed-off-by: xiaohajiayou <923390377@qq.com>
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou
Copy link
Copy Markdown
Contributor Author

xiaohajiayou commented May 10, 2026

This pr is closed and now handled by #3483.

@xiaohajiayou xiaohajiayou deleted the fix/diffusion-parallel-overrides branch May 11, 2026 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants