[BugFix] Normalize diffusion parallel CLI overrides into stage parallel_config#3478
xiaohajiayou wants to merge 6 commits into
Co-authored-by: zzhuoxin1508 <234137171+zzhuoxin1508@users.noreply.github.com> Signed-off-by: xiaohajiayou <923390377@qq.com>
Force-pushed from 3ecb6ed to c6da68a.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ecb6edb3d
    parallel_fields = frozenset(f.name for f in fields(DiffusionParallelConfig))
    parallel_config = engine_args.get("parallel_config")
    parallel_config_dict = to_dict(parallel_config) if parallel_config is not None else None
Preserve plain dict parallel_config values
When a diffusion stage already has engine_args.parallel_config as a normal dict (the common result of parsed YAML/registry deploy config, and also what the new tests construct), this unconditional to_dict() call goes through the OmegaConf wrapper, which only accepts OmegaConf containers and raises for plain Python dicts. That means any diffusion deploy YAML that includes a nested parallel_config now fails during StageConfig.to_omegaconf() before overrides can be applied; handle mappings/dataclass instances directly or only call to_dict() for OmegaConf objects.
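One way to follow this suggestion is sketched below, using only the standard library. `ParallelConfigSketch` and `parallel_config_as_dict` are hypothetical names introduced for illustration; they are not the project's `DiffusionParallelConfig` or `to_dict()`, the field names are assumed from the CLI flags mentioned in this thread, and the OmegaConf branch is deliberately left out.

```python
from collections.abc import Mapping
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Optional


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for DiffusionParallelConfig (field names are assumptions)."""
    cfg_parallel_size: int = 1
    vae_patch_parallel_size: int = 1


def parallel_config_as_dict(value: Any) -> Optional[dict]:
    """Return a plain-dict copy of a parallel_config value, or None if it is unset."""
    if value is None:
        return None
    if isinstance(value, Mapping):
        # Plain dicts (for example, parsed YAML) are copied directly.
        return dict(value)
    if is_dataclass(value) and not isinstance(value, type):
        # Dataclass instances are converted field by field.
        return asdict(value)
    raise TypeError(f"unsupported parallel_config type: {type(value).__name__}")
```

With this shape, a nested `parallel_config` that arrives as a plain dict never reaches a converter that only accepts OmegaConf containers.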
    value = runtime_overrides.get(key)
    if value is None or key not in parallel_fields:
        continue
Do not treat parser defaults as diffusion overrides
This moves every non-None DiffusionParallelConfig field from runtime_overrides into the nested config, but the CLI nullifier only covers deploy_override_field_names() and does not include several diffusion-only fields; for example vllm_omni/entrypoints/cli/serve.py still defaults --cfg-parallel-size and --vae-patch-parallel-size to 1 and --ulysses-mode to "strict". Starting serve without typing those flags will therefore overwrite any deploy YAML parallel_config values for those fields with parser defaults, reintroducing the deploy-precedence bug this change is trying to avoid.
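The failure mode described in this comment can be reproduced with a small, self-contained sketch. `ParallelConfigSketch` and `normalize_overrides` are hypothetical stand-ins rather than the project's code, and the field names are assumed from the CLI flags named above.

```python
from dataclasses import dataclass, fields


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for DiffusionParallelConfig (field names are assumptions)."""
    cfg_parallel_size: int = 1
    vae_patch_parallel_size: int = 1


def normalize_overrides(engine_args: dict, runtime_overrides: dict) -> dict:
    """Move non-None parallel fields from flat overrides into engine_args["parallel_config"]."""
    parallel_fields = frozenset(f.name for f in fields(ParallelConfigSketch))
    nested = dict(engine_args.get("parallel_config") or {})
    for key in parallel_fields:
        value = runtime_overrides.get(key)
        if value is None:
            continue  # None means "not overridden", so the deploy value is kept
        nested[key] = value
    resolved = dict(engine_args)
    resolved["parallel_config"] = nested
    return resolved


deploy_yaml = {"parallel_config": {"cfg_parallel_size": 2, "vae_patch_parallel_size": 4}}

# Parser defaults, as if the user typed no flags at all: the deploy values are lost.
clobbered = normalize_overrides(
    deploy_yaml, {"cfg_parallel_size": 1, "vae_patch_parallel_size": 1}
)

# The same call after untyped flags have been nulled out: the deploy values survive.
preserved = normalize_overrides(
    deploy_yaml, {"cfg_parallel_size": None, "vae_patch_parallel_size": None}
)
```

Here `clobbered["parallel_config"]` ends up with both sizes set to 1, while `preserved["parallel_config"]` still matches the deploy YAML, which is why the normalization step depends on untyped defaults being cleared first.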
Good catch. This was a real issue.
I previously did not account for the omni deploy-YAML path where one of the stages can be a diffusion stage, so these diffusion-only parallel fields also need to be included in the deploy override whitelist.
This is now addressed by adding the diffusion parallel_config override fields to StageDeployConfig, so they are included in deploy_override_field_names() and go through the existing nullifier flow. As a result, untyped parser defaults for these diffusion-only fields no longer override deploy YAML values, while explicit CLI values still do.
Also added/updated tests around the deploy-override field set and nullifier behavior.
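For readers unfamiliar with the nullifier idea, a rough sketch follows. It is not the project's implementation: `nullify_untyped` is a hypothetical helper, the parser is a minimal stand-in for the real `serve` parser, and the field set is passed in by hand instead of coming from `deploy_override_field_names()`.

```python
import argparse


def nullify_untyped(args: argparse.Namespace, argv: list, override_fields: set) -> dict:
    """Return override values, using None for every field whose flag is absent from argv."""
    # Collect the option names the user actually typed ("--flag value" or "--flag=value").
    typed = {token.split("=", 1)[0] for token in argv if token.startswith("--")}
    result = {}
    for name in override_fields:
        flag = "--" + name.replace("_", "-")
        result[name] = getattr(args, name) if flag in typed else None
    return result


# Stand-in parser that mirrors the defaults quoted in the review comment.
parser = argparse.ArgumentParser()
parser.add_argument("--cfg-parallel-size", type=int, default=1)
parser.add_argument("--vae-patch-parallel-size", type=int, default=1)
parser.add_argument("--ulysses-mode", type=str, default="strict")

argv = ["--cfg-parallel-size", "1"]
args = parser.parse_args(argv)
overrides = nullify_untyped(
    args, argv, {"cfg_parallel_size", "vae_patch_parallel_size", "ulysses_mode"}
)
```

In this run only `cfg_parallel_size` keeps its value (1, because it was typed explicitly); the other two fields become `None` even though the parser filled in defaults. The sketch ignores argparse's abbreviated long options, which a real implementation would need to handle or disable.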
Signed-off-by: xiaohajiayou <923390377@qq.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: xiaohajiayou <923390377@qq.com>
Force-pushed from 3d15c1d to 5685577.
This PR is closed and is now handled by #3483.
Purpose
Diffusion stages in omni multi-stage configs consume parallel settings from engine_args.parallel_config, but CLI overrides were being applied as flat top-level engine_args fields.
This could produce a resolved stage config where:
- parallel_config still kept the old values
This change fixes that at stage-config materialization time:
- […] DiffusionParallelConfig are normalized into engine_args.parallel_config
Test Plan
- […] parallel_config, verifying that CLI overrides replace the nested values
- […] parallel_config, verifying that the nested config is created from CLI overrides
- […] parallel_config
Test Result