[Fix] Bridge flat CLI parallel args into DiffusionParallelConfig before YAML stage-config merge #2264

Open
zzhuoxin1508 wants to merge 8 commits into vllm-project:main from zzhuoxin1508:fix-diffusion-cli-parallel-config

Conversation

@zzhuoxin1508 (Contributor) commented Mar 27, 2026

Problem

When launching a diffusion model server via CLI:

vllm serve tencent/HunyuanImage-3.0 --omni \
  --tensor-parallel-size 4 --enable-expert-parallel

kwargs entering the engine are flat top-level keys:

{"tensor_parallel_size": 4, "enable_expert_parallel": True}

However, bundled stage YAMLs for multi-stage diffusion models (e.g. hunyuan_image_3_moe.yaml) store these under a nested
parallel_config block:

parallel_config:
  tensor_parallel_size: 8
  enable_expert_parallel: false

OmegaConf.merge only overrides matching key paths, so a flat enable_expert_parallel can never reach parallel_config.enable_expert_parallel regardless of merge order. Both keys coexist independently after the merge,
and the diffusion worker reads only from parallel_config — so the CLI values are silently ignored. #2076
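The failure mode can be reproduced with plain dictionaries. The sketch below uses an illustrative `deep_merge` helper with OmegaConf-like semantics (override wins only on matching key paths); it is a toy stand-in, not OmegaConf itself:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Last-wins merge that recurses only where both sides hold dicts,
    mimicking how OmegaConf.merge matches key paths."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

yaml_cfg = {"parallel_config": {"tensor_parallel_size": 8,
                                "enable_expert_parallel": False}}
cli_kwargs = {"tensor_parallel_size": 4, "enable_expert_parallel": True}

merged = deep_merge(yaml_cfg, cli_kwargs)
# The flat CLI keys land at the top level; the nested block is untouched,
# so a worker that reads only parallel_config never sees the CLI values.
```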

The offline example scripts avoid this because they explicitly construct DiffusionParallelConfig and pass it as parallel_config before calling Omni(). The CLI/server path was missing this bridging step.

Affects all models with such configs, including HunyuanImage and Bagel.

Fix

Assemble DiffusionParallelConfig from flat kwargs before the YAML merge in _resolve_stage_configs.
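A minimal sketch of that bridging step, assuming a small illustrative subset of keys (the real key list and helper name differ):

```python
# Illustrative subset of flat CLI keys that belong under parallel_config.
PARALLEL_KEYS = ("tensor_parallel_size", "enable_expert_parallel")

def bridge_parallel_kwargs(kwargs: dict) -> dict:
    """Lift flat CLI parallel args into a nested parallel_config block so the
    later YAML merge sees matching key paths and the CLI values can win."""
    nested = {k: kwargs.pop(k) for k in PARALLEL_KEYS if k in kwargs}
    if nested:
        kwargs.setdefault("parallel_config", {}).update(nested)
    return kwargs

kwargs = bridge_parallel_kwargs({"model": "tencent/HunyuanImage-3.0",
                                 "tensor_parallel_size": 4,
                                 "enable_expert_parallel": True})
```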

Essential Elements of an Effective PR Description Checklist

- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the [test style doc](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_style/)
- [ ] The test results. Please paste the results comparison before and after, or the e2e results.
- [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model. **Please run `mkdocs serve` to sync the documentation editions to `./docs`.**
- [ ] (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@zzhuoxin1508 zzhuoxin1508 marked this pull request as ready for review March 27, 2026 09:29

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5348fa63d


Comment thread: vllm_omni/engine/async_omni_engine.py (Outdated)

    hsdp_replicate_size = kwargs.get("hsdp_replicate_size", 1)
    if sequence_parallel_size is None:
        sequence_parallel_size = ulysses_degree * ring_degree
    kwargs["parallel_config"] = DiffusionParallelConfig(

P1 Badge Apply CLI parallel overrides after YAML merge

Setting kwargs["parallel_config"] here does not actually let flat CLI flags override stage YAML defaults, because load_stage_configs_from_yaml later merges base_engine_args with stage_arg.engine_args in that order (see vllm_omni/entrypoints/utils.py lines 292-295), so the YAML engine_args.parallel_config wins for keys like tensor_parallel_size and enable_expert_parallel. In the default diffusion configs (e.g. hunyuan_image_3_moe.yaml), this means --tensor-parallel-size/--enable-expert-parallel are still ignored despite this new bridge.
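Assuming merge_configs behaves as a last-wins deep merge, the ordering problem the bot describes looks like this (toy stand-in, not the real helper):

```python
def merge_configs(base: dict, override: dict) -> dict:
    """Toy stand-in for the real helper: the second argument wins on
    matching key paths, recursing through nested dicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = merge_configs(out[key], value)
        else:
            out[key] = value
    return out

base_engine_args = {"parallel_config": {"tensor_parallel_size": 4}}   # bridged from CLI
stage_engine_args = {"parallel_config": {"tensor_parallel_size": 8}}  # from stage YAML

merged = merge_configs(base_engine_args, stage_engine_args)
# The YAML value still wins because the stage args are merged last.
```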


Contributor Author

This issue is addressed by #2076.

Collaborator

@lishunyang12 lishunyang12 left a comment


This alone doesn't fix the CLI override problem for models that ship a YAML with parallel_config in engine_args (e.g. hunyuan_image_3_moe.yaml). The YAML stage engine_args still wins in merge_configs(base, stage). Left a couple comments.


    stage_configs_path = kwargs.get("stage_configs_path", None)
    explicit_stage_configs = kwargs.pop("stage_configs", None)

Collaborator


This does a network round-trip (get_hf_file_to_dict) on every call just to guard the parallel-config bridging. _resolve_stage_configs already resolves the model type downstream — can you reuse that or at least cache the result?

Comment thread: vllm_omni/engine/async_omni_engine.py (Outdated)

    hsdp_replicate_size = kwargs.get("hsdp_replicate_size", 1)
    if sequence_parallel_size is None:
        sequence_parallel_size = ulysses_degree * ring_degree
    kwargs["parallel_config"] = DiffusionParallelConfig(
Collaborator


As the bot noted: merge_configs(base_engine_args, stage_arg.engine_args) gives the YAML the last word, so this parallel_config is overwritten for any model that ships a stage YAML with parallel_config (all current HunyuanImage/Bagel configs). Without #2076 merged first, this PR doesn't actually fix the CLI path. Please either land #2076 first or reverse the merge order here.

Contributor Author


Yes, this PR is based on #2076. I'll rebase once it's merged.

Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
@xiaohajiayou
Contributor

This looks incorrect. kwargs["parallel_config"] is synthesized only after load_and_resolve_stage_configs() has already run, but at that point kwargs has already been consumed.

load_and_resolve_stage_configs() uses kwargs to perform YAML merge / default factory construction and returns fully materialized stage_configs. After that, diffusion initialization proceeds from stage_cfg.engine_args -> OmniDiffusionConfig.from_kwargs(...); it does not go back and re-read the original kwargs. So this post-resolution assignment:

kwargs["parallel_config"] = DiffusionParallelConfig(...)

does not affect the resolved diffusion stage config for the current initialization.
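This ordering can be seen with a toy stand-in for load_and_resolve_stage_configs (the real function is far more involved; the sketch only shows that stage configs are snapshotted before the later mutation):

```python
def load_and_resolve_stage_configs(kwargs: dict) -> list:
    """Toy stand-in: materialize stage configs from a snapshot of kwargs."""
    return [{"engine_args": dict(kwargs)}]

kwargs = {"tensor_parallel_size": 4}
stage_configs = load_and_resolve_stage_configs(kwargs)

# Assigning parallel_config afterwards never reaches the already-resolved
# stage configs, which are what diffusion initialization reads from.
kwargs["parallel_config"] = {"tensor_parallel_size": 4}
```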

@zzhuoxin1508 zzhuoxin1508 marked this pull request as draft April 14, 2026 08:19
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
@zzhuoxin1508 zzhuoxin1508 force-pushed the fix-diffusion-cli-parallel-config branch from c9f10f3 to bb64b0c Compare April 14, 2026 11:38
@zzhuoxin1508 zzhuoxin1508 marked this pull request as ready for review April 14, 2026 12:50
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@zzhuoxin1508
Contributor Author

This looks incorrect. kwargs["parallel_config"] is synthesized only after load_and_resolve_stage_configs() has already run, but at that point kwargs has already been consumed.

This has been addressed. The code now writes directly to cfg.engine_args.parallel_config on each resolved diffusion stage, instead of mutating kwargs after they have been consumed. Could you take another look? Thanks! @xiaohajiayou

