
[Bugfix] Fix default diffusion stage config generator drops runtime engine args#2559

Open
xiaohajiayou wants to merge 1 commit into vllm-project:main from xiaohajiayou:fix/diffusion-default-factory-engine-args

Conversation

Contributor

@xiaohajiayou xiaohajiayou commented Apr 7, 2026


Purpose

As reported in #2539 and #2544, and discussed in #2076, when a model is loaded without a default stage config and no stage config YAML is explicitly provided via CLI arguments, the current AsyncOmniEngine constructs self.stage_configs through self._create_default_diffusion_stage_cfg. The resulting config is then passed into:

od_config = OmniDiffusionConfig.from_kwargs(
    model=model,
    **_to_dict(stage_cfg.engine_args),
)

which is later consumed by the diffusion components, such as StageDiffusionClient.

However, the current implementation of self._create_default_diffusion_stage_cfg does not fully propagate CLI arguments into the constructed config.
The affected arguments include:

  • trust_remote_code
  • distributed_executor_backend
  • boundary_ratio
  • flow_shift
  • num_gpus

Although these fields have default values defined in OmniDiffusionConfig, the CLI-provided values are not injected into od_config, so the user-specified arguments are ignored and the defaults are always used.
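To make the failure mode concrete, here is a minimal, self-contained sketch of the bug class described above. The names (`EngineConfig`, `create_default_stage_cfg`) are hypothetical stand-ins for `OmniDiffusionConfig` and `_create_default_diffusion_stage_cfg`; the point is that a factory building its engine-args dict from an explicit literal silently drops any CLI kwarg it forgets to list:

```python
# Hypothetical stand-ins, not the real vllm-omni classes.
from dataclasses import dataclass, fields


@dataclass
class EngineConfig:  # stand-in for OmniDiffusionConfig
    model: str = ""
    trust_remote_code: bool = False  # dataclass default, used if not injected

    @classmethod
    def from_kwargs(cls, **kwargs):
        # Mirrors OmniDiffusionConfig.from_kwargs: keep only valid fields.
        valid = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in kwargs.items() if k in valid})


def create_default_stage_cfg(**cli_kwargs):
    # Bug: the dict literal only forwards the keys it explicitly names,
    # so trust_remote_code from the CLI is never copied in.
    return {"model": cli_kwargs.get("model", "")}


cfg = EngineConfig.from_kwargs(
    **create_default_stage_cfg(model="wan", trust_remote_code=True)
)
print(cfg.trust_remote_code)  # False: the CLI-provided True was dropped
```

The fix in this PR adds the missing keys to the factory so the CLI values reach the constructed config.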

Test Plan

python -m pytest tests/entrypoints/test_async_omni_diffusion_config.py

Test Result

(vllm-omni) root@autodl-container-2201459dc8-f8f44d5f:~/vllm-omni# python -m pytest  tests/entrypoints/test_async_omni_diffusion_config.py
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /root/vllm-omni
configfile: pyproject.toml
plugins: anyio-4.13.0, asyncio-1.3.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 6 items                                                                                                                                                                                                                                                       

tests/entrypoints/test_async_omni_diffusion_config.py ......                                                                                                                                                                                                      [100%]

======================== 6 passed, 19 warnings in 0.70s ========================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands, or state the reasons if your code does not require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the fix/diffusion-default-factory-engine-args branch from 942ecd7 to da80d5d on April 7, 2026 at 14:34
@xiaohajiayou xiaohajiayou changed the title [Bugfix] Propagate diffusion fallback engine args [Bugfix] Fix default diffusion stage config generator drops runtime engine args Apr 7, 2026
Collaborator

@lishunyang12 lishunyang12 left a comment


Review: [Bugfix] Fix default diffusion stage config generator drops runtime engine args

Thanks for the PR and the clear description of the problem. The fix for trust_remote_code is correct and necessary -- it was genuinely missing from the dict literal. However, I have concerns about the other four fields.

Issues

1. boundary_ratio and flow_shift are already propagated (redundant overwrites)

Lines 1216-1217 of the existing code already include these in the stage_engine_args dict:

"boundary_ratio": kwargs.get("boundary_ratio", None),
"flow_shift": kwargs.get("flow_shift", None),

The conditional blocks added after the dict literal will overwrite these keys with the exact same values. This is dead code. If the intent was to avoid setting them when they are None, note that OmniDiffusionConfig.from_kwargs already filters kwargs to valid dataclass fields and constructs the config -- having None values for optional fields is the normal path and matches the dataclass defaults.

Please either:

  • Remove lines 1216-1217 from the dict literal and keep only the conditional blocks (if you want to avoid passing None explicitly), or
  • Remove the conditional blocks for these two fields (since they are already handled).
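The equivalence claimed above can be checked with a small sketch. `Cfg` is a hypothetical stand-in for `OmniDiffusionConfig`, assuming (as the review states) that the dataclass defaults for these optional fields are `None` and that `from_kwargs` filters to valid fields:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Cfg:  # stand-in; assumes the dataclass default is None
    boundary_ratio: Optional[float] = None


def from_kwargs(**kwargs):
    # Mirrors the filtering behavior of OmniDiffusionConfig.from_kwargs.
    valid = {f.name for f in fields(Cfg)}
    return Cfg(**{k: v for k, v in kwargs.items() if k in valid})


# Path A: the dict literal always passes the key, possibly as None.
a = from_kwargs(boundary_ratio=None)
# Path B: a conditional block omits the key when the value is None.
b = from_kwargs()
print(a == b)  # True: the conditional block changes nothing
```

Since both paths yield identical configs, the added conditional blocks for these two fields are dead code.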

2. num_gpus is overwritten downstream and the addition has no effect

In stage_init_utils.py line 536:

od_config.num_gpus = num_devices_per_stage

This unconditionally overwrites num_gpus after OmniDiffusionConfig.from_kwargs(), deriving it from parallel_config.world_size. So even if you inject num_gpus into stage_engine_args, it will be overridden. Adding it here gives a false sense that the CLI value is being respected, when in reality it is not. This should either be removed, or the downstream override should be fixed if the intent is to let users control num_gpus directly.
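A sketch of why the upstream injection is a no-op (class and variable names are hypothetical; only the unconditional assignment from stage_init_utils.py is taken from the source):

```python
# Hypothetical minimal model of the downstream overwrite.
class ODConfig:  # stand-in for OmniDiffusionConfig
    def __init__(self, num_gpus=1):
        self.num_gpus = num_gpus


num_devices_per_stage = 4  # derived from parallel_config.world_size

od_config = ODConfig(num_gpus=8)            # CLI value injected upstream
od_config.num_gpus = num_devices_per_stage  # unconditional override downstream
print(od_config.num_gpus)  # 4: the injected CLI value is lost
```

Whatever `num_gpus` the default factory injects, the reader observes only the derived value, which is why the injection should be dropped or the downstream override reconsidered.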

3. distributed_executor_backend -- the conditional style is inconsistent

distributed_executor_backend is a legitimate fix: it was missing from the dict literal. But the conditional if key in kwargs and kwargs[key] is not None pattern is inconsistent with how every other field is handled in this function (using kwargs.get(key, default) inside the dict literal). Using kwargs.get("distributed_executor_backend", "mp") inline would be simpler and consistent -- the dataclass default is "mp", so that aligns.

Suggestion

A cleaner approach would be to add trust_remote_code and distributed_executor_backend directly in the dict literal (like all the other fields), remove the redundant conditional blocks for boundary_ratio/flow_shift, and drop num_gpus since it is overridden downstream. Something like:

stage_engine_args = {
    ...
    "trust_remote_code": kwargs.get("trust_remote_code", False),
    "distributed_executor_backend": kwargs.get("distributed_executor_backend", "mp"),
    ...
}

No conditional blocks needed.

Test

The test is well-written and covers the right fields. It will need minor adjustment once the redundant parts are removed.

Overall this is a real bug fix for trust_remote_code and distributed_executor_backend, but needs cleanup to avoid redundancy and misleading num_gpus propagation.
