[Config] Remove invalid LLM-only engine_args from diffusion stage configs by ianliuy · Pull Request #2622 · vllm-project/vllm-omni

ianliuy · 2026-04-09T05:28:16Z

Purpose

Fix for: #2563

Remove dead engine_args fields from diffusion stage configs (stage_type: diffusion). These fields were copy-pasted from LLM stage configs and are silently dropped by OmniDiffusionConfig.from_kwargs().

Note: Generation/AR stages (worker_type: generation, worker_type: ar) use OmniEngineArgs, where these fields are actively consumed; they are intentionally left unchanged.

Changes

Diffusion config cleanup (11 YAMLs, 51 lines)

Main configs (9 files in vllm_omni/model_executor/stage_configs/):

| File | Fields removed |
| --- | --- |
| `bagel.yaml` | `gpu_memory_utilization`, `engine_output_type`, `enable_prefix_caching`, `max_num_batched_tokens`, `tensor_parallel_size` |
| `bagel_multiconnector.yaml` | same 5 |
| `bagel_think.yaml` | same 5 |
| `bagel_single_stage.yaml` | same 5 |
| `bagel_usp2.yaml` | same 5 |
| `hunyuan_image_3_moe.yaml` | 4 (no top-level `tensor_parallel_size`) |
| `hunyuan_image3_moe_dit.yaml` | 4 |
| `hunyuan_image3_moe_dit_2gpu_fp8.yaml` | 4 |
| `omnivoice.yaml` | 2 (`gpu_memory_utilization`, `engine_output_type`) |

Test configs (2 files in tests/e2e/offline_inference/stage_configs/):

| File | Fields removed |
| --- | --- |
| `bagel_mooncake_ci.yaml` | same 5 + `load_format` (`OmniDiffusionConfig` uses `diffusion_load_format`) |
| `bagel_sharedmemory_ci.yaml` | same 5 + `load_format` |

Regression test (new file)

tests/test_diffusion_config_fields.py scans all YAML configs (main + test dirs) and asserts diffusion stage engine_args only contain valid OmniDiffusionConfig fields.

Allowlisted fields consumed outside the dataclass:

- model_stage: stage init layer
- model_arch: diffusion model class resolution
- quantization: mapped to quantization_config via backwards-compat in from_kwargs()
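A check of this kind might look roughly like the sketch below. It is not the contents of tests/test_diffusion_config_fields.py: `DemoDiffusionConfig`, `EXTERNALLY_CONSUMED`, and `invalid_engine_args()` are hypothetical names, and the dataclass fields are placeholders rather than the real `OmniDiffusionConfig` fields. Only the three allowlisted keys come from the list above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DemoDiffusionConfig:
    # Hypothetical stand-in for OmniDiffusionConfig; fields are placeholders.
    model: str = ""
    enforce_eager: bool = False
    quantization_config: Optional[dict] = None


# Keys that are not dataclass fields but are still consumed elsewhere,
# taken from the allowlist described above.
EXTERNALLY_CONSUMED = {"model_stage", "model_arch", "quantization"}


def invalid_engine_args(engine_args: dict) -> list:
    """Return, sorted, the engine_args keys that nothing would consume."""
    valid = {f.name for f in fields(DemoDiffusionConfig)} | EXTERNALLY_CONSUMED
    return sorted(set(engine_args) - valid)


# Illustrative inputs: one clean stage and one with copy-pasted LLM fields.
clean = {"model": "demo", "model_arch": "DemoPipeline", "quantization": "fp8"}
stale = {"model": "demo", "gpu_memory_utilization": 0.5, "engine_output_type": "image"}

clean_result = invalid_engine_args(clean)
stale_result = invalid_engine_args(stale)
```

Deriving the valid set from `fields()` rather than hardcoding it means the check follows the dataclass as it evolves; only keys consumed outside the dataclass need manual upkeep.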

Why these fields are safe to remove

OmniDiffusionConfig.from_kwargs() explicitly filters unknown fields:

```python
valid_fields = {f.name for f in fields(cls)}
filtered_kwargs = {k: v for k, v in kwargs.items() if k in valid_fields}
return cls(**filtered_kwargs)
```

No behavioral change: all removed fields were already silently dropped at runtime.
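The silent drop can be reproduced with a small stand-alone dataclass. This is a minimal sketch, not the project's code: `DemoDiffusionConfig` and its two fields are hypothetical stand-ins for `OmniDiffusionConfig`, and only the filter inside `from_kwargs()` mirrors the snippet quoted in this description.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoDiffusionConfig:
    # Hypothetical stand-in for OmniDiffusionConfig; field names are illustrative.
    model: str = ""
    enforce_eager: bool = False

    @classmethod
    def from_kwargs(cls, **kwargs):
        # Keep only keys that are dataclass fields; everything else is discarded.
        valid_fields = {f.name for f in fields(cls)}
        filtered_kwargs = {k: v for k, v in kwargs.items() if k in valid_fields}
        return cls(**filtered_kwargs)


engine_args = {
    "model": "demo-model",
    "enforce_eager": True,
    "gpu_memory_utilization": 0.5,   # not a field: dropped without any error
    "enable_prefix_caching": False,  # not a field: dropped without any error
}

config = DemoDiffusionConfig.from_kwargs(**engine_args)

# Names of the keys that never reached the config object.
dropped = sorted(set(engine_args) - {f.name for f in fields(DemoDiffusionConfig)})
```

Because no warning or exception is raised, a stale key in a YAML file looks as if it were taking effect, which is the confusion this cleanup is meant to remove.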

Test Plan

Regression test passes
Existing CI tests pass (no behavioral change)


hsliuustc0106 · 2026-04-09T22:40:15Z

let me help run the ci tests

hsliuustc0106 · 2026-04-09T22:41:52Z

I think you also need to make changes to the yamls under tests

ianliuy · 2026-04-10T01:56:01Z

Thanks for running the CI and the feedback @hsliuustc0106!

Updated. Here's what changed:

1. Fixed the CI failure (regression test)

The test was flagging model_arch and quantization as invalid. Both are actually consumed:

- quantization: mapped to quantization_config via backwards-compat in from_kwargs() (data.py L679-682)
- model_arch: consumed by the stage init layer for model class resolution

Added both to the test's allowlist.

2. Cleaned test YAMLs (14 files, 65 lines)

Removed the same dead fields from diffusion/generation stages in:

| Directory | Files |
| --- | --- |
| `tests/e2e/stage_configs/` | `dynin_omni_ci`, `mimo_audio_ci`, `qwen2_5_omni_ci`, `qwen3_omni_ci` |
| `tests/e2e/stage_configs/rocm/` | `qwen2_5_omni_ci`, `qwen3_omni_ci` |
| `tests/e2e/stage_configs/xpu/` | `qwen2_5_omni_ci`, `qwen3_omni_ci` |
| `tests/e2e/offline_inference/stage_configs/` | `bagel_mooncake_ci`, `bagel_sharedmemory_ci` |
| `tests/e2e/offline_inference/stage_configs/npu/` | `qwen2_5_omni_ci` |
| `tests/dfx/perf/stage_configs/` | `qwen3_omni`, `qwen3_tts` |
| `tests/dfx/stability/stage_configs/` | `qwen3_omni` |

3. Rebased on latest main

Total: 24 files changed, +56 -104.

hsliuustc0106 · 2026-04-10T02:23:28Z

please check whether the ci failure is related to this PR

Remove fields not part of OmniDiffusionConfig from diffusion stages:
- gpu_memory_utilization (9 files)
- enable_prefix_caching (8 files)
- engine_output_type (9 files)
- max_num_batched_tokens (8 files)
- tensor_parallel_size at top-level (5 bagel files)

These fields were copy-pasted from LLM stage configs and silently dropped by OmniDiffusionConfig.from_kwargs(). Removing them for clarity.

Also adds a regression test to prevent future copy-paste of invalid fields into diffusion stage configs.

Fixes vllm-project#2563

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <yiyangliu@microsoft.com>

ianliuy · 2026-04-10T02:53:56Z

Pushed another update that narrows the scope after investigating more carefully:

Only diffusion-stage configs are cleaned. The test YAMLs with worker_type: generation (qwen3, qwen2.5, dynin, mimo, etc.) use OmniEngineArgs where these fields are actively consumed, so those are left unchanged.

Changes in this push:

- Removed dead fields from 2 diffusion test YAMLs (bagel_mooncake_ci, bagel_sharedmemory_ci)
- Also removed load_format: dummy from their diffusion stages (also dead: OmniDiffusionConfig uses diffusion_load_format instead)
- Extended the regression test to scan tests/**/*.yaml recursively, still only flagging stage_type: diffusion stages
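The two behaviours in the last item, recursive discovery and diffusion-only filtering, can be sketched as follows. This is an illustration under assumptions, not the test's actual code: `find_yaml_files()` and `diffusion_engine_args()` are hypothetical helpers, the top-level `stage_args` list is an assumed config layout, and the `llm` stage type in the sample is illustrative. YAML parsing is left out so the sketch needs only the standard library.

```python
import tempfile
from pathlib import Path


def find_yaml_files(root: Path) -> list:
    """Recursively list every *.yaml file under root."""
    return sorted(root.rglob("*.yaml"))


def diffusion_engine_args(config: dict) -> list:
    """Return engine_args of diffusion stages only; other stages are skipped."""
    # Assumption: stages live in a top-level "stage_args" list.
    return [
        stage.get("engine_args", {})
        for stage in config.get("stage_args", [])
        if stage.get("stage_type") == "diffusion"
    ]


# Recursive discovery, shown on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "e2e" / "npu").mkdir(parents=True)
    (root / "e2e" / "a_ci.yaml").write_text("")
    (root / "e2e" / "npu" / "b_ci.yaml").write_text("")
    (root / "notes.txt").write_text("")  # ignored: not a YAML file
    found = [p.name for p in find_yaml_files(root)]

# Stage filtering, shown on an already-parsed config.
parsed = {
    "stage_args": [
        {"stage_type": "llm", "engine_args": {"gpu_memory_utilization": 0.5}},
        {"stage_type": "diffusion", "engine_args": {"enforce_eager": True}},
    ]
}
selected = diffusion_engine_args(parsed)
```

Filtering by stage type before checking fields is what keeps generation and AR stages, where the same keys are valid, out of the test's reach.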

Total: 12 files, +68 -51.

princepride · 2026-04-10T02:48:14Z

      distributed_executor_backend: mp
-      enable_prefix_caching: false
-      max_num_batched_tokens: 32768
-      tensor_parallel_size: 1


Why do we remove tensor_parallel_size? It seems someone just changed where tensor_parallel_size is set, right?

Good observation! Yes, tensor_parallel_size was effectively "moved" to parallel_config.tensor_parallel_size when DiffusionParallelConfig was introduced in PR #189. Since then, OmniDiffusionConfig no longer has a top-level tensor_parallel_size field, so from_kwargs() silently drops it (data.py L692-694). The value here is also 1, which matches the DiffusionParallelConfig default. Issue #2635 also tracks this inconsistency.
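To make the top-level versus nested distinction concrete, here is a minimal sketch. `DemoParallelConfig` and `DemoDiffusionConfig` are hypothetical stand-ins for `DiffusionParallelConfig` and `OmniDiffusionConfig`; how the real `from_kwargs()` builds `parallel_config` is not shown in this thread, so passing a ready-made object is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DemoParallelConfig:
    # Hypothetical stand-in for DiffusionParallelConfig; default of 1 as stated above.
    tensor_parallel_size: int = 1


@dataclass
class DemoDiffusionConfig:
    # Note: there is no top-level tensor_parallel_size field here.
    parallel_config: DemoParallelConfig = field(default_factory=DemoParallelConfig)

    @classmethod
    def from_kwargs(cls, **kwargs):
        # Unknown keys are filtered out, as in the snippet from the description.
        valid_fields = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in kwargs.items() if k in valid_fields})


# Top-level key: silently dropped, so the nested default stays in effect.
ignored = DemoDiffusionConfig.from_kwargs(tensor_parallel_size=4)

# Nested key: this is the one that actually takes effect.
applied = DemoDiffusionConfig.from_kwargs(
    parallel_config=DemoParallelConfig(tensor_parallel_size=4)
)
```

In this sketch a top-level value of 4 would have been ignored just as quietly as the value of 1 removed here, which is why leaving the dead key in place is misleading.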

princepride · 2026-04-10T02:48:24Z

      distributed_executor_backend: "mp"
-      enable_prefix_caching: false
-      max_num_batched_tokens: 32768
-      tensor_parallel_size: 1


Same reasoning as above.

princepride · 2026-04-10T02:53:41Z

If future users add new special fields, how should we maintain this UT, considering these new fields might only apply to a specific model?

Good question. Two cases:

If the new field is added to the OmniDiffusionConfig dataclass, the test picks it up automatically via fields(OmniDiffusionConfig), so no changes are needed.

If it's consumed outside the dataclass (like model_stage by the stage init layer, or quantization via from_kwargs() backwards-compat), add it to the allowlist in this test.

I'll add a comment in the test to document this, but before I do, I'd love to hear your thoughts on whether this approach works for you.

princepride · 2026-04-10T02:54:25Z

-      gpu_memory_utilization: 0.5
      enforce_eager: true
      trust_remote_code: true
-      engine_output_type: audio


I think we should keep these fields; correct me if I am wrong.

These are safe to remove: this is a stage_type: diffusion stage, and OmniDiffusionConfig has neither field. from_kwargs() silently drops them (data.py L692-694). For engine_output_type specifically, extract_stage_metadata() also hardcodes it to None for all diffusion stages (stage_init_utils.py L171). Audio routing is handled by the stage-level final_output_type: audio (which is preserved) and the SupportAudioOutput interface on OmniVoicePipeline.

lishunyang12

LGTM

lishunyang12

LGTM, verified the dead-field claims against data.py / stage_init_utils.py. cc @princepride for re-review.

lishunyang12 · 2026-04-10T13:10:04Z

+    # model_arch is consumed by the stage init layer for diffusion model class resolution
+    valid_fields.add("model_arch")
+    # "quantization" is mapped to "quantization_config" by from_kwargs() backwards-compat
+    valid_fields.add("quantization")


nit: worth a short block comment above the allowlist explaining the maintenance policy — entries here are fields consumed outside OmniDiffusionConfig (e.g. by extract_stage_metadata or the backwards-compat path in from_kwargs). Saves the next maintainer a git blame.

…figs (vllm-project#2622) Signed-off-by: Yiyang Liu <yiyangliu@microsoft.com> Co-authored-by: Yiyang Liu <yiyangliu@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ianliuy requested a review from hsliuustc0106 as a code owner April 9, 2026 05:28

ianliuy force-pushed the fix/cleanup-diffusion-stage-configs branch 2 times, most recently from de91632 to f567554 Compare April 9, 2026 05:34

hsliuustc0106 requested review from lishunyang12 and princepride April 9, 2026 06:06

hsliuustc0106 added the ready label to trigger buildkite CI label Apr 9, 2026

ianliuy force-pushed the fix/cleanup-diffusion-stage-configs branch from f567554 to d9ad8a7 Compare April 10, 2026 01:53

ianliuy force-pushed the fix/cleanup-diffusion-stage-configs branch from d9ad8a7 to 6470bf4 Compare April 10, 2026 02:43

ianliuy force-pushed the fix/cleanup-diffusion-stage-configs branch from 6470bf4 to be25a9c Compare April 10, 2026 02:51

princepride requested changes Apr 10, 2026


lishunyang12 approved these changes Apr 10, 2026


hsliuustc0106 merged commit 687405c into vllm-project:main Apr 10, 2026
8 checks passed

lishunyang12 mentioned this pull request Apr 10, 2026

[Config Refactor][2/N] Pipeline + Deploy Config Schema #2383

Merged
