Skip to content

[Config Refactor] Introduce unified VllmOmniConfig and consolidate OmniEngineArgs#3672

Open
wuhang2014 wants to merge 13 commits into
vllm-project:mainfrom
wuhang2014:diffconf-1
Open

[Config Refactor] Introduce unified VllmOmniConfig and consolidate OmniEngineArgs#3672
wuhang2014 wants to merge 13 commits into
vllm-project:mainfrom
wuhang2014:diffconf-1

Conversation

@wuhang2014
Copy link
Copy Markdown
Contributor

@wuhang2014 wuhang2014 commented May 17, 2026

Purpose

Introduce a unified VllmOmniConfig frozen dataclass as the single source of truth for all runtime configuration (pipeline topology, per-stage VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD config). Consolidate OmniEngineArgs following vLLM's EngineArgsVllmConfig pattern to eliminate scattered kwargs, split_kwargs partitioning (OrchestratorArgs + SHARED_FIELDS), and the two-phase config injection loop.

Six redundancies addressed:

  • R1: duplicate build_engine_args_dict in _initialize_llm_replica (now reads TP from pre-built VllmConfig.parallel_config)
  • R2: duplicate extract_stage_metadata per replica (new path reads from StageResolvedConfig)
  • R3: two-phase config injection (~90 lines) absorbed into build_vllm_omni_config
  • R4: two default-diffusion builders unified
  • R5: OmegaConf roundtrip eliminated in new path
  • R6: triple dict filtering → single pass in factory

Test Plan

# Unit tests (CPU, local)
pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py -v

# On remote GPU host:
ssh -p 2222 root@<host> "source /workspace/.venv/bin/activate && cd /workspace/vllm-omni && pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py -v"

Test Result

27 passed, 1 skipped in 5.21s

(skipped: test_create_omni_config_returns_vllm_omni_config — requires local model path via VLLM_OMNI_TEST_MODEL)


Essential Elements Checklist
  • Purpose: introduce unified VllmOmniConfig, consolidate OmniEngineArgs, eliminate config redundancies
  • Test plan: pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py
  • Test results: 27 passed, 1 skipped
  • Documentation: no API surface changes (internal refactor); OmniEngineArgs.add_cli_args(omni_args_only=True) is the only new public interface
  • Release notes: not user-facing at this stage; subsequent PRs will wire the new path fully

Copilot AI review requested due to automatic review settings May 17, 2026 16:02
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors vLLM-Omni runtime configuration around a unified VllmOmniConfig and consolidated OmniEngineArgs, replacing scattered config construction with a centralized factory path.

Changes:

  • Adds VllmOmniConfig / StageResolvedConfig and routes engine/orchestrator initialization toward resolved stage configs.
  • Consolidates CLI/offline config fields into OmniEngineArgs and delegates Omni CLI flag registration to it.
  • Updates stage initialization, device locking, PD detection, and tests for the new config path.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
vllm_omni/entrypoints/omni_base.py Builds OmniEngineArgs/VllmOmniConfig before constructing AsyncOmniEngine.
vllm_omni/entrypoints/cli/serve.py Delegates Omni CLI flags to OmniEngineArgs and starts unified config creation.
vllm_omni/engine/stage_init_utils.py Allows device locking from resolved config objects.
vllm_omni/engine/orchestrator.py Accepts VllmOmniConfig and derives async chunk / PD config from it.
vllm_omni/engine/async_omni_engine.py Adds unified config path for stage planning and initialization.
vllm_omni/engine/arg_utils.py Consolidates Omni args and CLI registration; removes old split/routing helpers.
vllm_omni/config/vllm_omni_config.py Introduces resolved runtime config dataclasses and factory.
vllm_omni/config/stage_config.py Moves internal key filtering into stage config utilities.
vllm_omni/config/__init__.py Exports new config dataclasses.
tests/test_config_factory.py Updates internal-key tests for the new filtering source.
tests/test_arg_utils.py Replaces old split/routing tests with unified args tests.
tests/entrypoints/test_async_omni_diffusion_config.py Adapts default diffusion config helper usage in tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread vllm_omni/config/vllm_omni_config.py Outdated
)

# Resolve async_chunk: CLI override wins over deploy YAML
async_chunk: bool = kwargs.get("async_chunk", False)
Comment thread vllm_omni/config/stage_config.py Outdated
Comment on lines +82 to +87
# ── Diffusion model flags (orchestrator-level, not per-stage) ──
"num_gpus",
"model_class_name",
"diffusion_load_format",
"diffusers_load_kwargs",
"diffusers_call_kwargs",
Comment thread vllm_omni/engine/async_omni_engine.py Outdated
Comment on lines +550 to +558
# Build a minimal stage_cfg stub for replica launch
replica_cfg = type("_ReplicaCfg", (), {
"stage_id": src.stage_id,
"stage_type": src.stage_type,
"runtime": type("_Runtime", (), {
"devices": replica_devices_map.get(stage_idx, [None])[replica_id]
if stage_idx in replica_devices_map else None,
})(),
})()
Comment on lines 170 to +176
self.engine = AsyncOmniEngine(
model=model,
engine_args=engine_args,
init_timeout=init_timeout,
stage_init_timeout=stage_init_timeout,
diffusion_batch_size=diffusion_batch_size,
**kwargs,
omni_config=self.omni_config,
engine_args=omni_engine_args,
init_timeout=self.omni_config.init_timeout,
stage_init_timeout=self.omni_config.stage_init_timeout,
diffusion_batch_size=omni_engine_args.diffusion_batch_size,
Comment thread vllm_omni/engine/arg_utils.py Outdated
Comment on lines +756 to +757
except _json.JSONDecodeError:
stage_overrides = None
Comment thread vllm_omni/entrypoints/omni_base.py Outdated
Comment on lines 149 to 153
else:
omni_engine_args = OmniEngineArgs(model=model, **kwargs)

if "log_requests" in kwargs:
raise TypeError("`log_requests` has been removed in Omni/AsyncOmni. Use `log_stats`.")
Comment on lines +98 to 100
omni_config = omni_engine_args.create_omni_config()

if args.headless:
Comment thread vllm_omni/engine/arg_utils.py Outdated
Comment on lines +740 to +745
# Build kwarg dict from all non-None fields on OmniEngineArgs
kwargs: dict[str, Any] = {
f.name: getattr(self, f.name)
for f in _dc.fields(self)
if getattr(self, f.name) is not None
}
Comment thread vllm_omni/config/vllm_omni_config.py Outdated
Comment on lines +244 to +248
# Build CLI overrides from engine args (replace kwargs dict)
if engine_args is not None:
cli_overrides = _convert_dataclasses_to_dict(
{f.name: getattr(engine_args, f.name) for f in engine_args.__dataclass_fields__.values()
if getattr(engine_args, f.name) is not None}
Comment thread vllm_omni/engine/async_omni_engine.py Outdated
Comment on lines +1124 to +1127
# Build replica layout from StageResolvedConfig objects
stage_configs_for_layout = [
type("_LayoutStub", (), {"runtime": s.runtime or {}, "stage_type": s.stage_type})()

@wuhang2014 wuhang2014 force-pushed the diffconf-1 branch 3 times, most recently from c3ca82c to 2c514c1 Compare May 17, 2026 16:29
Copy link
Copy Markdown
Collaborator

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly a clean consolidation, but a few correctness issues in the new path — the explicit-field tracking and async_chunk resolution are the main ones.

engine_args._explicit_fields = frozenset(
attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None
)
return engine_args
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_cli_args no longer sets _explicit_fields. Combined with explicit_kwargs() falling into the explicit is None branch (line 591), every non-None field is now treated as an explicit user override — including argparse/dataclass defaults like worker_backend="multi_process", ulysses_mode="strict", cfg_parallel_size=1. Those will clobber deploy-YAML values that intentionally differ. This is the sentinel-tracking we deliberately added; dropping it here is a behavior change. Can we keep _explicit_fields on the online path?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366

{
f.name: getattr(engine_args, f.name)
for f in engine_args.__dataclass_fields__.values()
if getattr(engine_args, f.name) is not None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same root cause as the from_cli_args comment: if getattr(engine_args, f.name) is not None can't distinguish a user-set value from a dataclass default, so defaults leak into cli_overrides and override the deploy YAML. The old path filtered by explicit_kwargs() for exactly this reason.

Comment thread vllm_omni/config/vllm_omni_config.py Outdated
)

# Resolve async_chunk: CLI override wins over deploy YAML
async_chunk: bool = kwargs.get("async_chunk", False)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async_chunk is annotated bool and defaulted to kwargs.get("async_chunk", False), so it's never None. The deploy-YAML async_chunk is only read in the if async_chunk is None branch below (382), which is now dead — the YAML value is silently ignored on the new path. Also, since unset and --no-async-chunk both arrive here as False, an explicit --no-async-chunk is indistinguishable from "not passed".

Comment thread vllm_omni/config/vllm_omni_config.py Outdated
pd_config = _detect_pd_config_from_stages(resolved_stages)

# ── Resolve async_chunk from stage 0 if not explicit ─────────────
if async_chunk is None and resolved_stages:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable given line 301 — async_chunk can't be None here.

Comment thread vllm_omni/config/vllm_omni_config.py Outdated

if not stage_configs:
# No pipeline config found — build a default single-stage diffusion config
from omegaconf import OmegaConf
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description lists "R5: OmegaConf roundtrip eliminated in new path", but this fallback reintroduces OmegaConf. Separately, it hard-codes stage_type="diffusion", so a pure-LLM model with no pipeline config silently becomes a diffusion stage. Worth gating on worker_type before defaulting.

metadata=replica_metadata,
stage_connector_spec=src.stage_connector_spec or {},
omni_kv_connector=src.omni_kv_connector,
stage_vllm_config=src.vllm_config,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For diffusion stages src.vllm_config is None, and the pre-built src.diffusion_config is never threaded into ReplicaInitPlan. The _ReplicaCfg stub also has no engine_args, so diffusion replica init that rebuilds OmniDiffusionConfig from stage_cfg.engine_args loses the resolved config entirely. How does the diffusion replica get its config on the new path?

Comment thread vllm_omni/engine/arg_utils.py Outdated
def add_cli_args(cls, parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
parser = AsyncEngineArgs.add_cli_args(parser)
parser = OmniEngineArgs._add_omni_specific_args(parser)
parser = OmniEngineArgs.add_cli_args(parser)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncEngineArgs.add_cli_args(parser) already registers EngineArgs flags; calling OmniEngineArgs.add_cli_args(parser) without omni_args_only=True registers them again.

Suggested change
parser = OmniEngineArgs.add_cli_args(parser)
parser = OmniEngineArgs.add_cli_args(parser, omni_args_only=True)

engine_args_dict = build_engine_args_dict(
stage_cfg,
model,
stage_connector_spec=stage_connector_spec,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy path passed cli_tokenizer=getattr(self, "tokenizer", None) into build_engine_args_dict; this call drops it, so the CLI --tokenizer override no longer reaches LLM stages.


# Build unified OmniEngineArgs from CLI (replaces scattered kwargs)
omni_engine_args = OmniEngineArgs.from_cli_args(args)
omni_config = omni_engine_args.create_omni_config()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_omni_config() runs for every vllm serve invocation (model download + per-stage VllmConfig build), but the non-headless path calls omni_run_server(args) which ignores omni_config, and run_headless() re-resolves from scratch and ignores the passed value too. So this builds the full config at CLI dispatch and discards it. Either wire it through or build it lazily.

Comment thread vllm_omni/entrypoints/omni_base.py Outdated
else:
omni_engine_args = OmniEngineArgs(model=model, **kwargs)

if "log_requests" in kwargs:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs after OmniEngineArgs(model=model, **kwargs) on line 149, so a caller still passing log_requests= hits an opaque dataclass TypeError instead of this message. Move the check above the construction.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

BLOCKING:

  • [Breaking Changes] - This PR introduces a new public API VllmOmniConfig and refactors OmniEngineArgs. A refactor of this magnitude touching the core config initialization path requires an end-to-end integration test that validates the entire config flow from CLI args to AsyncOmniEngine initialization. Unit tests alone (27 passed) are insufficient for core config path changes.

  • [Test Coverage] - PR description notes "could you please run the L3 tests locally and paste the results here" but no L3 test results are provided. L3 tests are required for substantial config refactor PRs to ensure existing deployment configs continue to work.

  • [Documentation] - Checklist shows Documentation unchecked. VllmOmniConfig appears to be a new public API surface (frozen dataclass exported from vllm_omni.config), which requires docstrings explaining its usage and when users should interact with it versus OmniEngineArgs.

  • [Breaking Changes] - OmniEngineArgs.add_cli_args(omni_args_only=True) is mentioned as a new public interface in the checklist but lacks docstrings explaining the new parameter, when to use it, and what flags it enables/disables.

  • [Test Coverage] - No e2e integration test covering the full path: CLI args → OmniEngineArgsVllmOmniConfigAsyncOmniEngine. For a refactor that changes how all configuration flows through the system, at least one e2e test validating a real model deployment (e.g., Qwen2.5-Omni) should pass.

VERDICT: REQUEST_CHANGES

@Gaohan123 Gaohan123 added this to the v0.22.0 milestone May 18, 2026
…niEngineArgs

- Add VllmOmniConfig and StageResolvedConfig frozen dataclasses
  (vllm_omni/config/vllm_omni_config.py) as single source of truth for
  the runtime config. Contains pre-built VllmConfig/OmniDiffusionConfig
  per stage, orchestrator settings, and pipeline topology.

- Add build_vllm_omni_config() factory that consolidates the old
  scattered config construction (_resolve_stage_configs, Phase 2
  injection loop, build_engine_args_dict, build_vllm_config,
  build_diffusion_config, extract_stage_metadata).

- Refactor OmniEngineArgs to absorb OrchestratorArgs and all diffusion
  CLI flags into a single dataclass.  Add add_cli_args() static method
  matching vLLM's EngineArgs/AsyncEngineArgs pattern.  Add
  create_omni_config() as the unified config entrypoint.

- Delete OrchestratorArgs, split_kwargs, SHARED_FIELDS, and all
  related kwargs-partitioning infrastructure (~250 lines removed).

- Eliminate redundancies:
  - R1: duplicate build_engine_args_dict in _initialize_llm_replica
    (use pre-built VllmConfig.parallel_config.tensor_parallel_size)
  - R2: duplicate extract_stage_metadata in _build_logical_stage_init_plans
    (new path reads from pre-built StageResolvedConfig)
  - R3: two-phase config injection absorbed into factory
  - R4: unified default diffusion builder
  - R5: OmegaConf roundtrip eliminated in new path

- Refactor acquire_device_locks to accept VllmConfig | OmniDiffusionConfig
  directly (add _extract_device_counts helper for type dispatch).

- Delegate CLI arg registration in OmniServeCommand.subparser_init
  to OmniEngineArgs.add_cli_args(omni_args_only=True) (~400 lines removed).

- Propagate VllmOmniConfig through AsyncOmniEngine and Orchestrator.

- Add _OMNI_INTERNAL_KEYS to stage_config.py (replaces deleted syms).

- Update tests: remove split_kwargs/OrchestratorArgs tests, add
  OmniEngineArgs.add_cli_args/from_cli_args/create_omni_config tests.

Signed-off-by: WU Hang <whlbx@hotmail.com>
- Fix 8: add diffusion_config field to ReplicaInitPlan and carry
  pre-built OmniDiffusionConfig through the new path so diffusion
  replica init uses the resolved config instead of rebuilding from
  the empty _ReplicaCfg stub (which has no engine_args).

- Fix 9: gate the no-pipeline fallback on engine_args.worker_type.
  Pure LLM models (worker_type ar/generation/None) no longer
  silently receive a broken diffusion stage.

- Fix 10: forward --tokenizer CLI override through
  build_engine_args_dict() in build_vllm_omni_config.  The
  orchestrator-level tokenizer was dropped because it is in
  _OMNI_INTERNAL_KEYS, but the legacy path passed it explicitly
  via cli_tokenizer=.

- Fix 11: move log_requests check before OmniEngineArgs
  construction in OmniBase.__init__ so callers still passing
  log_requests= see the intentional migration TypeError instead
  of an opaque dataclass TypeError.

Signed-off-by: WU Hang <whlbx@hotmail.com>
Replace OmegaConf.create() in the default diffusion config fallback
with a simple type()-constructed stub.  This eliminates the last
OmegaConf usage in the new unified config path (the legacy path
still uses OmegaConf, which is unchanged).

Signed-off-by: WU Hang <whlbx@hotmail.com>
Signed-off-by: WU Hang <whlbx@hotmail.com>
@wuhang2014
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @lishunyang12, @hsliuustc0106, and @copilot. Here is a summary of the changes since the initial review feedback:

Critical fixes (commit 8427111b)

# Issue Fix
1 from_cli_args missing _explicit_fields — deploy YAML values clobbered by defaults Restored _explicit_fields tracking; create_omni_config now uses explicit_kwargs() instead of iterating all non-None fields
2 async_chunk always False — deploy YAML value silently ignored Changed to kwargs.get("async_chunk") (None when unset) so the deploy-YAML fallback branch is reachable
3 _StageCfgWrapper hard-codes is_prefill_only=False / is_decode_only=False — PD detection permanently broken Replaced with _detect_pd_config_from_omega_conf that reads the real OmegaConf stage configs
4 Diffusion CLI flags (--num-gpus, --cache-backend, etc.) in _OMNI_INTERNAL_KEYS filtered out Removed 33 diffusion-specific engine args from the internal keys frozenset
5 Layout stubs missing engine_args.parallel_config — multi-GPU diffusion replicas default to 1 device Layout stubs now include engine_args with parallel_config from pre-built VllmConfig / OmniDiffusionConfig
6 num_replicas arms inverted (IndexError) Flipped to correct order: replicas_per_stage[stage_idx] if stage_idx < len(...) else src.num_replicas
7 OmniAsyncEngineArgs.add_cli_args double-registers vLLM parent flags Added omni_args_only=True

Important fixes (commit ffe453b4)

# Issue Fix
8 Diffusion replica missing pre-built config — _ReplicaCfg stubs have no engine_args, diffusion init rebuilds from empty stub Added diffusion_config field to ReplicaInitPlan; new path carries src.diffusion_config; _initialize_diffusion_replica uses plan.diffusion_config when available
9 No-pipeline fallback hard-codes diffusion for pure LLM models Gated on engine_args.worker_type — LLM models (ar/generation/None) return empty config instead of broken diffusion stage
10 --tokenizer CLI override dropped build_engine_args_dict call now forwards cli_tokenizer=getattr(engine_args, "tokenizer", None)
11 log_requests check after OmniEngineArgs construction (opaque TypeError) Moved check before construction

Cleanup (commit 7eb5d334)

  • Removed last OmegaConf usage in new config path (replaced OmegaConf.create() with type()-constructed stub in the no-pipeline fallback)

Docs fix (commit 3b694c9e)

  • RTD build: generate_argparse.py now statically extracts OmniEngineArgs.add_cli_args from source and execs it in a sandbox with mock EngineArgs (no-op), so all real OmniConfig flags appear in docs without vLLM dependency

Open / transitional

  • omni_config built but not consumed in serve.py cmd() non-headless path — this is intentional transitional state; omni_run_server(args) still receives args directly until downstream wiring is complete in a follow-up PR
  • --stage-overrides JSON parse now raises ValueError on failure instead of silently swallowing
  • PD detection failure now logs a warning before returning None

… is provided

When omni_config is available (new unified path), engine_args is only
used for tokenizer/single-stage field extraction, not for kwargs
merging.  The old guard rejected bare OmniEngineArgs objects that
don't carry _explicit_fields, but OmniBase now constructs
OmniEngineArgs directly from kwargs without calling create().

Only enforce _explicit_fields when omni_config is None (old path).
When omni_config IS available, extract explicit kwargs only if
_explicit_fields exists, otherwise skip the merge.

This fixes the e2e tests that were getting:
  TypeError: engine_args=OmniEngineArgs(...) is ambiguous under
  sentinel-default precedence

Signed-off-by: WU Hang <whlbx@hotmail.com>
When the online serving path passes vars(args) as **kwargs to
OmniBase, it includes vLLM server flags (subparser, host, port,
api_key, ssl_*, etc.) that are not fields on OmniEngineArgs.
The bare OmniEngineArgs(model=model, **kwargs) constructor
rejects unknown fields with TypeError.

Filter kwargs to only OmniEngineArgs dataclass fields before
constructing.  This replaces the old split_kwargs behavior
that partitioned CLI flags into orchestrator/engine/server
buckets.

Signed-off-by: WU Hang <whlbx@hotmail.com>
None means 'not specified', which is the common case for diffusion
models that don't have a PipelineConfig registry entry. Only skip
the diffusion fallback when the user explicitly requests an LLM
worker (--worker-type ar / generation).  This fixes '0 stages' for
Qwen/Qwen-Image and other diffusion models using the online serve path.

Signed-off-by: WU Hang <whlbx@hotmail.com>
When the unified config path is active, OmniBase.stage_configs now
returns list(self.omni_config.stages) instead of the empty
engine.stage_configs fallback.  This is needed for the API server's
is_pure_diffusion detection, which reads stage_configs to determine
whether to use the diffusion-only init path.

Previously, AsyncOmniEngine.__init__ set self.stage_configs = [] when
omni_config is available, causing the server to fall through to the
LLM init path and crash with AttributeError in OpenAIServingRender.

Signed-off-by: WU Hang <whlbx@hotmail.com>
@wuhang2014
Copy link
Copy Markdown
Contributor Author

diffusion model e2e test with qwen-image:

CUDA_VISIBLE_DEVICES=0  pytest -sv tests/e2e/online_serving/test_qwen_image_expansion.py -m "full_model and diffusion and H100 and not distributed_cuda" --run-level "full_model"

...

--- Running Summary
======================================================================================== 6 passed, 10 deselected, 18 warnings in 1252.22s (0:20:52) ========================================================================================

build_async_omni_from_stage_config was passing vars(args) as
**kwargs to AsyncOmni, which then used from_kwargs — which doesn't
set _explicit_fields, so explicit_kwargs fell through to the
'all non-None fields' branch, leaking dataclass defaults like
model_stage='thinker' as per-stage CLI overrides.

For multi-stage models like Qwen3-Omni, this caused the talker
stage to incorrectly use model_stage='thinker', resulting in a
vocab size mismatch during weight loading.

Now uses OmniEngineArgs.from_cli_args(args) which correctly
tracks _explicit_fields and respects nullify_stage_engine_defaults.

Signed-off-by: WU Hang <whlbx@hotmail.com>
Fields like model_stage, model_arch, engine_output_type, hf_config_name,
worker_type, etc. are per-stage settings defined in deploy YAML /
PipelineConfig.  They should not be global CLI overrides applied to
every stage.

With from_cli_args, these fields have non-None dataclass defaults
(e.g. model_stage='thinker') which are treated as explicit user
overrides by explicit_kwargs(), causing them to leak into CLI
overrides for ALL stages — overwriting the correct per-stage values
from the deploy YAML.

Adding them to _OMNI_INTERNAL_KEYS filters them in
build_stage_runtime_overrides, so the deploy YAML values win.

Signed-off-by: WU Hang <whlbx@hotmail.com>
@wuhang2014
Copy link
Copy Markdown
Contributor Author

omni model e2e test with qwen3-omni:

CUDA_VISIBLE_DEVICES=0,1 pytest -sv tests/e2e/online_serving/test_qwen3_omni.py::test_mix_to_text_audio_001

...

--- Running Summary
================================================================================================ 1 passed, 19 warnings in 597.78s (0:09:57) ================================================================================================

wuhang2014 added a commit to wuhang2014/vllm-omni that referenced this pull request May 18, 2026
…review)

Phase 1 - Decompose build_vllm_omni_config:
- Split ~250-line factory into _build_cli_overrides, _build_default_diffusion_config,
  _resolve_stages, and _resolve_async_chunk helpers
- Each helper has a clear docstring and single responsibility

Phase 2 - Reduce fragility:
- Add TestOmniInternalKeysCoverage to keep _OMNI_INTERNAL_KEYS in sync with
  OmniEngineArgs fields
- Narrow _detect_pd_config_from_omega_conf except catch to (ImportError,
  AttributeError, TypeError); downgrade log level to debug
- Rename Orchestrator pd_config param to legacy_pd_config with explicit
  omni_config.pd_config -> legacy_pd_config fallback

Phase 3 - Long-term maintainability:
- Replace anonymous type("_DefaultDiffusionCfg") and type("_ReplicaCfg") calls
  with named frozen dataclasses (DefaultDiffusionCfg, ReplicaInitCfg +
  ReplicaInitCfgRuntime)
- Add DeprecationWarning and TODO(vllm-project#3672) to legacy init path in
  async_omni_engine.py
- Document import-deferral rationale for circular-dependency function-body imports
- Expand to_omegaconf() TODO with concrete migration steps referencing PR vllm-project#3672

Signed-off-by: WU Hang <whlbx@hotmail.com>
…review)

Phase 1 - Decompose build_vllm_omni_config:
- Split ~250-line factory into _build_cli_overrides, _build_default_diffusion_config,
  _resolve_stages, and _resolve_async_chunk helpers
- Each helper has a clear docstring and single responsibility

Phase 2 - Reduce fragility:
- Add TestOmniInternalKeysCoverage to keep _OMNI_INTERNAL_KEYS in sync with
  OmniEngineArgs fields
- Narrow _detect_pd_config_from_omega_conf except catch to (ImportError,
  AttributeError, TypeError); downgrade log level to debug
- Rename Orchestrator pd_config param to legacy_pd_config with explicit
  omni_config.pd_config -> legacy_pd_config fallback

Phase 3 - Long-term maintainability:
- Replace anonymous type("_DefaultDiffusionCfg") and type("_ReplicaCfg") calls
  with named frozen dataclasses (DefaultDiffusionCfg, ReplicaInitCfg +
  ReplicaInitCfgRuntime)
- Add DeprecationWarning and TODO(vllm-project#3672) to legacy init path in
  async_omni_engine.py
- Document import-deferral rationale for circular-dependency function-body imports
- Expand to_omegaconf() TODO with concrete migration steps referencing PR vllm-project#3672

Signed-off-by: WU Hang <whlbx@hotmail.com>
args.model = args.model_tag

# Build unified OmniEngineArgs from CLI (replaces scattered kwargs)
omni_engine_args = OmniEngineArgs.from_cli_args(args)
Copy link
Copy Markdown
Contributor

@xiaohajiayou xiaohajiayou May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_cli_args() assumes all explicit overrides come from argparse. But overrides can come from both explicit constructor kwargs AND argparse simultaneously.

  • The old approach handled this by merging args into kwargs;
  • now that online and offline paths are explicitly separated, this needs to be reconsidered.

Copy link
Copy Markdown
Contributor

@alex-jw-brooks alex-jw-brooks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts, thanks! I need to understand a bit better how this ties into #3366, so will take another look a bit later, but I think we need to try to simplify the config stuff as much as possible, with more clear patterns for implementing new potential engine types

"data_parallel_size": getattr(pc, "data_parallel_size", 1) or 1,
"prefill_context_parallel_size": getattr(pc, "prefill_context_parallel_size", 1) or 1,
"sequence_parallel_size": getattr(pc, "sequence_parallel_size", 1) or 1,
"cfg_parallel_size": getattr(pc, "cfg_parallel_size", 1) or 1,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is likely not needed? I.e., why not just check

  • if config is a vLLM Config, get the attributes for non diffusion models
  • if config is an OmniDiffusionConfig, get the attributes for diffusion models

in the caller, rather than creating a new dict unifying things? Defaults are already handled in the dataclass init, and we lose type safety by pulling things out into another dict here, and it will get uglier if new engine types are added later on

pc = config.parallel_config

# DiffusionParallelConfig uses ``world_size`` as the total device count
if hasattr(pc, "world_size"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check that the type is a OmniDiffusionConfig instead of checking hasattr dynamically, and the type hint should be config: OmniDiffusionConfig | VllmConfig so that its more clear. Then we should throw an explicit error if config is not one of those types, not rely on getattr / try excepts to resolve the world size

class ReplicaInitCfgRuntime:
"""Minimal per-replica runtime info (used instead of OmegaConf roundtrip)."""

devices: str | None = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be extended later? Seems strange to have a dataclass wrapping just one field

# ==================================================================

@staticmethod
def add_cli_args(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't called anwhere with omni_args_only=False, right?

I think we should rename this to something like register_omni_args, remove the param, and assume the parser has already been initialized for vLLM for now, otherwise it's confusing what this is meant to be doing

engine_args._explicit_fields = frozenset(
attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None
)
return engine_args
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366

def _get_config_logger():
from vllm.logger import init_logger

return init_logger("vllm_omni.config.vllm_omni_config")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger should just be initialized at the top of the file so other things can use it, rather than the fallback case needing to initialize the logger to be used

if resolved_stages:
return bool(
getattr(
getattr(stage_configs[0], "engine_args", None),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having special resolution for async chunk as a key is a good idea. It should just be handled by the normal precedence resolution?

I think we should just use the normal resolution and then validate that the set of resolved values for async chunk is consistent across stages, and throw an error if it isn't

# extract_stage_metadata / build_diffusion_config expect.
default_cfg = DefaultDiffusionCfg(
engine_args=cli_overrides,
runtime={"devices": cli_overrides.get("devices")},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting runtime with special keys out from cli_overrides to pass here feels like an an antipattern to me

# the top-level ``OmniEngineArgs``, not by individual stage engines.
# Previously derived from ``OrchestratorArgs`` / ``SHARED_FIELDS`` in
# ``arg_utils``; now defined here as a single source of truth.
_OMNI_INTERNAL_KEYS: frozenset[str] = frozenset(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to figure out a better way to handle this with more robust filtering, hardcoding the keys like this is error prone, especially with them being a mix of top-level configuration and stage type specific options

E.g., if these keys are really never expected to be forward to stage init, that means the dependent stages should never expect to access them from the built object, so do we actually need to fully drop them? I think hardcoding keys in this way is more of an indicator that we do not have the right config separation / levels of encapsulation

CC @lishunyang12 @gcanlin also in case either of you have thoughts


# ── Cluster / backend ──────────────────────────────────────────

worker_backend: str = "multi_process"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is where the override semantics start to drift.

  • Once OmniEngineArgs carries non-None defaults, _build_cli_overrides() will treat them as merge inputs before YAML resolution, which can turn fallback defaults into effective overrides.
  • Under the current None-sentinel model, defaults should only materialize after CLI/YAML merge is finished, not in the unified input layer.

Pattern (matching vLLM's ``EngineArgs``):

Online: ``OmniEngineArgs.from_cli_args(parsed_args)``
Offline: ``OmniEngineArgs(model="...", tensor_parallel_size=2, ...)``
Copy link
Copy Markdown
Contributor

@xiaohajiayou xiaohajiayou May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offline OmniEngineArgs(model="...", ...) never sets _explicit_fields.

  • Without nullification (CLI-only), EngineArgs defaults all leak as global per-stage overrides via _build_cli_overrides, silently clobbering deploy YAML values.
  • Offline needs create() or equivalent to tag explicit fields.

Signed-off-by: wuhang <wuhang6@huawei.com>

# Conflicts:
#	vllm_omni/engine/arg_utils.py
#	vllm_omni/engine/async_omni_engine.py
#	vllm_omni/engine/orchestrator.py
#	vllm_omni/engine/stage_init_utils.py
#	vllm_omni/entrypoints/cli/serve.py
wuhang2014 added a commit to wuhang2014/vllm-omni that referenced this pull request May 22, 2026
…oject#3754 — unified VllmOmniConfig + OmniArgumentParser

Introduce a unified VllmOmniConfig frozen dataclass as the single source
of truth for all runtime configuration (pipeline topology, per-stage
VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD
config).

Add OmniArgumentParser that injects model-specific defaults from deploy
YAML as action.default before parse — replacing nullify_stage_engine_defaults,
detect_explicit_cli_keys, deploy_override_field_names, and the two-phase
kwargs partitioning (split_kwargs / OrchestratorArgs / SHARED_FIELDS).

Removals:
- StageConfig, ModelPipeline classes + to_omegaconf() OmegaConf roundtrip
- merge_pipeline_deploy, _build_engine_args, _build_extras, _apply_platform_overrides
- StageConfigFactory.create_from_model / _create_from_registry / create_default_diffusion
- _explicit_fields / explicit_kwargs / create() sentinel on OmniEngineArgs
- _create_default_diffusion_stage_cfg + diffusion kwargs injection loop
- _strip_single_engine_args + _PARENT_ARGS_KEEP/STRIP/NO_WARN frozensets
- load_and_resolve_stage_configs / load_stage_configs_from_yaml
- OmniBase.from_cli_args

Config flow (after):
  OmniArgumentParser.parse_args()       # inject deploy YAML as action.default
  OmniEngineArgs.from_cli_args(args)    # extract resolved args
  build_vllm_omni_config(model, ...)    # single factory -> VllmOmniConfig
  AsyncOmniEngine(omni_config=...)      # read config directly

Net: -1347 lines across 10 files.

Signed-off-by: wuh <wuh@users.noreply.github.com>
@hsliuustc0106 hsliuustc0106 added the high priority high priority issue, needs to be done asap label May 28, 2026
@hsliuustc0106
Copy link
Copy Markdown
Collaborator

resolve conflicts

@wuhang2014
Copy link
Copy Markdown
Contributor Author

resolve conflicts

I'll post a RFC first because the change is huge. Maybe delay this PR until a later release.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

resolve conflicts

@hsliuustc0106 hsliuustc0106 removed this from the v0.22.0 milestone May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

high priority high priority issue, needs to be done asap

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants