[Config Refactor] Introduce unified VllmOmniConfig and consolidate OmniEngineArgs by wuhang2014 · Pull Request #3672 · vllm-project/vllm-omni

wuhang2014 · 2026-05-17T16:02:27Z

Purpose

Introduce a unified VllmOmniConfig frozen dataclass as the single source of truth for all runtime configuration (pipeline topology, per-stage VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD config). Consolidate OmniEngineArgs following vLLM's EngineArgs → VllmConfig pattern to eliminate scattered kwargs, split_kwargs partitioning (OrchestratorArgs + SHARED_FIELDS), and the two-phase config injection loop.

Six redundancies addressed:

R1: duplicate build_engine_args_dict in _initialize_llm_replica (now reads TP from pre-built VllmConfig.parallel_config)
R2: duplicate extract_stage_metadata per replica (new path reads from StageResolvedConfig)
R3: two-phase config injection (~90 lines) absorbed into build_vllm_omni_config
R4: two default-diffusion builders unified
R5: OmegaConf roundtrip eliminated in new path
R6: triple dict filtering → single pass in factory

Test Plan

# Unit tests (CPU, local)
pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py -v

# On remote GPU host:
ssh -p 2222 root@<host> "source /workspace/.venv/bin/activate && cd /workspace/vllm-omni && pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py -v"

Test Result

27 passed, 1 skipped in 5.21s

(skipped: test_create_omni_config_returns_vllm_omni_config — requires local model path via VLLM_OMNI_TEST_MODEL)

Essential Elements Checklist

Purpose: introduce unified VllmOmniConfig, consolidate OmniEngineArgs, eliminate config redundancies
Test plan: pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.py
Test results: 27 passed, 1 skipped
Documentation: no API surface changes (internal refactor); OmniEngineArgs.add_cli_args(omni_args_only=True) is the only new public interface
Release notes: not user-facing at this stage; subsequent PRs will wire the new path fully

chatgpt-codex-connector · 2026-05-17T16:02:33Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copilot

Pull request overview

This PR refactors vLLM-Omni runtime configuration around a unified VllmOmniConfig and consolidated OmniEngineArgs, replacing scattered config construction with a centralized factory path.

Changes:

Adds VllmOmniConfig / StageResolvedConfig and routes engine/orchestrator initialization toward resolved stage configs.
Consolidates CLI/offline config fields into OmniEngineArgs and delegates Omni CLI flag registration to it.
Updates stage initialization, device locking, PD detection, and tests for the new config path.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.

Show a summary per file

File	Description
`vllm_omni/entrypoints/omni_base.py`	Builds `OmniEngineArgs`/`VllmOmniConfig` before constructing `AsyncOmniEngine`.
`vllm_omni/entrypoints/cli/serve.py`	Delegates Omni CLI flags to `OmniEngineArgs` and starts unified config creation.
`vllm_omni/engine/stage_init_utils.py`	Allows device locking from resolved config objects.
`vllm_omni/engine/orchestrator.py`	Accepts `VllmOmniConfig` and derives async chunk / PD config from it.
`vllm_omni/engine/async_omni_engine.py`	Adds unified config path for stage planning and initialization.
`vllm_omni/engine/arg_utils.py`	Consolidates Omni args and CLI registration; removes old split/routing helpers.
`vllm_omni/config/vllm_omni_config.py`	Introduces resolved runtime config dataclasses and factory.
`vllm_omni/config/stage_config.py`	Moves internal key filtering into stage config utilities.
`vllm_omni/config/__init__.py`	Exports new config dataclasses.
`tests/test_config_factory.py`	Updates internal-key tests for the new filtering source.
`tests/test_arg_utils.py`	Replaces old split/routing tests with unified args tests.
`tests/entrypoints/test_async_omni_diffusion_config.py`	Adapts default diffusion config helper usage in tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        )
+
+    # Resolve async_chunk: CLI override wins over deploy YAML
+    async_chunk: bool = kwargs.get("async_chunk", False)


+        # ── Diffusion model flags (orchestrator-level, not per-stage) ──
+        "num_gpus",
+        "model_class_name",
+        "diffusion_load_format",
+        "diffusers_load_kwargs",
+        "diffusers_call_kwargs",


+                    # Build a minimal stage_cfg stub for replica launch
+                    replica_cfg = type("_ReplicaCfg", (), {
+                        "stage_id": src.stage_id,
+                        "stage_type": src.stage_type,
+                        "runtime": type("_Runtime", (), {
+                            "devices": replica_devices_map.get(stage_idx, [None])[replica_id]
+                            if stage_idx in replica_devices_map else None,
+                        })(),
+                    })()


        self.engine = AsyncOmniEngine(
            model=model,
-            engine_args=engine_args,
-            init_timeout=init_timeout,
-            stage_init_timeout=stage_init_timeout,
-            diffusion_batch_size=diffusion_batch_size,
-            **kwargs,
+            omni_config=self.omni_config,
+            engine_args=omni_engine_args,
+            init_timeout=self.omni_config.init_timeout,
+            stage_init_timeout=self.omni_config.stage_init_timeout,
+            diffusion_batch_size=omni_engine_args.diffusion_batch_size,


+            except _json.JSONDecodeError:
+                stage_overrides = None


+        else:
+            omni_engine_args = OmniEngineArgs(model=model, **kwargs)

        if "log_requests" in kwargs:
            raise TypeError("`log_requests` has been removed in Omni/AsyncOmni. Use `log_stats`.")


+        omni_config = omni_engine_args.create_omni_config()
+
        if args.headless:


+        # Build kwarg dict from all non-None fields on OmniEngineArgs
+        kwargs: dict[str, Any] = {
+            f.name: getattr(self, f.name)
+            for f in _dc.fields(self)
+            if getattr(self, f.name) is not None
+        }


+    # Build CLI overrides from engine args (replace kwargs dict)
+    if engine_args is not None:
+        cli_overrides = _convert_dataclasses_to_dict(
+            {f.name: getattr(engine_args, f.name) for f in engine_args.__dataclass_fields__.values()
+             if getattr(engine_args, f.name) is not None}


+            # Build replica layout from StageResolvedConfig objects
+            stage_configs_for_layout = [
+                type("_LayoutStub", (), {"runtime": s.runtime or {}, "stage_type": s.stage_type})()
+


lishunyang12

Mostly a clean consolidation, but a few correctness issues in the new path — the explicit-field tracking and async_chunk resolution are the main ones.

lishunyang12 · 2026-05-17T16:39:18Z

-        engine_args._explicit_fields = frozenset(
-            attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None
-        )
        return engine_args


from_cli_args no longer sets _explicit_fields. Combined with explicit_kwargs() falling into the explicit is None branch (line 591), every non-None field is now treated as an explicit user override — including argparse/dataclass defaults like worker_backend="multi_process", ulysses_mode="strict", cfg_parallel_size=1. Those will clobber deploy-YAML values that intentionally differ. This is the sentinel-tracking we deliberately added; dropping it here is a behavior change. Can we keep _explicit_fields on the online path?

I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366

lishunyang12 · 2026-05-17T16:39:18Z

+            {
+                f.name: getattr(engine_args, f.name)
+                for f in engine_args.__dataclass_fields__.values()
+                if getattr(engine_args, f.name) is not None


Same root cause as the from_cli_args comment: if getattr(engine_args, f.name) is not None can't distinguish a user-set value from a dataclass default, so defaults leak into cli_overrides and override the deploy YAML. The old path filtered by explicit_kwargs() for exactly this reason.

lishunyang12 · 2026-05-17T16:39:18Z

+        )
+
+    # Resolve async_chunk: CLI override wins over deploy YAML
+    async_chunk: bool = kwargs.get("async_chunk", False)


async_chunk is annotated bool and defaulted to kwargs.get("async_chunk", False), so it's never None. The deploy-YAML async_chunk is only read in the if async_chunk is None branch below (382), which is now dead — the YAML value is silently ignored on the new path. Also, since unset and --no-async-chunk both arrive here as False, an explicit --no-async-chunk is indistinguishable from "not passed".

lishunyang12 · 2026-05-17T16:39:18Z

+    pd_config = _detect_pd_config_from_stages(resolved_stages)
+
+    # ── Resolve async_chunk from stage 0 if not explicit ─────────────
+    if async_chunk is None and resolved_stages:


Unreachable given line 301 — async_chunk can't be None here.

lishunyang12 · 2026-05-17T16:39:18Z

+
+    if not stage_configs:
+        # No pipeline config found — build a default single-stage diffusion config
+        from omegaconf import OmegaConf


The PR description lists "R5: OmegaConf roundtrip eliminated in new path", but this fallback reintroduces OmegaConf. Separately, it hard-codes stage_type="diffusion", so a pure-LLM model with no pipeline config silently becomes a diffusion stage. Worth gating on worker_type before defaulting.

lishunyang12 · 2026-05-17T16:39:18Z

+                            metadata=replica_metadata,
+                            stage_connector_spec=src.stage_connector_spec or {},
+                            omni_kv_connector=src.omni_kv_connector,
+                            stage_vllm_config=src.vllm_config,


For diffusion stages src.vllm_config is None, and the pre-built src.diffusion_config is never threaded into ReplicaInitPlan. The _ReplicaCfg stub also has no engine_args, so diffusion replica init that rebuilds OmniDiffusionConfig from stage_cfg.engine_args loses the resolved config entirely. How does the diffusion replica get its config on the new path?

lishunyang12 · 2026-05-17T16:39:18Z

    def add_cli_args(cls, parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        parser = AsyncEngineArgs.add_cli_args(parser)
-        parser = OmniEngineArgs._add_omni_specific_args(parser)
+        parser = OmniEngineArgs.add_cli_args(parser)


AsyncEngineArgs.add_cli_args(parser) already registers EngineArgs flags; calling OmniEngineArgs.add_cli_args(parser) without omni_args_only=True registers them again.

Suggested change

parser = OmniEngineArgs.add_cli_args(parser)

parser = OmniEngineArgs.add_cli_args(parser, omni_args_only=True)

lishunyang12 · 2026-05-17T16:39:18Z

+            engine_args_dict = build_engine_args_dict(
+                stage_cfg,
+                model,
+                stage_connector_spec=stage_connector_spec,


The legacy path passed cli_tokenizer=getattr(self, "tokenizer", None) into build_engine_args_dict; this call drops it, so the CLI --tokenizer override no longer reaches LLM stages.

lishunyang12 · 2026-05-17T16:39:18Z


+        # Build unified OmniEngineArgs from CLI (replaces scattered kwargs)
+        omni_engine_args = OmniEngineArgs.from_cli_args(args)
+        omni_config = omni_engine_args.create_omni_config()


create_omni_config() runs for every vllm serve invocation (model download + per-stage VllmConfig build), but the non-headless path calls omni_run_server(args) which ignores omni_config, and run_headless() re-resolves from scratch and ignores the passed value too. So this builds the full config at CLI dispatch and discards it. Either wire it through or build it lazily.

lishunyang12 · 2026-05-17T16:39:18Z

+        else:
+            omni_engine_args = OmniEngineArgs(model=model, **kwargs)

        if "log_requests" in kwargs:


This runs after OmniEngineArgs(model=model, **kwargs) on line 149, so a caller still passing log_requests= hits an opaque dataclass TypeError instead of this message. Move the check above the construction.

hsliuustc0106 · 2026-05-17T17:31:11Z

BLOCKING:

[Breaking Changes] - This PR introduces a new public API VllmOmniConfig and refactors OmniEngineArgs. A refactor of this magnitude touching the core config initialization path requires an end-to-end integration test that validates the entire config flow from CLI args to AsyncOmniEngine initialization. Unit tests alone (27 passed) are insufficient for core config path changes.
[Test Coverage] - PR description notes "could you please run the L3 tests locally and paste the results here" but no L3 test results are provided. L3 tests are required for substantial config refactor PRs to ensure existing deployment configs continue to work.
[Documentation] - Checklist shows Documentation unchecked. VllmOmniConfig appears to be a new public API surface (frozen dataclass exported from vllm_omni.config), which requires docstrings explaining its usage and when users should interact with it versus OmniEngineArgs.
[Breaking Changes] - OmniEngineArgs.add_cli_args(omni_args_only=True) is mentioned as a new public interface in the checklist but lacks docstrings explaining the new parameter, when to use it, and what flags it enables/disables.
[Test Coverage] - No e2e integration test covering the full path: CLI args → OmniEngineArgs → VllmOmniConfig → AsyncOmniEngine. For a refactor that changes how all configuration flows through the system, at least one e2e test validating a real model deployment (e.g., Qwen2.5-Omni) should pass.

VERDICT: REQUEST_CHANGES

…niEngineArgs - Add VllmOmniConfig and StageResolvedConfig frozen dataclasses (vllm_omni/config/vllm_omni_config.py) as single source of truth for the runtime config. Contains pre-built VllmConfig/OmniDiffusionConfig per stage, orchestrator settings, and pipeline topology. - Add build_vllm_omni_config() factory that consolidates the old scattered config construction (_resolve_stage_configs, Phase 2 injection loop, build_engine_args_dict, build_vllm_config, build_diffusion_config, extract_stage_metadata). - Refactor OmniEngineArgs to absorb OrchestratorArgs and all diffusion CLI flags into a single dataclass. Add add_cli_args() static method matching vLLM's EngineArgs/AsyncEngineArgs pattern. Add create_omni_config() as the unified config entrypoint. - Delete OrchestratorArgs, split_kwargs, SHARED_FIELDS, and all related kwargs-partitioning infrastructure (~250 lines removed). - Eliminate redundancies: - R1: duplicate build_engine_args_dict in _initialize_llm_replica (use pre-built VllmConfig.parallel_config.tensor_parallel_size) - R2: duplicate extract_stage_metadata in _build_logical_stage_init_plans (new path reads from pre-built StageResolvedConfig) - R3: two-phase config injection absorbed into factory - R4: unified default diffusion builder - R5: OmegaConf roundtrip eliminated in new path - Refactor acquire_device_locks to accept VllmConfig | OmniDiffusionConfig directly (add _extract_device_counts helper for type dispatch). - Delegate CLI arg registration in OmniServeCommand.subparser_init to OmniEngineArgs.add_cli_args(omni_args_only=True) (~400 lines removed). - Propagate VllmOmniConfig through AsyncOmniEngine and Orchestrator. - Add _OMNI_INTERNAL_KEYS to stage_config.py (replaces deleted syms). - Update tests: remove split_kwargs/OrchestratorArgs tests, add OmniEngineArgs.add_cli_args/from_cli_args/create_omni_config tests. Signed-off-by: WU Hang <whlbx@hotmail.com>

- Fix 8: add diffusion_config field to ReplicaInitPlan and carry pre-built OmniDiffusionConfig through the new path so diffusion replica init uses the resolved config instead of rebuilding from the empty _ReplicaCfg stub (which has no engine_args). - Fix 9: gate the no-pipeline fallback on engine_args.worker_type. Pure LLM models (worker_type ar/generation/None) no longer silently receive a broken diffusion stage. - Fix 10: forward --tokenizer CLI override through build_engine_args_dict() in build_vllm_omni_config. The orchestrator-level tokenizer was dropped because it is in _OMNI_INTERNAL_KEYS, but the legacy path passed it explicitly via cli_tokenizer=. - Fix 11: move log_requests check before OmniEngineArgs construction in OmniBase.__init__ so callers still passing log_requests= see the intentional migration TypeError instead of an opaque dataclass TypeError. Signed-off-by: WU Hang <whlbx@hotmail.com>

Replace OmegaConf.create() in the default diffusion config fallback with a simple type()-constructed stub. This eliminates the last OmegaConf usage in the new unified config path (the legacy path still uses OmegaConf, which is unchanged). Signed-off-by: WU Hang <whlbx@hotmail.com>

Signed-off-by: WU Hang <whlbx@hotmail.com>

wuhang2014 · 2026-05-18T03:26:01Z

Thanks for the thorough review @lishunyang12, @hsliuustc0106, and @copilot. Here is a summary of the changes since the initial review feedback:

Critical fixes (commit `8427111b`)

#	Issue	Fix
1	`from_cli_args` missing `_explicit_fields` — deploy YAML values clobbered by defaults	Restored `_explicit_fields` tracking; `create_omni_config` now uses `explicit_kwargs()` instead of iterating all non-None fields
2	`async_chunk` always `False` — deploy YAML value silently ignored	Changed to `kwargs.get("async_chunk")` (None when unset) so the deploy-YAML fallback branch is reachable
3	`_StageCfgWrapper` hard-codes `is_prefill_only=False` / `is_decode_only=False` — PD detection permanently broken	Replaced with `_detect_pd_config_from_omega_conf` that reads the real OmegaConf stage configs
4	Diffusion CLI flags (`--num-gpus`, `--cache-backend`, etc.) in `_OMNI_INTERNAL_KEYS` filtered out	Removed 33 diffusion-specific engine args from the internal keys frozenset
5	Layout stubs missing `engine_args.parallel_config` — multi-GPU diffusion replicas default to 1 device	Layout stubs now include `engine_args` with `parallel_config` from pre-built `VllmConfig` / `OmniDiffusionConfig`
6	`num_replicas` arms inverted (IndexError)	Flipped to correct order: `replicas_per_stage[stage_idx] if stage_idx < len(...) else src.num_replicas`
7	`OmniAsyncEngineArgs.add_cli_args` double-registers vLLM parent flags	Added `omni_args_only=True`

Important fixes (commit `ffe453b4`)

#	Issue	Fix
8	Diffusion replica missing pre-built config — `_ReplicaCfg` stubs have no `engine_args`, diffusion init rebuilds from empty stub	Added `diffusion_config` field to `ReplicaInitPlan`; new path carries `src.diffusion_config`; `_initialize_diffusion_replica` uses `plan.diffusion_config` when available
9	No-pipeline fallback hard-codes diffusion for pure LLM models	Gated on `engine_args.worker_type` — LLM models (`ar`/`generation`/`None`) return empty config instead of broken diffusion stage
10	`--tokenizer` CLI override dropped	`build_engine_args_dict` call now forwards `cli_tokenizer=getattr(engine_args, "tokenizer", None)`
11	`log_requests` check after `OmniEngineArgs` construction (opaque TypeError)	Moved check before construction

Cleanup (commit `7eb5d334`)

Removed last OmegaConf usage in new config path (replaced OmegaConf.create() with type()-constructed stub in the no-pipeline fallback)

Docs fix (commit `3b694c9e`)

RTD build: generate_argparse.py now statically extracts OmniEngineArgs.add_cli_args from source and execs it in a sandbox with mock EngineArgs (no-op), so all real OmniConfig flags appear in docs without vLLM dependency

Open / transitional

omni_config built but not consumed in serve.py cmd() non-headless path — this is intentional transitional state; omni_run_server(args) still receives args directly until downstream wiring is complete in a follow-up PR
--stage-overrides JSON parse now raises ValueError on failure instead of silently swallowing
PD detection failure now logs a warning before returning None

… is provided When omni_config is available (new unified path), engine_args is only used for tokenizer/single-stage field extraction, not for kwargs merging. The old guard rejected bare OmniEngineArgs objects that don't carry _explicit_fields, but OmniBase now constructs OmniEngineArgs directly from kwargs without calling create(). Only enforce _explicit_fields when omni_config is None (old path). When omni_config IS available, extract explicit kwargs only if _explicit_fields exists, otherwise skip the merge. This fixes the e2e tests that were getting: TypeError: engine_args=OmniEngineArgs(...) is ambiguous under sentinel-default precedence Signed-off-by: WU Hang <whlbx@hotmail.com>

When the online serving path passes vars(args) as **kwargs to OmniBase, it includes vLLM server flags (subparser, host, port, api_key, ssl_*, etc.) that are not fields on OmniEngineArgs. The bare OmniEngineArgs(model=model, **kwargs) constructor rejects unknown fields with TypeError. Filter kwargs to only OmniEngineArgs dataclass fields before constructing. This replaces the old split_kwargs behavior that partitioned CLI flags into orchestrator/engine/server buckets. Signed-off-by: WU Hang <whlbx@hotmail.com>

None means 'not specified', which is the common case for diffusion models that don't have a PipelineConfig registry entry. Only skip the diffusion fallback when the user explicitly requests an LLM worker (--worker-type ar / generation). This fixes '0 stages' for Qwen/Qwen-Image and other diffusion models using the online serve path. Signed-off-by: WU Hang <whlbx@hotmail.com>

When the unified config path is active, OmniBase.stage_configs now returns list(self.omni_config.stages) instead of the empty engine.stage_configs fallback. This is needed for the API server's is_pure_diffusion detection, which reads stage_configs to determine whether to use the diffusion-only init path. Previously, AsyncOmniEngine.__init__ set self.stage_configs = [] when omni_config is available, causing the server to fall through to the LLM init path and crash with AttributeError in OpenAIServingRender. Signed-off-by: WU Hang <whlbx@hotmail.com>

wuhang2014 · 2026-05-18T09:10:57Z

diffusion model e2e test with qwen-image:

CUDA_VISIBLE_DEVICES=0  pytest -sv tests/e2e/online_serving/test_qwen_image_expansion.py -m "full_model and diffusion and H100 and not distributed_cuda" --run-level "full_model"

...

--- Running Summary
======================================================================================== 6 passed, 10 deselected, 18 warnings in 1252.22s (0:20:52) ========================================================================================

build_async_omni_from_stage_config was passing vars(args) as **kwargs to AsyncOmni, which then used from_kwargs — which doesn't set _explicit_fields, so explicit_kwargs fell through to the 'all non-None fields' branch, leaking dataclass defaults like model_stage='thinker' as per-stage CLI overrides. For multi-stage models like Qwen3-Omni, this caused the talker stage to incorrectly use model_stage='thinker', resulting in a vocab size mismatch during weight loading. Now uses OmniEngineArgs.from_cli_args(args) which correctly tracks _explicit_fields and respects nullify_stage_engine_defaults. Signed-off-by: WU Hang <whlbx@hotmail.com>

Fields like model_stage, model_arch, engine_output_type, hf_config_name, worker_type, etc. are per-stage settings defined in deploy YAML / PipelineConfig. They should not be global CLI overrides applied to every stage. With from_cli_args, these fields have non-None dataclass defaults (e.g. model_stage='thinker') which are treated as explicit user overrides by explicit_kwargs(), causing them to leak into CLI overrides for ALL stages — overwriting the correct per-stage values from the deploy YAML. Adding them to _OMNI_INTERNAL_KEYS filters them in build_stage_runtime_overrides, so the deploy YAML values win. Signed-off-by: WU Hang <whlbx@hotmail.com>

wuhang2014 · 2026-05-18T13:36:14Z

omni model e2e test with qwen3-omni:

CUDA_VISIBLE_DEVICES=0,1 pytest -sv tests/e2e/online_serving/test_qwen3_omni.py::test_mix_to_text_audio_001

...

--- Running Summary
================================================================================================ 1 passed, 19 warnings in 597.78s (0:09:57) ================================================================================================

…review) Phase 1 - Decompose build_vllm_omni_config: - Split ~250-line factory into _build_cli_overrides, _build_default_diffusion_config, _resolve_stages, and _resolve_async_chunk helpers - Each helper has a clear docstring and single responsibility Phase 2 - Reduce fragility: - Add TestOmniInternalKeysCoverage to keep _OMNI_INTERNAL_KEYS in sync with OmniEngineArgs fields - Narrow _detect_pd_config_from_omega_conf except catch to (ImportError, AttributeError, TypeError); downgrade log level to debug - Rename Orchestrator pd_config param to legacy_pd_config with explicit omni_config.pd_config -> legacy_pd_config fallback Phase 3 - Long-term maintainability: - Replace anonymous type("_DefaultDiffusionCfg") and type("_ReplicaCfg") calls with named frozen dataclasses (DefaultDiffusionCfg, ReplicaInitCfg + ReplicaInitCfgRuntime) - Add DeprecationWarning and TODO(vllm-project#3672) to legacy init path in async_omni_engine.py - Document import-deferral rationale for circular-dependency function-body imports - Expand to_omegaconf() TODO with concrete migration steps referencing PR vllm-project#3672 Signed-off-by: WU Hang <whlbx@hotmail.com>

xiaohajiayou · 2026-05-18T15:49:38Z

            args.model = args.model_tag

+        # Build unified OmniEngineArgs from CLI (replaces scattered kwargs)
+        omni_engine_args = OmniEngineArgs.from_cli_args(args)


from_cli_args() assumes all explicit overrides come from argparse. But overrides can come from both explicit constructor kwargs AND argparse simultaneously.

The old approach handled this by merging args into kwargs;

now that online and offline paths are explicitly separated, this needs to be reconsidered.

alex-jw-brooks

Some thoughts, thanks! I need to understand a bit better how this ties into #3366, so will take another look a bit later, but I think we need to try to simplify the config stuff as much as possible, with more clear patterns for implementing new potential engine types

alex-jw-brooks · 2026-05-18T15:50:57Z

+        "data_parallel_size": getattr(pc, "data_parallel_size", 1) or 1,
+        "prefill_context_parallel_size": getattr(pc, "prefill_context_parallel_size", 1) or 1,
+        "sequence_parallel_size": getattr(pc, "sequence_parallel_size", 1) or 1,
+        "cfg_parallel_size": getattr(pc, "cfg_parallel_size", 1) or 1,


I think this is likely not needed? I.e., why not just check

if config is a vLLM Config, get the attributes for non diffusion models

if config is an OmniDiffusionConfig, get the attributes for diffusion models

in the caller, rather than creating a new dict unifying things? Defaults are already handled in the dataclass init, and we lose type safety by pulling things out into another dict here, and it will get uglier if new engine types are added later on

alex-jw-brooks · 2026-05-18T15:53:09Z

+    pc = config.parallel_config
+
+    # DiffusionParallelConfig uses ``world_size`` as the total device count
+    if hasattr(pc, "world_size"):


This should check that the type is a OmniDiffusionConfig instead of checking hasattr dynamically, and the type hint should be config: OmniDiffusionConfig | VllmConfig so that its more clear. Then we should throw an explicit error if config is not one of those types, not rely on getattr / try excepts to resolve the world size

alex-jw-brooks · 2026-05-18T15:54:12Z

+class ReplicaInitCfgRuntime:
+    """Minimal per-replica runtime info (used instead of OmegaConf roundtrip)."""
+
+    devices: str | None = None


Will this be extended later? Seems strange to have a dataclass wrapping just one field

alex-jw-brooks · 2026-05-18T15:56:51Z

+    # ==================================================================
+
+    @staticmethod
+    def add_cli_args(


This isn't called anwhere with omni_args_only=False, right?

I think we should rename this to something like register_omni_args, remove the param, and assume the parser has already been initialized for vLLM for now, otherwise it's confusing what this is meant to be doing

alex-jw-brooks · 2026-05-18T15:59:01Z

-        engine_args._explicit_fields = frozenset(
-            attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None
-        )
        return engine_args


I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366

alex-jw-brooks · 2026-05-18T16:00:26Z

+def _get_config_logger():
+    from vllm.logger import init_logger
+
+    return init_logger("vllm_omni.config.vllm_omni_config")


Logger should just be initialized at the top of the file so other things can use it, rather than the fallback case needing to initialize the logger to be used

alex-jw-brooks · 2026-05-18T16:10:33Z

+    if resolved_stages:
+        return bool(
+            getattr(
+                getattr(stage_configs[0], "engine_args", None),


I don't think having special resolution for async chunk as a key is a good idea. It should just be handled by the normal precedence resolution?

I think we should just use the normal resolution and then validate that the set of resolved values for async chunk is consistent across stages, and throw an error if it isn't

alex-jw-brooks · 2026-05-18T16:12:01Z

+    # extract_stage_metadata / build_diffusion_config expect.
+    default_cfg = DefaultDiffusionCfg(
+        engine_args=cli_overrides,
+        runtime={"devices": cli_overrides.get("devices")},


Splitting runtime with special keys out from cli_overrides to pass here feels like an an antipattern to me

alex-jw-brooks · 2026-05-18T16:14:22Z

+# the top-level ``OmniEngineArgs``, not by individual stage engines.
+# Previously derived from ``OrchestratorArgs`` / ``SHARED_FIELDS`` in
+# ``arg_utils``; now defined here as a single source of truth.
+_OMNI_INTERNAL_KEYS: frozenset[str] = frozenset(


I think we need to figure out a better way to handle this with more robust filtering, hardcoding the keys like this is error prone, especially with them being a mix of top-level configuration and stage type specific options

E.g., if these keys are really never expected to be forward to stage init, that means the dependent stages should never expect to access them from the built object, so do we actually need to fully drop them? I think hardcoding keys in this way is more of an indicator that we do not have the right config separation / levels of encapsulation

CC @lishunyang12 @gcanlin also in case either of you have thoughts

xiaohajiayou · 2026-05-18T16:37:00Z

+
+    # ── Cluster / backend ──────────────────────────────────────────
+
+    worker_backend: str = "multi_process"


I think this is where the override semantics start to drift.

Once OmniEngineArgs carries non-None defaults, _build_cli_overrides() will treat them as merge inputs before YAML resolution, which can turn fallback defaults into effective overrides.

Under the current None-sentinel model, defaults should only materialize after CLI/YAML merge is finished, not in the unified input layer.

xiaohajiayou · 2026-05-19T02:39:49Z

+    Pattern (matching vLLM's ``EngineArgs``):
+
+        Online:  ``OmniEngineArgs.from_cli_args(parsed_args)``
+        Offline: ``OmniEngineArgs(model="...", tensor_parallel_size=2, ...)``


Offline OmniEngineArgs(model="...", ...) never sets _explicit_fields.

Without nullification (CLI-only), EngineArgs defaults all leak as global per-stage overrides via _build_cli_overrides, silently clobbering deploy YAML values.

Offline needs create() or equivalent to tag explicit fields.

Signed-off-by: wuhang <wuhang6@huawei.com> # Conflicts: # vllm_omni/engine/arg_utils.py # vllm_omni/engine/async_omni_engine.py # vllm_omni/engine/orchestrator.py # vllm_omni/engine/stage_init_utils.py # vllm_omni/entrypoints/cli/serve.py

…oject#3754 — unified VllmOmniConfig + OmniArgumentParser Introduce a unified VllmOmniConfig frozen dataclass as the single source of truth for all runtime configuration (pipeline topology, per-stage VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD config). Add OmniArgumentParser that injects model-specific defaults from deploy YAML as action.default before parse — replacing nullify_stage_engine_defaults, detect_explicit_cli_keys, deploy_override_field_names, and the two-phase kwargs partitioning (split_kwargs / OrchestratorArgs / SHARED_FIELDS). Removals: - StageConfig, ModelPipeline classes + to_omegaconf() OmegaConf roundtrip - merge_pipeline_deploy, _build_engine_args, _build_extras, _apply_platform_overrides - StageConfigFactory.create_from_model / _create_from_registry / create_default_diffusion - _explicit_fields / explicit_kwargs / create() sentinel on OmniEngineArgs - _create_default_diffusion_stage_cfg + diffusion kwargs injection loop - _strip_single_engine_args + _PARENT_ARGS_KEEP/STRIP/NO_WARN frozensets - load_and_resolve_stage_configs / load_stage_configs_from_yaml - OmniBase.from_cli_args Config flow (after): OmniArgumentParser.parse_args() # inject deploy YAML as action.default OmniEngineArgs.from_cli_args(args) # extract resolved args build_vllm_omni_config(model, ...) # single factory -> VllmOmniConfig AsyncOmniEngine(omni_config=...) # read config directly Net: -1347 lines across 10 files. Signed-off-by: wuh <wuh@users.noreply.github.com>

hsliuustc0106 · 2026-05-28T00:22:55Z

resolve conflicts

wuhang2014 · 2026-05-28T03:42:25Z

resolve conflicts

I'll post a RFC first because the change is huge. Maybe delay this PR until a later release.

hsliuustc0106 · 2026-05-30T16:13:49Z

resolve conflicts

Copilot AI review requested due to automatic review settings May 17, 2026 16:02

wuhang2014 requested review from lishunyang12, tzhouam and yenuo26 as code owners May 17, 2026 16:02

Copilot started reviewing on behalf of wuhang2014 May 17, 2026 16:02 View session

wuhang2014 force-pushed the diffconf-1 branch from 871e152 to 8491be6 Compare May 17, 2026 16:05

Copilot AI reviewed May 17, 2026

View reviewed changes

wuhang2014 force-pushed the diffconf-1 branch 3 times, most recently from c3ca82c to 2c514c1 Compare May 17, 2026 16:29

lishunyang12 reviewed May 17, 2026

View reviewed changes

Gaohan123 added this to the v0.22.0 milestone May 18, 2026

wuhang2014 force-pushed the diffconf-1 branch from 2c514c1 to 8427111 Compare May 18, 2026 02:16

wuhang2014 added 2 commits May 18, 2026 10:28

wuhang2014 force-pushed the diffconf-1 branch from 9979743 to 7eb5d33 Compare May 18, 2026 03:00

fix mkdoc failure

3b694c9

Signed-off-by: WU Hang <whlbx@hotmail.com>

wuhang2014 requested review from Gaohan123, david6666666 and hsliuustc0106 as code owners May 18, 2026 03:10

wuhang2014 added 2 commits May 18, 2026 13:10

wuhang2014 force-pushed the diffconf-1 branch from d9f81f4 to e6c4acb Compare May 18, 2026 08:04

wuhang2014 added 2 commits May 18, 2026 16:18

wuhang2014 added 3 commits May 18, 2026 17:55

Merge branch 'main' into diffconf-1

3b91411

wuhang2014 force-pushed the diffconf-1 branch from 2bfd7fa to 8601019 Compare May 18, 2026 12:45

wuhang2014 force-pushed the diffconf-1 branch from 7aff2d0 to 1991911 Compare May 18, 2026 15:25

xiaohajiayou reviewed May 18, 2026

View reviewed changes

alex-jw-brooks suggested changes May 18, 2026

View reviewed changes

xiaohajiayou reviewed May 18, 2026

View reviewed changes

xiaohajiayou reviewed May 19, 2026

View reviewed changes

wuhang2014 mentioned this pull request May 19, 2026

[RFC]: Replace Sentinel Default Handling for Configs With Tracking Parser #3366

Open

1 task

xiaohajiayou mentioned this pull request May 19, 2026

[RFC]: Model-Aware Argument Default Resolution #3735

Open

1 task

wuhang2014 mentioned this pull request May 20, 2026

[Feature]: Inject model-specific defaults from deploy YAML via OmniArgumentParser #3754

Open

Merge upstream/main into diffconf-1

5ea2be8

Signed-off-by: wuhang <wuhang6@huawei.com> # Conflicts: # vllm_omni/engine/arg_utils.py # vllm_omni/engine/async_omni_engine.py # vllm_omni/engine/orchestrator.py # vllm_omni/engine/stage_init_utils.py # vllm_omni/entrypoints/cli/serve.py

wuhang2014 force-pushed the diffconf-1 branch from 7f479b7 to 5ea2be8 Compare May 21, 2026 09:07

xiaohajiayou mentioned this pull request May 26, 2026

[BugFix] Fix diffusion parallel_config YAML override and add deploy config field allowlist #3483

Merged

hsliuustc0106 added the high priority high priority issue, needs to be done asap label May 28, 2026

hsliuustc0106 removed this from the v0.22.0 milestone May 30, 2026

xiaohajiayou mentioned this pull request May 31, 2026

[Core] Integrate TrackingArgumentParser #3369

Merged

		omni_config = omni_engine_args.create_omni_config()

		if args.headless:

	parser = OmniEngineArgs.add_cli_args(parser)
	parser = OmniEngineArgs.add_cli_args(parser, omni_args_only=True)


		# ── Cluster / backend ──────────────────────────────────────────

		worker_backend: str = "multi_process"

Conversation

wuhang2014 commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

chatgpt-codex-connector Bot commented May 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

lishunyang12 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hsliuustc0106 commented May 17, 2026

Uh oh!

wuhang2014 commented May 18, 2026

Critical fixes (commit 8427111b)

Important fixes (commit ffe453b4)

Cleanup (commit 7eb5d334)

Docs fix (commit 3b694c9e)

Open / transitional

Uh oh!

wuhang2014 commented May 18, 2026

Uh oh!

wuhang2014 commented May 18, 2026

Uh oh!

xiaohajiayou May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alex-jw-brooks left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wuhang2014 commented May 17, 2026 •

edited

Loading

lishunyang12 left a comment •

edited

Loading

Critical fixes (commit `8427111b`)

Important fixes (commit `ffe453b4`)

Cleanup (commit `7eb5d334`)

Docs fix (commit `3b694c9e`)

xiaohajiayou May 18, 2026 •

edited

Loading

xiaohajiayou May 19, 2026 •

edited

Loading