[Config Refactor] Introduce unified VllmOmniConfig and consolidate OmniEngineArgs#3672
[Config Refactor] Introduce unified VllmOmniConfig and consolidate OmniEngineArgs#3672wuhang2014 wants to merge 13 commits into
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Pull request overview
This PR refactors vLLM-Omni runtime configuration around a unified VllmOmniConfig and consolidated OmniEngineArgs, replacing scattered config construction with a centralized factory path.
Changes:
- Adds
VllmOmniConfig/StageResolvedConfigand routes engine/orchestrator initialization toward resolved stage configs. - Consolidates CLI/offline config fields into
OmniEngineArgsand delegates Omni CLI flag registration to it. - Updates stage initialization, device locking, PD detection, and tests for the new config path.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
vllm_omni/entrypoints/omni_base.py |
Builds OmniEngineArgs/VllmOmniConfig before constructing AsyncOmniEngine. |
vllm_omni/entrypoints/cli/serve.py |
Delegates Omni CLI flags to OmniEngineArgs and starts unified config creation. |
vllm_omni/engine/stage_init_utils.py |
Allows device locking from resolved config objects. |
vllm_omni/engine/orchestrator.py |
Accepts VllmOmniConfig and derives async chunk / PD config from it. |
vllm_omni/engine/async_omni_engine.py |
Adds unified config path for stage planning and initialization. |
vllm_omni/engine/arg_utils.py |
Consolidates Omni args and CLI registration; removes old split/routing helpers. |
vllm_omni/config/vllm_omni_config.py |
Introduces resolved runtime config dataclasses and factory. |
vllm_omni/config/stage_config.py |
Moves internal key filtering into stage config utilities. |
vllm_omni/config/__init__.py |
Exports new config dataclasses. |
tests/test_config_factory.py |
Updates internal-key tests for the new filtering source. |
tests/test_arg_utils.py |
Replaces old split/routing tests with unified args tests. |
tests/entrypoints/test_async_omni_diffusion_config.py |
Adapts default diffusion config helper usage in tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| # Resolve async_chunk: CLI override wins over deploy YAML | ||
| async_chunk: bool = kwargs.get("async_chunk", False) |
| # ── Diffusion model flags (orchestrator-level, not per-stage) ── | ||
| "num_gpus", | ||
| "model_class_name", | ||
| "diffusion_load_format", | ||
| "diffusers_load_kwargs", | ||
| "diffusers_call_kwargs", |
| # Build a minimal stage_cfg stub for replica launch | ||
| replica_cfg = type("_ReplicaCfg", (), { | ||
| "stage_id": src.stage_id, | ||
| "stage_type": src.stage_type, | ||
| "runtime": type("_Runtime", (), { | ||
| "devices": replica_devices_map.get(stage_idx, [None])[replica_id] | ||
| if stage_idx in replica_devices_map else None, | ||
| })(), | ||
| })() |
| self.engine = AsyncOmniEngine( | ||
| model=model, | ||
| engine_args=engine_args, | ||
| init_timeout=init_timeout, | ||
| stage_init_timeout=stage_init_timeout, | ||
| diffusion_batch_size=diffusion_batch_size, | ||
| **kwargs, | ||
| omni_config=self.omni_config, | ||
| engine_args=omni_engine_args, | ||
| init_timeout=self.omni_config.init_timeout, | ||
| stage_init_timeout=self.omni_config.stage_init_timeout, | ||
| diffusion_batch_size=omni_engine_args.diffusion_batch_size, |
| except _json.JSONDecodeError: | ||
| stage_overrides = None |
| else: | ||
| omni_engine_args = OmniEngineArgs(model=model, **kwargs) | ||
|
|
||
| if "log_requests" in kwargs: | ||
| raise TypeError("`log_requests` has been removed in Omni/AsyncOmni. Use `log_stats`.") |
| omni_config = omni_engine_args.create_omni_config() | ||
|
|
||
| if args.headless: |
| # Build kwarg dict from all non-None fields on OmniEngineArgs | ||
| kwargs: dict[str, Any] = { | ||
| f.name: getattr(self, f.name) | ||
| for f in _dc.fields(self) | ||
| if getattr(self, f.name) is not None | ||
| } |
| # Build CLI overrides from engine args (replace kwargs dict) | ||
| if engine_args is not None: | ||
| cli_overrides = _convert_dataclasses_to_dict( | ||
| {f.name: getattr(engine_args, f.name) for f in engine_args.__dataclass_fields__.values() | ||
| if getattr(engine_args, f.name) is not None} |
| # Build replica layout from StageResolvedConfig objects | ||
| stage_configs_for_layout = [ | ||
| type("_LayoutStub", (), {"runtime": s.runtime or {}, "stage_type": s.stage_type})() | ||
|
|
c3ca82c to
2c514c1
Compare
| engine_args._explicit_fields = frozenset( | ||
| attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None | ||
| ) | ||
| return engine_args |
There was a problem hiding this comment.
from_cli_args no longer sets _explicit_fields. Combined with explicit_kwargs() falling into the explicit is None branch (line 591), every non-None field is now treated as an explicit user override — including argparse/dataclass defaults like worker_backend="multi_process", ulysses_mode="strict", cfg_parallel_size=1. Those will clobber deploy-YAML values that intentionally differ. This is the sentinel-tracking we deliberately added; dropping it here is a behavior change. Can we keep _explicit_fields on the online path?
There was a problem hiding this comment.
I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366
| { | ||
| f.name: getattr(engine_args, f.name) | ||
| for f in engine_args.__dataclass_fields__.values() | ||
| if getattr(engine_args, f.name) is not None |
There was a problem hiding this comment.
Same root cause as the from_cli_args comment: if getattr(engine_args, f.name) is not None can't distinguish a user-set value from a dataclass default, so defaults leak into cli_overrides and override the deploy YAML. The old path filtered by explicit_kwargs() for exactly this reason.
| ) | ||
|
|
||
| # Resolve async_chunk: CLI override wins over deploy YAML | ||
| async_chunk: bool = kwargs.get("async_chunk", False) |
There was a problem hiding this comment.
async_chunk is annotated bool and defaulted to kwargs.get("async_chunk", False), so it's never None. The deploy-YAML async_chunk is only read in the if async_chunk is None branch below (382), which is now dead — the YAML value is silently ignored on the new path. Also, since unset and --no-async-chunk both arrive here as False, an explicit --no-async-chunk is indistinguishable from "not passed".
| pd_config = _detect_pd_config_from_stages(resolved_stages) | ||
|
|
||
| # ── Resolve async_chunk from stage 0 if not explicit ───────────── | ||
| if async_chunk is None and resolved_stages: |
There was a problem hiding this comment.
Unreachable given line 301 — async_chunk can't be None here.
|
|
||
| if not stage_configs: | ||
| # No pipeline config found — build a default single-stage diffusion config | ||
| from omegaconf import OmegaConf |
There was a problem hiding this comment.
The PR description lists "R5: OmegaConf roundtrip eliminated in new path", but this fallback reintroduces OmegaConf. Separately, it hard-codes stage_type="diffusion", so a pure-LLM model with no pipeline config silently becomes a diffusion stage. Worth gating on worker_type before defaulting.
| metadata=replica_metadata, | ||
| stage_connector_spec=src.stage_connector_spec or {}, | ||
| omni_kv_connector=src.omni_kv_connector, | ||
| stage_vllm_config=src.vllm_config, |
There was a problem hiding this comment.
For diffusion stages src.vllm_config is None, and the pre-built src.diffusion_config is never threaded into ReplicaInitPlan. The _ReplicaCfg stub also has no engine_args, so diffusion replica init that rebuilds OmniDiffusionConfig from stage_cfg.engine_args loses the resolved config entirely. How does the diffusion replica get its config on the new path?
| def add_cli_args(cls, parser: argparse.ArgumentParser) -> argparse.ArgumentParser: | ||
| parser = AsyncEngineArgs.add_cli_args(parser) | ||
| parser = OmniEngineArgs._add_omni_specific_args(parser) | ||
| parser = OmniEngineArgs.add_cli_args(parser) |
There was a problem hiding this comment.
AsyncEngineArgs.add_cli_args(parser) already registers EngineArgs flags; calling OmniEngineArgs.add_cli_args(parser) without omni_args_only=True registers them again.
| parser = OmniEngineArgs.add_cli_args(parser) | |
| parser = OmniEngineArgs.add_cli_args(parser, omni_args_only=True) |
| engine_args_dict = build_engine_args_dict( | ||
| stage_cfg, | ||
| model, | ||
| stage_connector_spec=stage_connector_spec, |
There was a problem hiding this comment.
The legacy path passed cli_tokenizer=getattr(self, "tokenizer", None) into build_engine_args_dict; this call drops it, so the CLI --tokenizer override no longer reaches LLM stages.
|
|
||
| # Build unified OmniEngineArgs from CLI (replaces scattered kwargs) | ||
| omni_engine_args = OmniEngineArgs.from_cli_args(args) | ||
| omni_config = omni_engine_args.create_omni_config() |
There was a problem hiding this comment.
create_omni_config() runs for every vllm serve invocation (model download + per-stage VllmConfig build), but the non-headless path calls omni_run_server(args) which ignores omni_config, and run_headless() re-resolves from scratch and ignores the passed value too. So this builds the full config at CLI dispatch and discards it. Either wire it through or build it lazily.
| else: | ||
| omni_engine_args = OmniEngineArgs(model=model, **kwargs) | ||
|
|
||
| if "log_requests" in kwargs: |
There was a problem hiding this comment.
This runs after OmniEngineArgs(model=model, **kwargs) on line 149, so a caller still passing log_requests= hits an opaque dataclass TypeError instead of this message. Move the check above the construction.
|
BLOCKING:
VERDICT: REQUEST_CHANGES |
…niEngineArgs
- Add VllmOmniConfig and StageResolvedConfig frozen dataclasses
(vllm_omni/config/vllm_omni_config.py) as single source of truth for
the runtime config. Contains pre-built VllmConfig/OmniDiffusionConfig
per stage, orchestrator settings, and pipeline topology.
- Add build_vllm_omni_config() factory that consolidates the old
scattered config construction (_resolve_stage_configs, Phase 2
injection loop, build_engine_args_dict, build_vllm_config,
build_diffusion_config, extract_stage_metadata).
- Refactor OmniEngineArgs to absorb OrchestratorArgs and all diffusion
CLI flags into a single dataclass. Add add_cli_args() static method
matching vLLM's EngineArgs/AsyncEngineArgs pattern. Add
create_omni_config() as the unified config entrypoint.
- Delete OrchestratorArgs, split_kwargs, SHARED_FIELDS, and all
related kwargs-partitioning infrastructure (~250 lines removed).
- Eliminate redundancies:
- R1: duplicate build_engine_args_dict in _initialize_llm_replica
(use pre-built VllmConfig.parallel_config.tensor_parallel_size)
- R2: duplicate extract_stage_metadata in _build_logical_stage_init_plans
(new path reads from pre-built StageResolvedConfig)
- R3: two-phase config injection absorbed into factory
- R4: unified default diffusion builder
- R5: OmegaConf roundtrip eliminated in new path
- Refactor acquire_device_locks to accept VllmConfig | OmniDiffusionConfig
directly (add _extract_device_counts helper for type dispatch).
- Delegate CLI arg registration in OmniServeCommand.subparser_init
to OmniEngineArgs.add_cli_args(omni_args_only=True) (~400 lines removed).
- Propagate VllmOmniConfig through AsyncOmniEngine and Orchestrator.
- Add _OMNI_INTERNAL_KEYS to stage_config.py (replaces deleted syms).
- Update tests: remove split_kwargs/OrchestratorArgs tests, add
OmniEngineArgs.add_cli_args/from_cli_args/create_omni_config tests.
Signed-off-by: WU Hang <whlbx@hotmail.com>
- Fix 8: add diffusion_config field to ReplicaInitPlan and carry pre-built OmniDiffusionConfig through the new path so diffusion replica init uses the resolved config instead of rebuilding from the empty _ReplicaCfg stub (which has no engine_args). - Fix 9: gate the no-pipeline fallback on engine_args.worker_type. Pure LLM models (worker_type ar/generation/None) no longer silently receive a broken diffusion stage. - Fix 10: forward --tokenizer CLI override through build_engine_args_dict() in build_vllm_omni_config. The orchestrator-level tokenizer was dropped because it is in _OMNI_INTERNAL_KEYS, but the legacy path passed it explicitly via cli_tokenizer=. - Fix 11: move log_requests check before OmniEngineArgs construction in OmniBase.__init__ so callers still passing log_requests= see the intentional migration TypeError instead of an opaque dataclass TypeError. Signed-off-by: WU Hang <whlbx@hotmail.com>
Replace OmegaConf.create() in the default diffusion config fallback with a simple type()-constructed stub. This eliminates the last OmegaConf usage in the new unified config path (the legacy path still uses OmegaConf, which is unchanged). Signed-off-by: WU Hang <whlbx@hotmail.com>
Signed-off-by: WU Hang <whlbx@hotmail.com>
|
Thanks for the thorough review @lishunyang12, @hsliuustc0106, and @copilot. Here is a summary of the changes since the initial review feedback: Critical fixes (commit
|
| # | Issue | Fix |
|---|---|---|
| 1 | from_cli_args missing _explicit_fields — deploy YAML values clobbered by defaults |
Restored _explicit_fields tracking; create_omni_config now uses explicit_kwargs() instead of iterating all non-None fields |
| 2 | async_chunk always False — deploy YAML value silently ignored |
Changed to kwargs.get("async_chunk") (None when unset) so the deploy-YAML fallback branch is reachable |
| 3 | _StageCfgWrapper hard-codes is_prefill_only=False / is_decode_only=False — PD detection permanently broken |
Replaced with _detect_pd_config_from_omega_conf that reads the real OmegaConf stage configs |
| 4 | Diffusion CLI flags (--num-gpus, --cache-backend, etc.) in _OMNI_INTERNAL_KEYS filtered out |
Removed 33 diffusion-specific engine args from the internal keys frozenset |
| 5 | Layout stubs missing engine_args.parallel_config — multi-GPU diffusion replicas default to 1 device |
Layout stubs now include engine_args with parallel_config from pre-built VllmConfig / OmniDiffusionConfig |
| 6 | num_replicas arms inverted (IndexError) |
Flipped to correct order: replicas_per_stage[stage_idx] if stage_idx < len(...) else src.num_replicas |
| 7 | OmniAsyncEngineArgs.add_cli_args double-registers vLLM parent flags |
Added omni_args_only=True |
Important fixes (commit ffe453b4)
| # | Issue | Fix |
|---|---|---|
| 8 | Diffusion replica missing pre-built config — _ReplicaCfg stubs have no engine_args, diffusion init rebuilds from empty stub |
Added diffusion_config field to ReplicaInitPlan; new path carries src.diffusion_config; _initialize_diffusion_replica uses plan.diffusion_config when available |
| 9 | No-pipeline fallback hard-codes diffusion for pure LLM models | Gated on engine_args.worker_type — LLM models (ar/generation/None) return empty config instead of broken diffusion stage |
| 10 | --tokenizer CLI override dropped |
build_engine_args_dict call now forwards cli_tokenizer=getattr(engine_args, "tokenizer", None) |
| 11 | log_requests check after OmniEngineArgs construction (opaque TypeError) |
Moved check before construction |
Cleanup (commit 7eb5d334)
- Removed last OmegaConf usage in new config path (replaced
OmegaConf.create()withtype()-constructed stub in the no-pipeline fallback)
Docs fix (commit 3b694c9e)
- RTD build:
generate_argparse.pynow statically extractsOmniEngineArgs.add_cli_argsfrom source and execs it in a sandbox with mockEngineArgs(no-op), so all real OmniConfig flags appear in docs without vLLM dependency
Open / transitional
omni_configbuilt but not consumed inserve.py cmd()non-headless path — this is intentional transitional state;omni_run_server(args)still receivesargsdirectly until downstream wiring is complete in a follow-up PR--stage-overridesJSON parse now raisesValueErroron failure instead of silently swallowing- PD detection failure now logs a warning before returning
None
… is provided When omni_config is available (new unified path), engine_args is only used for tokenizer/single-stage field extraction, not for kwargs merging. The old guard rejected bare OmniEngineArgs objects that don't carry _explicit_fields, but OmniBase now constructs OmniEngineArgs directly from kwargs without calling create(). Only enforce _explicit_fields when omni_config is None (old path). When omni_config IS available, extract explicit kwargs only if _explicit_fields exists, otherwise skip the merge. This fixes the e2e tests that were getting: TypeError: engine_args=OmniEngineArgs(...) is ambiguous under sentinel-default precedence Signed-off-by: WU Hang <whlbx@hotmail.com>
When the online serving path passes vars(args) as **kwargs to OmniBase, it includes vLLM server flags (subparser, host, port, api_key, ssl_*, etc.) that are not fields on OmniEngineArgs. The bare OmniEngineArgs(model=model, **kwargs) constructor rejects unknown fields with TypeError. Filter kwargs to only OmniEngineArgs dataclass fields before constructing. This replaces the old split_kwargs behavior that partitioned CLI flags into orchestrator/engine/server buckets. Signed-off-by: WU Hang <whlbx@hotmail.com>
None means 'not specified', which is the common case for diffusion models that don't have a PipelineConfig registry entry. Only skip the diffusion fallback when the user explicitly requests an LLM worker (--worker-type ar / generation). This fixes '0 stages' for Qwen/Qwen-Image and other diffusion models using the online serve path. Signed-off-by: WU Hang <whlbx@hotmail.com>
When the unified config path is active, OmniBase.stage_configs now returns list(self.omni_config.stages) instead of the empty engine.stage_configs fallback. This is needed for the API server's is_pure_diffusion detection, which reads stage_configs to determine whether to use the diffusion-only init path. Previously, AsyncOmniEngine.__init__ set self.stage_configs = [] when omni_config is available, causing the server to fall through to the LLM init path and crash with AttributeError in OpenAIServingRender. Signed-off-by: WU Hang <whlbx@hotmail.com>
|
diffusion model e2e test with qwen-image: |
build_async_omni_from_stage_config was passing vars(args) as **kwargs to AsyncOmni, which then used from_kwargs — which doesn't set _explicit_fields, so explicit_kwargs fell through to the 'all non-None fields' branch, leaking dataclass defaults like model_stage='thinker' as per-stage CLI overrides. For multi-stage models like Qwen3-Omni, this caused the talker stage to incorrectly use model_stage='thinker', resulting in a vocab size mismatch during weight loading. Now uses OmniEngineArgs.from_cli_args(args) which correctly tracks _explicit_fields and respects nullify_stage_engine_defaults. Signed-off-by: WU Hang <whlbx@hotmail.com>
Fields like model_stage, model_arch, engine_output_type, hf_config_name, worker_type, etc. are per-stage settings defined in deploy YAML / PipelineConfig. They should not be global CLI overrides applied to every stage. With from_cli_args, these fields have non-None dataclass defaults (e.g. model_stage='thinker') which are treated as explicit user overrides by explicit_kwargs(), causing them to leak into CLI overrides for ALL stages — overwriting the correct per-stage values from the deploy YAML. Adding them to _OMNI_INTERNAL_KEYS filters them in build_stage_runtime_overrides, so the deploy YAML values win. Signed-off-by: WU Hang <whlbx@hotmail.com>
|
omni model e2e test with qwen3-omni: |
…review)
Phase 1 - Decompose build_vllm_omni_config:
- Split ~250-line factory into _build_cli_overrides, _build_default_diffusion_config,
_resolve_stages, and _resolve_async_chunk helpers
- Each helper has a clear docstring and single responsibility
Phase 2 - Reduce fragility:
- Add TestOmniInternalKeysCoverage to keep _OMNI_INTERNAL_KEYS in sync with
OmniEngineArgs fields
- Narrow _detect_pd_config_from_omega_conf except catch to (ImportError,
AttributeError, TypeError); downgrade log level to debug
- Rename Orchestrator pd_config param to legacy_pd_config with explicit
omni_config.pd_config -> legacy_pd_config fallback
Phase 3 - Long-term maintainability:
- Replace anonymous type("_DefaultDiffusionCfg") and type("_ReplicaCfg") calls
with named frozen dataclasses (DefaultDiffusionCfg, ReplicaInitCfg +
ReplicaInitCfgRuntime)
- Add DeprecationWarning and TODO(vllm-project#3672) to legacy init path in
async_omni_engine.py
- Document import-deferral rationale for circular-dependency function-body imports
- Expand to_omegaconf() TODO with concrete migration steps referencing PR vllm-project#3672
Signed-off-by: WU Hang <whlbx@hotmail.com>
…review)
Phase 1 - Decompose build_vllm_omni_config:
- Split ~250-line factory into _build_cli_overrides, _build_default_diffusion_config,
_resolve_stages, and _resolve_async_chunk helpers
- Each helper has a clear docstring and single responsibility
Phase 2 - Reduce fragility:
- Add TestOmniInternalKeysCoverage to keep _OMNI_INTERNAL_KEYS in sync with
OmniEngineArgs fields
- Narrow _detect_pd_config_from_omega_conf except catch to (ImportError,
AttributeError, TypeError); downgrade log level to debug
- Rename Orchestrator pd_config param to legacy_pd_config with explicit
omni_config.pd_config -> legacy_pd_config fallback
Phase 3 - Long-term maintainability:
- Replace anonymous type("_DefaultDiffusionCfg") and type("_ReplicaCfg") calls
with named frozen dataclasses (DefaultDiffusionCfg, ReplicaInitCfg +
ReplicaInitCfgRuntime)
- Add DeprecationWarning and TODO(vllm-project#3672) to legacy init path in
async_omni_engine.py
- Document import-deferral rationale for circular-dependency function-body imports
- Expand to_omegaconf() TODO with concrete migration steps referencing PR vllm-project#3672
Signed-off-by: WU Hang <whlbx@hotmail.com>
| args.model = args.model_tag | ||
|
|
||
| # Build unified OmniEngineArgs from CLI (replaces scattered kwargs) | ||
| omni_engine_args = OmniEngineArgs.from_cli_args(args) |
There was a problem hiding this comment.
from_cli_args() assumes all explicit overrides come from argparse. But overrides can come from both explicit constructor kwargs AND argparse simultaneously.
- The old approach handled this by merging args into kwargs;
- now that online and offline paths are explicitly separated, this needs to be reconsidered.
alex-jw-brooks
left a comment
There was a problem hiding this comment.
Some thoughts, thanks! I need to understand a bit better how this ties into #3366, so will take another look a bit later, but I think we need to try to simplify the config stuff as much as possible, with more clear patterns for implementing new potential engine types
| "data_parallel_size": getattr(pc, "data_parallel_size", 1) or 1, | ||
| "prefill_context_parallel_size": getattr(pc, "prefill_context_parallel_size", 1) or 1, | ||
| "sequence_parallel_size": getattr(pc, "sequence_parallel_size", 1) or 1, | ||
| "cfg_parallel_size": getattr(pc, "cfg_parallel_size", 1) or 1, |
There was a problem hiding this comment.
I think this is likely not needed? I.e., why not just check
- if config is a vLLM Config, get the attributes for non diffusion models
- if config is an OmniDiffusionConfig, get the attributes for diffusion models
in the caller, rather than creating a new dict unifying things? Defaults are already handled in the dataclass init, and we lose type safety by pulling things out into another dict here, and it will get uglier if new engine types are added later on
| pc = config.parallel_config | ||
|
|
||
| # DiffusionParallelConfig uses ``world_size`` as the total device count | ||
| if hasattr(pc, "world_size"): |
There was a problem hiding this comment.
This should check that the type is a OmniDiffusionConfig instead of checking hasattr dynamically, and the type hint should be config: OmniDiffusionConfig | VllmConfig so that its more clear. Then we should throw an explicit error if config is not one of those types, not rely on getattr / try excepts to resolve the world size
| class ReplicaInitCfgRuntime: | ||
| """Minimal per-replica runtime info (used instead of OmegaConf roundtrip).""" | ||
|
|
||
| devices: str | None = None |
There was a problem hiding this comment.
Will this be extended later? Seems strange to have a dataclass wrapping just one field
| # ================================================================== | ||
|
|
||
| @staticmethod | ||
| def add_cli_args( |
There was a problem hiding this comment.
This isn't called anwhere with omni_args_only=False, right?
I think we should rename this to something like register_omni_args, remove the param, and assume the parser has already been initialized for vLLM for now, otherwise it's confusing what this is meant to be doing
| engine_args._explicit_fields = frozenset( | ||
| attr for attr in attrs if hasattr(args, attr) and getattr(args, attr) is not None | ||
| ) | ||
| return engine_args |
There was a problem hiding this comment.
I think we need to coordinate landing this PR and the tracking parser to make sure behavior is stable #3366
| def _get_config_logger(): | ||
| from vllm.logger import init_logger | ||
|
|
||
| return init_logger("vllm_omni.config.vllm_omni_config") |
There was a problem hiding this comment.
Logger should just be initialized at the top of the file so other things can use it, rather than the fallback case needing to initialize the logger to be used
| if resolved_stages: | ||
| return bool( | ||
| getattr( | ||
| getattr(stage_configs[0], "engine_args", None), |
There was a problem hiding this comment.
I don't think having special resolution for async chunk as a key is a good idea. It should just be handled by the normal precedence resolution?
I think we should just use the normal resolution and then validate that the set of resolved values for async chunk is consistent across stages, and throw an error if it isn't
| # extract_stage_metadata / build_diffusion_config expect. | ||
| default_cfg = DefaultDiffusionCfg( | ||
| engine_args=cli_overrides, | ||
| runtime={"devices": cli_overrides.get("devices")}, |
There was a problem hiding this comment.
Splitting runtime with special keys out from cli_overrides to pass here feels like an an antipattern to me
| # the top-level ``OmniEngineArgs``, not by individual stage engines. | ||
| # Previously derived from ``OrchestratorArgs`` / ``SHARED_FIELDS`` in | ||
| # ``arg_utils``; now defined here as a single source of truth. | ||
| _OMNI_INTERNAL_KEYS: frozenset[str] = frozenset( |
There was a problem hiding this comment.
I think we need to figure out a better way to handle this with more robust filtering, hardcoding the keys like this is error prone, especially with them being a mix of top-level configuration and stage type specific options
E.g., if these keys are really never expected to be forward to stage init, that means the dependent stages should never expect to access them from the built object, so do we actually need to fully drop them? I think hardcoding keys in this way is more of an indicator that we do not have the right config separation / levels of encapsulation
CC @lishunyang12 @gcanlin also in case either of you have thoughts
|
|
||
| # ── Cluster / backend ────────────────────────────────────────── | ||
|
|
||
| worker_backend: str = "multi_process" |
There was a problem hiding this comment.
I think this is where the override semantics start to drift.
- Once
OmniEngineArgscarries non-None defaults,_build_cli_overrides()will treat them as merge inputs before YAML resolution, which can turn fallback defaults into effective overrides. - Under the current
None-sentinel model, defaults should only materialize after CLI/YAML merge is finished, not in the unified input layer.
| Pattern (matching vLLM's ``EngineArgs``): | ||
|
|
||
| Online: ``OmniEngineArgs.from_cli_args(parsed_args)`` | ||
| Offline: ``OmniEngineArgs(model="...", tensor_parallel_size=2, ...)`` |
There was a problem hiding this comment.
Offline OmniEngineArgs(model="...", ...) never sets _explicit_fields.
- Without nullification (CLI-only), EngineArgs defaults all leak as global per-stage overrides via
_build_cli_overrides, silently clobbering deploy YAML values. - Offline needs
create()or equivalent to tag explicit fields.
Signed-off-by: wuhang <wuhang6@huawei.com> # Conflicts: # vllm_omni/engine/arg_utils.py # vllm_omni/engine/async_omni_engine.py # vllm_omni/engine/orchestrator.py # vllm_omni/engine/stage_init_utils.py # vllm_omni/entrypoints/cli/serve.py
…oject#3754 — unified VllmOmniConfig + OmniArgumentParser Introduce a unified VllmOmniConfig frozen dataclass as the single source of truth for all runtime configuration (pipeline topology, per-stage VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD config). Add OmniArgumentParser that injects model-specific defaults from deploy YAML as action.default before parse — replacing nullify_stage_engine_defaults, detect_explicit_cli_keys, deploy_override_field_names, and the two-phase kwargs partitioning (split_kwargs / OrchestratorArgs / SHARED_FIELDS). Removals: - StageConfig, ModelPipeline classes + to_omegaconf() OmegaConf roundtrip - merge_pipeline_deploy, _build_engine_args, _build_extras, _apply_platform_overrides - StageConfigFactory.create_from_model / _create_from_registry / create_default_diffusion - _explicit_fields / explicit_kwargs / create() sentinel on OmniEngineArgs - _create_default_diffusion_stage_cfg + diffusion kwargs injection loop - _strip_single_engine_args + _PARENT_ARGS_KEEP/STRIP/NO_WARN frozensets - load_and_resolve_stage_configs / load_stage_configs_from_yaml - OmniBase.from_cli_args Config flow (after): OmniArgumentParser.parse_args() # inject deploy YAML as action.default OmniEngineArgs.from_cli_args(args) # extract resolved args build_vllm_omni_config(model, ...) # single factory -> VllmOmniConfig AsyncOmniEngine(omni_config=...) # read config directly Net: -1347 lines across 10 files. Signed-off-by: wuh <wuh@users.noreply.github.com>
|
resolve conflicts |
I'll post a RFC first because the change is huge. Maybe delay this PR until a later release. |
|
resolve conflicts |
Purpose
Introduce a unified
VllmOmniConfigfrozen dataclass as the single source of truth for all runtime configuration (pipeline topology, per-stage VllmConfig/OmniDiffusionConfig, orchestrator settings, connectors, PD config). ConsolidateOmniEngineArgsfollowing vLLM'sEngineArgs→VllmConfigpattern to eliminate scattered kwargs,split_kwargspartitioning (OrchestratorArgs+SHARED_FIELDS), and the two-phase config injection loop.Six redundancies addressed:
build_engine_args_dictin_initialize_llm_replica(now reads TP from pre-builtVllmConfig.parallel_config)extract_stage_metadataper replica (new path reads fromStageResolvedConfig)build_vllm_omni_configTest Plan
Test Result
(skipped:
test_create_omni_config_returns_vllm_omni_config— requires local model path viaVLLM_OMNI_TEST_MODEL)Essential Elements Checklist
pytest tests/test_arg_utils.py tests/entrypoints/test_async_omni_diffusion_config.pyOmniEngineArgs.add_cli_args(omni_args_only=True)is the only new public interface