[Config Refactor] Derive deploy override fields from stage config by xiaohajiayou · Pull Request #3162 · vllm-project/vllm-omni

xiaohajiayou · 2026-04-26T16:55:08Z

Purpose

Move deploy override field ownership from arg_utils.py to stage_config.py.

The nullify path now derives deploy-overridable fields from the deploy schema/merge logic instead of maintaining a duplicated manual allowlist in arg_utils.py. This keeps the field set aligned with DeployConfig, StageDeployConfig, and special deploy/runtime fields such as async_chunk and devices.

Test Plan

Run syntax checks for touched modules.
Run entrypoint tests covering direct Omni(...) construction and deprecated from_cli_args(...) compatibility.
Run targeted nullify/from_cli_args tests.

Test Result

Passed: .venv/bin/python -m py_compile ...
Passed: .venv/bin/pytest tests/entrypoints/test_omni_entrypoints.py -q
- 39 passed
Passed: targeted entrypoint nullify/from_cli_args tests
- 2 passed


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b54ccf2c68


hsliuustc0106

Cross-PR Config Refactor Review

This is one of several coordinated config refactor PRs. Reviewed together with #3160, #3154, #3144, #3128, #3120, #3139.

What this PR does

Moves deploy_override_field_names() from arg_utils.py to stage_config.py, where the deploy schema fields actually live. The override field set is now computed from _STAGE_DEPLOY_FIELDS | _PIPELINE_WIDE_ENGINE_FIELDS | {"async_chunk", "devices"}.
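For readers following along, here is a minimal sketch of how such a field set could be derived from the schema instead of a hand-maintained list. The two dataclasses are simplified stand-ins with only a few fields, and `_NON_OVERRIDE_KEYS` is a hypothetical helper; none of this is the actual code in stage_config.py.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class StageDeployConfig:  # simplified stand-in for the real schema
    stage_id: int = 0
    gpu_memory_utilization: float | None = None
    max_num_seqs: int | None = None


@dataclass
class DeployConfig:  # simplified stand-in for the real schema
    dtype: str | None = None
    trust_remote_code: bool | None = None


# Assumption: identity keys such as stage_id are not engine overrides.
_NON_OVERRIDE_KEYS = frozenset({"stage_id"})

_STAGE_DEPLOY_FIELDS = (
    frozenset(f.name for f in fields(StageDeployConfig)) - _NON_OVERRIDE_KEYS
)
_PIPELINE_WIDE_ENGINE_FIELDS = frozenset(f.name for f in fields(DeployConfig))


def deploy_override_field_names() -> frozenset[str]:
    """Union of schema-derived fields plus the special deploy/runtime fields."""
    return _STAGE_DEPLOY_FIELDS | _PIPELINE_WIDE_ENGINE_FIELDS | {"async_chunk", "devices"}
```

With this shape, adding a field to either dataclass automatically extends the override set, so there is no second list to keep in sync.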

Assessment

Clean, small, single-purpose. No issues.

Merge order note

This PR touches omni_base.py (the from_cli_args import path), which is also touched by #3160 (rewrites from_cli_args) and #3144 (deprecates it). These should merge in order: #3162 → #3144 → #3160, to minimize conflict surface. #3160 removes the section that #3162 modifies, so #3162 must merge first.

gcanlin

I think we have two concerns about #3078:

1. the maintainability of the whitelist for engine args
2. the default value consistency with upstream

This PR has clearly fixed the first. Could you please confirm the second point?

xiaohajiayou · 2026-04-27T07:58:15Z

2. the default value consistency with upstream

I’ve considered this issue as well, and it shouldn’t be a problem with the current implementation.

For upstream-owned fields, the intended behavior is:

explicit user value > deploy/yaml value > upstream vLLM default

The key detail is that upstream default consistency does not come from downstream treating None as a default. Instead, non-explicit None values are dropped during stage resolution, so the field is omitted from the resolved stage config. Then OmniEngineArgs(**...) falls back to the inherited upstream default.

For an LLM stage, the default-value precedence flow (new deploy path) looks like this:

                     user input
            ┌──────────────────────────┐
            │ CLI / parser / kwargs /  │
            │ engine_args.create(...)  │
            └────────────┬─────────────┘
                         │
                         │ non-explicit override fields
                         │ are normalized to None
                         ▼
            ┌──────────────────────────┐
            │ _resolve_stage_configs() │
            │ load_and_resolve_stage   │
            │ _configs(...)            │
            └────────────┬─────────────┘
                         │
                         │ new deploy path:
                         │ only non-None explicit overrides survive
                         ▼
            ┌──────────────────────────┐
            │ StageConfigFactory       │
            │ _create_from_registry()  │
            │                          │
            │ explicit_overrides =     │
            │ {k: v for k, v in        │
            │  cli_overrides.items()   │
            │  if v is not None}       │
            └────────────┬─────────────┘
                         │
                         │ None values are dropped here
                         ▼
            ┌──────────────────────────┐
            │ StageConfig              │
            │                          │
            │ yaml_engine_args         │
            │ + runtime_overrides      │
            └────────────┬─────────────┘
                         │
                         │ if a field was not explicitly overridden:
                         │ - use deploy/yaml value if present
                         │ - otherwise omit the field
                         ▼
            ┌──────────────────────────┐
            │ to_omegaconf()           │
            │ resolved stage_config    │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ build_vllm_config()      │
            │ OmniEngineArgs(**dict)   │
            └────────────┬─────────────┘
                         │
                         │ omitted field => dataclass default applies
                         ▼
            ┌──────────────────────────┐
            │ OmniEngineArgs defaults  │
            │                          │
            │ upstream-owned fields    │
            │ inherit vLLM defaults    │
            │ (dtype="auto",           │
            │  enforce_eager=False,    │
            │  tp=1, etc.)             │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ create_engine_config()   │
            │ super().create_model_    │
            │ config()                 │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ final vLLM ModelConfig   │
            │ / VllmConfig             │
            └──────────────────────────┘
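The flow above can be sketched in a few lines. This is only an illustration under assumptions: `EngineArgsStandIn` is a hypothetical stand-in for OmniEngineArgs with three fields, and `resolve_engine_kwargs` is an invented helper name that mirrors the `explicit_overrides` filter shown in the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class EngineArgsStandIn:  # hypothetical stand-in for OmniEngineArgs
    dtype: str = "auto"
    enforce_eager: bool = False
    tensor_parallel_size: int = 1


def resolve_engine_kwargs(
    cli_overrides: dict[str, Any], yaml_engine_args: dict[str, Any]
) -> dict[str, Any]:
    # None means "not explicitly set", so it is dropped rather than
    # being allowed to overwrite a deploy/yaml value.
    explicit_overrides = {k: v for k, v in cli_overrides.items() if v is not None}
    # Later keys win: explicit user value > deploy/yaml value.
    return {**yaml_engine_args, **explicit_overrides}


cli = {"dtype": None, "enforce_eager": True, "tensor_parallel_size": None}
yaml_args = {"dtype": "float16"}

resolved = resolve_engine_kwargs(cli, yaml_args)
# tensor_parallel_size is omitted from `resolved`, so the dataclass default applies.
engine_args = EngineArgsStandIn(**resolved)
```

Here `dtype` comes from the YAML, `enforce_eager` from the explicit user value, and `tensor_parallel_size` from the dataclass default, which is the precedence order described above.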

gcanlin · 2026-04-28T01:27:26Z

It seems the same bug still exists in this PR. I pulled it and tested the failing case:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    devices: "0"
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni
    default_sampling_params:
      temperature: 0.4
      top_p: 0.9
      top_k: 1
      max_tokens: 2048
      seed: 42
      repetition_penalty: 1.05

(APIServer pid=1102818) INFO 04-28 01:25:40 [serving.py:45] OpenAIServingRealtime initialized for task: realtime
(APIServer pid=1102818) INFO 04-28 01:25:40 [api_server.py:416] Starting vLLM API server 0 on http://0.0.0.0:8091
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:37] Available routes are:
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/speech, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/speech/batch, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/images/generations, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/images/edits, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/sync, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/omni/sleep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/omni/wakeup, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/video/chat/stream, Endpoint: streaming_video_chat
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/realtime, Endpoint: realtime_websocket
(APIServer pid=1102818) INFO:     Started server process [1102818]
(APIServer pid=1102818) INFO:     Waiting for application startup.
(APIServer pid=1102818) INFO:     Application startup complete.
(APIServer pid=1102818) INFO:     127.0.0.1:41786 - "POST /start_profile HTTP/1.1" 404 Not Found

xiaohajiayou · 2026-04-28T10:31:09Z

It seems that there exists the same bug in this PR. I pull it and test the bad case:
stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    devices: "0"
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni

Root cause

profiler_config was not included in StageDeployConfig / DeployConfig, so it was missing from the whitelist used by nullify_stage_engine_defaults(). As a result, in the online serving path, the default value (ProfilerConfig()) that vLLM injects via serve_parser = make_arg_parser(serve_parser) was not cleared. It incorrectly overwrote the deploy YAML value, which led to /start_profile returning 404.

Fix in 20890dc

- Add missing fields (including profiler_config) to StageDeployConfig / DeployConfig
- Normalize several fields to None (instead of concrete defaults)

This ensures that only values explicitly provided in the deploy YAML are included in the resulting StageConfig, while missing fields are left unset and handled by downstream default resolution. CLI defaults (both online and offline) no longer leak into the final config.

For LLM stages, missing fields fall back to the default values defined in vLLM EngineArgs:

vllm-omni/vllm_omni/engine/stage_init_utils.py

Lines 433 to 441 in e18cb89

    
    if engine_args_dict is None:
        engine_args_dict = build_engine_args_dict(
            stage_config,
            model,
            stage_connector_spec=stage_connector_spec,
        )
    filtered_engine_args_dict = filter_dataclass_kwargs(OmniEngineArgs, engine_args_dict)
    omni_engine_args = OmniEngineArgs(**filtered_engine_args_dict)

For diffusion stages, missing fields are handled by _create_default_diffusion_stage_cfg, which provides safe defaults:

vllm-omni/vllm_omni/engine/async_omni_engine.py

Lines 1285 to 1292 in e18cb89

    
    stage_engine_args = {
        "max_num_seqs": 1,
        "parallel_config": parallel_config,
        "model_class_name": kwargs.get("model_class_name", None),
        "step_execution": kwargs.get("step_execution", False),
        "vae_use_slicing": kwargs.get("vae_use_slicing", False),
        "vae_use_tiling": kwargs.get("vae_use_tiling", False),
        "cache_backend": cache_backend,

This unifies the override behavior while delegating default resolution to the appropriate downstream layer.
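To make the leak concrete, here is a small reproduction sketch using plain argparse. It is not the real nullify_stage_engine_defaults(): `nullify_defaults` and `merge` are hypothetical helpers, the string "disabled" stands in for the ProfilerConfig() default, and comparing against the parser default is just one assumed way to detect non-explicit values.

```python
import argparse


def nullify_defaults(parser, args, override_fields):
    """Set whitelisted fields to None when they still hold the parser default."""
    cleaned = dict(vars(args))
    for name in override_fields:
        if name in cleaned and cleaned[name] == parser.get_default(name):
            cleaned[name] = None
    return cleaned


def merge(yaml_stage, cli):
    # Only non-None CLI values are allowed to override the deploy YAML.
    return {**yaml_stage, **{k: v for k, v in cli.items() if v is not None}}


parser = argparse.ArgumentParser()
parser.add_argument("--gpu-memory-utilization", type=float, default=0.9)
parser.add_argument("--profiler-config", default="disabled")  # stand-in default
args = parser.parse_args([])  # the user passed nothing on the CLI

yaml_stage = {"gpu_memory_utilization": 0.3, "profiler_config": "torch"}

# Whitelist without profiler_config: the parser default survives and wins.
before = merge(yaml_stage, nullify_defaults(parser, args, {"gpu_memory_utilization"}))

# Whitelist with profiler_config: the YAML value is preserved.
after = merge(
    yaml_stage,
    nullify_defaults(parser, args, {"gpu_memory_utilization", "profiler_config"}),
)
```

In `before`, the YAML's `profiler_config` is silently replaced by the parser default, which is the same shape as the 404 on /start_profile; in `after`, both YAML values survive.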

xiaohajiayou · 2026-04-28T11:16:17Z

Related context

[Refactor] Remove redundant StageDeployConfig fields, delegate to vLLM defaults #3128: Removed the duplicated maintenance of vLLM fields and defaults in StageDeployConfig and DeployConfig
[Refactor] Replace args whitelist with explicit CLI key detection #3160: Exposed an issue where
```
serve_parser = make_arg_parser(serve_parser)
```
pulls in vLLM defaults, which override values defined in the deploy YAML

Follow-up question

This leads to a design question:

Should deploy YAML fields be fully defined via StageDeployConfig / DeployConfig,
or should users be allowed to extend them freely?

Option 1: Strict schema

Keep StageDeployConfig / DeployConfig as the full schema, but set all defaults to None.
Final defaults are resolved only through:

OmniEngineArgs(**filtered_engine_args_dict)

Implications:

nullify can use these fields as a whitelist to preserve override semantics
Default resolution is unified in EngineArgs
The changes in [Refactor] Remove redundant StageDeployConfig fields, delegate to vLLM defaults #3128 may no longer be necessary under this model

Option 2: Allow YAML extensibility

Allow users to extend deploy YAML beyond the predefined schema.

Current issue:

profiler_config is not defined in StageDeployConfig, but can still be parsed into stage_config
However, in the online path,
```
serve_parser = make_arg_parser(serve_parser)
```
injects vLLM defaults
Since profiler_config is not part of the schema, its CLI default takes highest precedence and overrides the user-provided YAML value

In this case, if we want to support extensible YAML with correct override semantics:

We must ensure that all inputs before stage_config construction are explicit user inputs only
In the online path, this likely requires:
- removing make_arg_parser(serve_parser)
- and applying default completion after stage_config construction via EngineArgs
Alternatively, we could maintain a whitelist of diffusion-specific fields, and set all other fields to None before constructing stage_config, so that only explicitly supported fields participate in override resolution

This approach is more flexible, but may be more complex to maintain.

Could you please take a look when you have time and advise how to resolve these issues?
@lishunyang12 @gcanlin @hsliuustc0106

gcanlin · 2026-04-28T12:02:21Z

+    data_parallel_size: int | None = None
+    pipeline_parallel_size: int | None = None
+    config_format: str | None = None
+    load_format: str | None = None
+    tokenizer_mode: str | None = None


This change leads to the same issue: we are again maintaining a whitelist for vLLM's engine args.

Yes, I agree this points to a more fundamental issue: how to determine which fields are truly intended to be user-overridable.

The complexity comes from the fact that we currently have three different input sources into Omni(), while also mixing LLM and diffusion configurations in the same flow.

To correctly distinguish explicit user inputs, we have two relatively straightforward options. In the current override mechanism, any field with a non-None value when constructing StageConfig is treated as an explicit user input, so we can either:

- Maintain a complete field set in StageDeployConfig / DeployConfig (i.e., a whitelist) and set all defaults to None, so that only explicitly provided values participate in override resolution.
- Alternatively, maintain a diffusion-specific blocklist and set all other fields to None.

Otherwise, we would need to introduce a more complex mechanism, such as:

- removing make_arg_parser(serve_parser) in the online path, and
- applying default completion only after stage_config construction via EngineArgs.

The offline path is still unclear under this approach and would require further design.

lishunyang12

Re: design question — I'd lean Option 1. The whitelist drift is what just bit us with profiler_config, and Option 2 has the same shape — every new vLLM EngineArg silently breaks override semantics until someone hits it. Strict schema forces the failure at PR time, which is cheaper than runtime. The sync tax per vLLM bump is real but small.

lishunyang12 · 2026-04-28T12:42:38Z

+    gpu_memory_utilization: float | None = None
+    tensor_parallel_size: int | None = None
+    enforce_eager: bool | None = None
    max_num_batched_tokens: int = 32768


Intentional to keep max_num_seqs, max_num_batched_tokens, and trust_remote_code (L446) concrete? Same leakage path as the others if the user omits them — these would still override upstream defaults.

I have not changed these defaults to None in this PR yet because their current values are not the same as vLLM defaults, and #3128's description explicitly listed some of them as retained vLLM-Omni-specific behavior, e.g. max_num_seqs (64 vs vLLM's 256).

That said, I noticed that #3128's actual diff also removes max_num_seqs. So if the intended direction is to delegate these defaults back to vLLM, then I agree these fields can likely be changed to None as well. I updated this in 418c850.

lishunyang12 · 2026-04-28T12:48:26Z

How should we choose? What do you think?

Tbh I'd go Option 1. The profiler_config bug is exactly the failure mode Option 2 perpetuates — a field silently mis-overridden because the schema doesn't know about it, surfaced only at user runtime. Option 1 makes adding a vLLM field an explicit PR-time decision instead of "did anyone notice?".

The maintenance tax is overstated IMO — this PR added 8 fields at once to clear backlog, but steady-state is more like 1-2 per vLLM bump. test_nullify_stage_engine_defaults_resets_inherited_defaults already catches drift.

engine_extras already exists on StageDeployConfig as the forward-compat escape hatch — if a user needs a knob not in the schema yet, they drop it there. So Option 1 + engine_extras covers most of Option 2's flexibility without the override-semantics complexity.

Option 2 also has a hidden coupling: making it work cleanly requires either removing make_arg_parser(serve_parser) (large surface area, risks CLI parity regressions), or relying on detect_explicit_cli_keys everywhere — which only works in from_cli_args, not programmatic Omni(...). The "diffusion-specific whitelist" fallback is just Option 1 with extra steps.

Concrete suggestion: land this PR as Option 1, then in a follow-up (a) make max_num_seqs / max_num_batched_tokens / trust_remote_code also default to None if those concrete defaults aren't load-bearing, and (b) document engine_extras as the escape hatch. Revisit #3128 only after this lands — may become unnecessary.

reidliu41 · 2026-04-28T14:20:21Z

Option 1 looks like the most maintainable path for the current code structure.

The main value is making override semantics fail early: new upstream engine fields have to be represented intentionally in the deploy schema, instead of silently flowing through or being masked by parser/dataclass defaults. That gives reviewers and tests a concrete place to catch drift, while keeping the current None-means-omitted resolution model small and predictable.

alex-jw-brooks · 2026-04-28T15:00:11Z

I think we need a few different pieces for long-term sustainability.

1. Generic utils for being able to distinguish between unset defaults vs passed values for:
   - dataclasses
   - stuff parsed by argparse

   Setting all defaults to None / having lots of sentinels hurts readability, plus being able to distinguish between default values the user passed vs default values that were set because they weren't passed is useful. These would be useful in a lot of places though, because vLLM Omni has the recurring issue of having to tell if values were actually set by the user before deciding if they should actually override stuff in vLLM.

2. Minimize white/black lists as much as possible. Having too many string lists that correspond to things like signatures is dangerous, since things can be renamed etc. While the signatures are hopefully stable, we should pull as much from vLLM as we can for this.
3. In cases where white/black lists are actually needed, we should enforce that they are aligned correctly in our CI, e.g., by comparing to the corresponding objects in vLLM and ensuring that they are actually valid in the context in which they are used. This is still not ideal, but then at least we can catch things that are potentially renamed, removed, etc.
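As a rough illustration of the CI alignment idea, a check along these lines could flag allowlist names that no longer exist upstream. This is a sketch under assumptions: `UpstreamEngineArgs` is a tiny stand-in for vLLM's EngineArgs, and `OMNI_ONLY_FIELDS` and `find_stale_names` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class UpstreamEngineArgs:  # tiny stand-in for vLLM's EngineArgs
    dtype: str = "auto"
    max_num_seqs: int = 256
    enforce_eager: bool = False


# Assumption: fields owned by vLLM-Omni that have no upstream counterpart.
OMNI_ONLY_FIELDS = frozenset({"async_chunk", "devices"})


def find_stale_names(allowlist, upstream_cls, local_only=OMNI_ONLY_FIELDS):
    """Return allowlist entries that are neither upstream fields nor local-only."""
    upstream = {f.name for f in fields(upstream_cls)}
    return sorted(set(allowlist) - upstream - set(local_only))


def test_allowlist_matches_upstream():
    allowlist = {"dtype", "max_num_seqs", "devices"}
    assert find_stale_names(allowlist, UpstreamEngineArgs) == []
```

A renamed or removed upstream field then shows up as a non-empty list in CI instead of as a silently ignored override at runtime.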

I'll look into some of these threads as well @xiaohajiayou, but happy to collaborate however 🙂

alceops · 2026-04-28T15:34:11Z

I reviewed this against #3220 and the schema-derived override direction looks right. One small review-readiness gap: profiler_config itself is not covered end-to-end by a deploy YAML load + merge test, and the newly declared top-level config_format / load_format / tokenizer_mode fields are in _PIPELINE_WIDE_ENGINE_FIELDS but not yet copied in load_deploy_config(). I have a tiny delta with two focused tests plus the loader-list addition if useful; happy to send it as a follow-up patch rather than opening a competing implementation.

xiaohajiayou · 2026-04-28T17:09:09Z

I reviewed this against #3220 and the schema-derived override direction looks right. One small review-readiness gap: profiler_config itself is not covered end-to-end by a deploy YAML load + merge test, and the newly declared top-level config_format / load_format / tokenizer_mode fields are in _PIPELINE_WIDE_ENGINE_FIELDS but not yet copied in load_deploy_config(). I have a tiny delta with two focused tests plus the loader-list addition if useful; happy to send it as a follow-up patch rather than opening a competing implementation.

Based on the existing YAMLs, these fields are currently configured per stage rather than at the top level. So I think a better first step is to put config_format, load_format, and tokenizer_mode into StageDeployConfig.

This keeps the existing YAML syntax unchanged, while allowing this PR’s schema-derived whitelist to recognize these fields explicitly and preserve the correct override semantics. I fixed this in 9ad134e.

xiaohajiayou · 2026-04-28T17:40:27Z

I think we need a few different pieces for long-term sustainability.

Generic utils for being able to distinguish between unset defaults vs passed values for

dataclasses

stuff parsed by argparse

Thanks, I agree with this direction.

I think there are two related but separate layers here:

For deploy YAML, if we choose Option 1 / strict schema, then StageDeployConfig and DeployConfig become the full deploy schema. Under that model, they naturally also act as the whitelist for deploy override semantics: only fields intentionally represented in the deploy schema can participate in override resolution. Based on that, this PR’s current None sentinel based filtering can land first to preserve the correct override semantics.
Separately, I agree the current override/default handling is not ideal as a long-term mechanism. Using None as the unset sentinel makes the schema less readable, and a generic way to distinguish “user explicitly passed this value” from “this value came from a default” would be cleaner for both dataclasses and argparse inputs.

So I see this PR as the smaller correctness step: keep the strict deploy schema as the override whitelist and make the current precedence behavior correct. Then we can follow up with a more general explicit-value tracking utility / CI alignment checks to make the mechanism more maintainable.
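One possible shape for the argparse side of such an explicit-value tracking utility, sketched with only the standard library: declaring arguments with `default=argparse.SUPPRESS` means a key appears in the namespace only when the flag was actually passed. The parser, the `DEFAULTS` table, and its values are illustrative assumptions, not the vLLM parser.

```python
import argparse

# Illustrative defaults, applied only after explicit values are known.
DEFAULTS = {"dtype": "auto", "max_num_seqs": 256}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # With default=SUPPRESS the attribute is set only if the flag is given.
    parser.add_argument("--dtype", default=argparse.SUPPRESS)
    parser.add_argument("--max-num-seqs", type=int, default=argparse.SUPPRESS)
    return parser


def parse_explicit(argv):
    """Return only the options the user actually passed."""
    return vars(build_parser().parse_args(argv))


# The user explicitly passes a value that happens to equal the default.
explicit = parse_explicit(["--dtype", "auto"])

# Default completion happens afterwards, so explicit keys stay distinguishable.
final = {**DEFAULTS, **explicit}
```

Unlike a None sentinel, this keeps "the user passed the default value" distinct from "the user passed nothing", at the cost of moving default completion out of the parser.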

alex-jw-brooks · 2026-04-28T18:30:39Z

Cool, I think we are on the same page. I agree, I think there is a lot of nuance to this, but it's best to decouple discussions about the implementation from the user experience, and it shouldn't block this PR, more just discussion for the future 🙂

After thinking about this a bit more, maybe it's confusing to have the schema set top-level keys that are applied to all stages without letting those stages also specify it themselves + just validating against bad behavior. As a minimal example with one stage:

If you can do something like this:

dtype: float32
stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32

You should be able to do something like this too:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float32

But if that's true, this is natural feeling as well:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float32
  - stage_id: 1
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float16

and you'd just expect it to throw an error since dtype needs consistent values for now. Having to know about which keys are allowed where and treating everything as separate adds another layer of complexity, when the resolved stage config should be directly translatable to engine args of that corresponding stage type (although Diffusion is kind of bundling everything into the OmniDiffusionConfig at the moment).

Being able to essentially map it to an engine args like thing directly and push the validation further down after initial merging and resolution also makes more sense for extensibility. E.g., if we add reconstruction engine for world models later on, the stage config object will probably get even more confusing because it will need to be compatible with those stages too
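A rough sketch of that "merge first, validate afterwards" idea, assuming the deploy YAML has already been loaded into a plain dict. `resolve_stages` and `MUST_MATCH_ACROSS_STAGES` are hypothetical names, and treating dtype as the only key that must agree is an assumption taken from the example above.

```python
from __future__ import annotations

from typing import Any

# Assumption: keys that currently need one consistent value across all stages.
MUST_MATCH_ACROSS_STAGES = ("dtype",)


def resolve_stages(deploy: dict[str, Any]) -> list[dict[str, Any]]:
    """Push top-level keys into each stage, then validate the merged result."""
    top_level = {k: v for k, v in deploy.items() if k != "stages"}
    # Stage-level values win over top-level ones.
    resolved = [{**top_level, **stage} for stage in deploy.get("stages", [])]
    for key in MUST_MATCH_ACROSS_STAGES:
        values = {stage[key] for stage in resolved if key in stage}
        if len(values) > 1:
            raise ValueError(
                f"{key} must be consistent across stages, got {sorted(values)}"
            )
    return resolved


top_level_form = {"dtype": "float32", "stages": [{"stage_id": 0, "max_num_seqs": 32}]}
stage_level_form = {"stages": [{"stage_id": 0, "max_num_seqs": 32, "dtype": "float32"}]}
```

Both spellings resolve to the same per-stage dict, and a conflicting pair of dtype values fails after merging rather than depending on where each key was written.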

xiaohajiayou · 2026-04-30T15:34:02Z

I was able to reproduce the reported stage-0 torch-profiler case locally using Qwen3-TTS-12Hz-1.7B-Base.

For reference, the stage-0 configuration I used is:

stages:
  - stage_id: 0
    max_num_seqs: 10
    gpu_memory_utilization: 0.3
    async_scheduling: true
    max_num_batched_tokens: 512
    max_model_len: 4096
    devices: "0"
    output_connectors:
      to_stage_1: connector_of_shared_memory
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni

Results:

POST /start_profile succeeded
The TTS request completed successfully
POST /stop_profile completed successfully
Profiler artifacts were correctly generated under thinker-omni/...

Based on this, the issue reported in #3220 appears to be resolved.

Relevant logs:

(APIServer pid=10835) INFO 04-30 23:25:49 [api_server.py:2834] Starting profiler for stages: [0]
(APIServer pid=10835) INFO 04-30 23:25:49 [api_server.py:2837] Profiler started.
(APIServer pid=10835) INFO:     127.0.0.1:37682 - "POST /start_profile HTTP/1.1" 200 OK

(APIServer pid=10835) INFO 04-30 23:26:04 [serving_speech.py:1795] TTS speech request speech-ab3370b6070557b8: text='Hello, this is a profiler validation request.', model=Base
(APIServer pid=10835) INFO 04-30 23:26:04 [orchestrator.py:894] [Orchestrator] _handle_add_request: stage=0 req=speech-ab3370b6070557b8 prompt_type=OmniEngineCoreRequest original_prompt_type=dict final_stage=1 num_sampling_params=2
(APIServer pid=10835) INFO 04-30 23:26:04 [stage_engine_core_client.py:230] [StageEngineCoreClient] Stage-0 adding request: speech-ab3370b6070557b8
(APIServer pid=10835) INFO 04-30 23:26:04 [stage_engine_core_client.py:230] [StageEngineCoreClient] Stage-1 adding request: speech-ab3370b6070557b8
(StageEngineCoreProc pid=11234) INFO 04-30 23:26:17 [qwen3_tts_code2wav.py:288] Code2Wav codec: frames=2 q=16 uniq=26 range=[38,1960] batch=1
(StageEngineCoreProc pid=11234) WARNING 04-30 23:26:20 [qwen3_tts_code2wav.py:260] Code2Wav input_ids length 1 not divisible by num_quantizers 16; skipping malformed request.
(APIServer pid=10835) INFO:     127.0.0.1:41142 - "POST /v1/audio/speech HTTP/1.1" 200 OK

(APIServer pid=10835) INFO 04-30 23:26:30 [api_server.py:2860] Stopping profiler for stages: [0]
(StageEngineCoreProc pid=10998) INFO 04-30 23:27:37 [omni_torch_profiler.py:174] [Rank 0] Trace exported to /root/vllm-omni/thinker-omni/20260430-232549_stage0_rank0_1777562749/trace_rank0.json
(StageEngineCoreProc pid=10998) INFO 04-30 23:27:37 [omni_torch_profiler.py:179] [Rank 0] Triggered background compression for /root/vllm-omni/thinker-omni/20260430-232549_stage0_rank0_1777562749/trace_rank0.json
(StageEngineCoreProc pid=10998) WARNING 04-30 23:28:26 [omni_torch_profiler.py:349] [Rank 0] pandas not available, skip Excel export: No module named 'pandas'
(StageEngineCoreProc pid=10998) INFO 04-30 23:28:40 [wrapper.py:66] Profiler stopped successfully.
(APIServer pid=10835) INFO 04-30 23:28:40 [api_server.py:2863] Profiler stopped.

xiaohajiayou · 2026-04-30T15:38:11Z

The current CI failures appear to come from fields that are missing in some model YAMLs. These were previously covered implicitly by default fallbacks, but now that the stage config defaults are None, the values vLLM falls back to no longer match what those models expect.

To address this, I’ve added the required default values back into the affected model YAMLs in e2aa6aa.
Please let me know if there’s anything I might have missed. @lishunyang12 @gcanlin @hsliuustc0106
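The fallback behavior described in this comment can be illustrated with a small sketch. It is not the code from this PR: `StageFieldsSketch`, `ENGINE_DEFAULTS_SKETCH`, `resolve`, and every numeric value below are hypothetical and only show how a None default defers to the engine-side default unless the YAML sets a value explicitly.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StageFieldsSketch:
    """Hypothetical stand-in for a stage config whose defaults are None."""

    max_num_seqs: Optional[int] = None
    gpu_memory_utilization: Optional[float] = None


# Hypothetical engine-side defaults; the real values are not given here.
ENGINE_DEFAULTS_SKETCH = {"max_num_seqs": 256, "gpu_memory_utilization": 0.9}


def resolve(stage, engine_defaults):
    """Use the stage value when set, otherwise the engine default."""
    resolved = {}
    for field in fields(stage):
        value = getattr(stage, field.name)
        resolved[field.name] = (
            engine_defaults[field.name] if value is None else value
        )
    return resolved


# A YAML that omits a field silently inherits the engine default ...
implicit = resolve(StageFieldsSketch(), ENGINE_DEFAULTS_SKETCH)
# ... while a YAML that states the value keeps what the model needs.
explicit = resolve(StageFieldsSketch(max_num_seqs=1), ENGINE_DEFAULTS_SKETCH)
```

Under these assumptions, a model that relied on an old implicit stage-level default would start receiving the engine default instead, which is why writing the required values into the affected YAMLs restores the expected behavior.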

lishunyang12 · 2026-04-30T17:24:43Z

Please leave a Todo issue for follow-ups. @xiaohajiayou

…nfig (vllm-project#3162)" This reverts commit 01ebc0c. Co-authored-by: Gaohan123 <20148503+Gaohan123@users.noreply.github.com>

…nfig (#3162)" This reverts commit 01ebc0c. Signed-off-by: GitHub <noreply@github.com> Co-authored-by: Gaohan123 <20148503+Gaohan123@users.noreply.github.com>

…lm-project#3162) Signed-off-by: xiaohajiayou <923390377@qq.com>

…lm-project#3162) Signed-off-by: xiaohajiayou <923390377@qq.com> Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>


xiaohajiayou requested a review from hsliuustc0106 as a code owner April 26, 2026 16:55

xiaohajiayou mentioned this pull request Apr 26, 2026

[Config Refactor] sentinel default precedence #3078

Merged

5 tasks

chatgpt-codex-connector Bot reviewed Apr 26, 2026

Comment thread examples/offline_inference/dynin_omni/end2end.py Outdated

xiaohajiayou force-pushed the whitelist-optimization branch from b54ccf2 to d63ea16 Compare April 26, 2026 17:08

xiaohajiayou mentioned this pull request Apr 26, 2026

[Refactor] Replace args whitelist with explicit CLI key detection #3160

Closed

hsliuustc0106 reviewed Apr 26, 2026

fhfuih mentioned this pull request Apr 27, 2026

[CI Failure]: Diffusion X2I(&A&T) · Doc Test, test_text_to_image.py, pydantic_core._pydantic_core.ValidationError: 1 validation error for DiffusionParallelConfig #3123

Closed

1 task

gcanlin reviewed Apr 27, 2026

gcanlin reviewed Apr 28, 2026

lishunyang12 reviewed Apr 28, 2026

gcanlin mentioned this pull request Apr 28, 2026

[Bug]: The field profiler_config in deploy yaml can't be passed correctly #3220

Closed

1 task

alex-jw-brooks mentioned this pull request Apr 28, 2026

[Config Refactor] Validate Engine Args From CLI #3008

Closed

xiaohajiayou force-pushed the whitelist-optimization branch from f437427 to 418c850 Compare April 28, 2026 17:33

Gaohan123 added this to the v0.20.0 milestone Apr 30, 2026

xiaohajiayou force-pushed the whitelist-optimization branch 3 times, most recently from a8410ed to b05cb02 Compare April 30, 2026 13:58

xiaohajiayou added 6 commits April 30, 2026 22:02

Derive deploy override fields from stage config

1f4f16f

Signed-off-by: xiaohajiayou <923390377@qq.com>

Fix: add missing fields to StageDeployConfig and DeployConfig whitelist

eab479c

Signed-off-by: xiaohajiayou <923390377@qq.com>

Move format fields to stage deploy config

8b0c3f2

Signed-off-by: xiaohajiayou <923390377@qq.com>

Delegate deploy defaults to engine args

9a92282

Signed-off-by: xiaohajiayou <923390377@qq.com>

Guard deploy override defaults

cb96db8

Signed-off-by: xiaohajiayou <923390377@qq.com>

Update max_num_seqs default test

ab6fc64

Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou force-pushed the whitelist-optimization branch 2 times, most recently from 2ea08fc to 176246d Compare April 30, 2026 14:30

Fix deploy configs for CI startup

e2aa6aa

Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou force-pushed the whitelist-optimization branch from 176246d to e2aa6aa Compare April 30, 2026 14:44

Merge branch 'main' into whitelist-optimization

c4514c7

lishunyang12 merged commit 01ebc0c into vllm-project:main Apr 30, 2026
8 checks passed

Copilot AI mentioned this pull request Apr 30, 2026

Revert "[Config Refactor] Derive deploy override fields from stage config (#3162)" Gaohan123/vllm-omni#1

Closed

Gaohan123 mentioned this pull request Apr 30, 2026

[CI failed]Revert "[Config Refactor] Derive deploy override fields from stage config" #3287

Merged

Copilot AI mentioned this pull request Apr 30, 2026

[WIP] Revert commit 01ebc0cd for whitelist optimization with DCO #3288

Closed

3 tasks

xiaohajiayou mentioned this pull request May 1, 2026

[BugFix] Fix Whitelist optimization CI failure #3290

Merged

5 tasks

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Config Refactor] Derive deploy override fields from stage config (vl…

9ffb74b

…lm-project#3162) Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou mentioned this pull request May 2, 2026

[Followup] Deploy YAML field ownership: stage-level defaults, user knobs, and model-owned config #3313

Open

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Config Refactor] Derive deploy override fields from stage config (vl…

5956753

…lm-project#3162) Signed-off-by: xiaohajiayou <923390377@qq.com>

xiaohajiayou mentioned this pull request May 19, 2026

[RFC]: Model-Aware Argument Default Resolution #3735

Open

1 task
