
[Config Refactor] Derive deploy override fields from stage config#3162

Merged
lishunyang12 merged 8 commits into vllm-project:main from xiaohajiayou:whitelist-optimization on Apr 30, 2026

@xiaohajiayou
Contributor

Purpose

Move deploy override field ownership from arg_utils.py to stage_config.py.

The nullify path now derives deploy-overridable fields from the deploy schema/merge logic instead of maintaining a duplicated manual allowlist in arg_utils.py. This keeps the field set aligned with DeployConfig, StageDeployConfig, and special deploy/runtime fields such as async_chunk and devices.

Test Plan

  • Run syntax checks for touched modules.
  • Run entrypoint tests covering direct Omni(...) construction and deprecated from_cli_args(...) compatibility.
  • Run targeted nullify/from_cli_args tests.

Test Result

  • Passed: .venv/bin/python -m py_compile ...
  • Passed: .venv/bin/pytest tests/entrypoints/test_omni_entrypoints.py -q
    • 39 passed
  • Passed: targeted entrypoint nullify/from_cli_args tests
    • 2 passed


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b54ccf2c68

Comment thread examples/offline_inference/dynin_omni/end2end.py Outdated
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Cross-PR Config Refactor Review

This is one of several coordinated config refactor PRs. Reviewed together with #3160, #3154, #3144, #3128, #3120, #3139.

What this PR does

Moves deploy_override_field_names() from arg_utils.py to stage_config.py, where the deploy schema fields actually live. The override field set is now computed from _STAGE_DEPLOY_FIELDS | _PIPELINE_WIDE_ENGINE_FIELDS | {"async_chunk", "devices"}.
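
As a rough illustration of the derivation described above, the override field set can be built from dataclass field names instead of a hand-maintained list. The two dataclasses below are trimmed, hypothetical stand-ins (the real StageDeployConfig and DeployConfig carry many more fields), and the way the two private sets are populated is an assumption; only the union expression is taken from this review.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class StageDeployConfigSketch:
    """Hypothetical, trimmed stand-in for StageDeployConfig."""

    gpu_memory_utilization: float | None = None
    tensor_parallel_size: int | None = None
    enforce_eager: bool | None = None


@dataclass
class DeployConfigSketch:
    """Hypothetical, trimmed stand-in for the pipeline-wide part of DeployConfig."""

    dtype: str | None = None
    trust_remote_code: bool | None = None


# Assumption: both sets are read straight off the schema dataclasses.
_STAGE_DEPLOY_FIELDS = frozenset(f.name for f in fields(StageDeployConfigSketch))
_PIPELINE_WIDE_ENGINE_FIELDS = frozenset(f.name for f in fields(DeployConfigSketch))


def deploy_override_field_names() -> frozenset[str]:
    """Union of schema-derived fields plus the special deploy/runtime fields."""
    return _STAGE_DEPLOY_FIELDS | _PIPELINE_WIDE_ENGINE_FIELDS | {"async_chunk", "devices"}


print(sorted(deploy_override_field_names()))
```

With this shape, adding a field to either schema dataclass makes it overridable without touching a second list.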

Assessment

Clean, small, single-purpose. No issues.

Merge order note

This PR touches omni_base.py (the from_cli_args import path), which is also touched by #3160 (rewrites from_cli_args) and #3144 (deprecates it). These should merge in order: #3162 -> #3144 -> #3160, to minimize conflict surface. #3160 removes the section that #3162 modifies, so #3162 must merge first.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Cross-PR Config Refactor Review

This is one of several coordinated config refactor PRs. Reviewed together with #3160, #3154, #3144, #3128, #3120, #3139.

What this PR does

Moves deploy_override_field_names() from arg_utils.py to stage_config.py, where the deploy schema fields actually live. The override field set is now computed from the stage deploy fields, pipeline-wide engine fields, plus async_chunk and devices.

Assessment

Clean, small, single-purpose. No issues.

Merge order note

This PR touches omni_base.py (the from_cli_args import path), which is also touched by #3160 (rewrites from_cli_args) and #3144 (deprecates it). These should merge in order: #3162 -> #3144 -> #3160, to minimize conflict surface. #3160 removes the section that #3162 modifies, so #3162 must merge first.

Collaborator

@gcanlin gcanlin left a comment

I think we have two concerns about #3078:

  1. the maintainability of the whitelist for engine args,
  2. the consistency of default values with upstream.

This PR clearly fixes the first. Could you please confirm the second point?

@xiaohajiayou
Contributor Author

xiaohajiayou commented Apr 27, 2026

2. the default value consistency with upstream

I’ve considered this issue as well, and it shouldn’t be a problem with the current implementation.

For upstream-owned fields, the intended behavior is:

explicit user value > deploy/yaml value > upstream vLLM default

The key detail is that upstream default consistency does not come from downstream treating None as a default. Instead, non-explicit None values are dropped during stage resolution, so the field is omitted from the resolved stage config. Then OmniEngineArgs(**...) falls back to the inherited upstream default.

For an LLM stage, the default-value / precedence flow (new deploy path) looks like this:

                     user input
            ┌──────────────────────────┐
            │ CLI / parser / kwargs /  │
            │ engine_args.create(...)  │
            └────────────┬─────────────┘
                         │
                         │ non-explicit override fields
                         │ are normalized to None
                         ▼
            ┌──────────────────────────┐
            │ _resolve_stage_configs() │
            │ load_and_resolve_stage   │
            │ _configs(...)            │
            └────────────┬─────────────┘
                         │
                         │ new deploy path:
                         │ only non-None explicit overrides survive
                         ▼
            ┌──────────────────────────┐
            │ StageConfigFactory       │
            │ _create_from_registry()  │
            │                          │
            │ explicit_overrides =     │
            │ {k: v for k, v in        │
            │  cli_overrides.items()   │
            │  if v is not None}       │
            └────────────┬─────────────┘
                         │
                         │ None values are dropped here
                         ▼
            ┌──────────────────────────┐
            │ StageConfig              │
            │                          │
            │ yaml_engine_args         │
            │ + runtime_overrides      │
            └────────────┬─────────────┘
                         │
                         │ if a field was not explicitly overridden:
                         │ - use deploy/yaml value if present
                         │ - otherwise omit the field
                         ▼
            ┌──────────────────────────┐
            │ to_omegaconf()           │
            │ resolved stage_config    │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ build_vllm_config()      │
            │ OmniEngineArgs(**dict)   │
            └────────────┬─────────────┘
                         │
                         │ omitted field => dataclass default applies
                         ▼
            ┌──────────────────────────┐
            │ OmniEngineArgs defaults  │
            │                          │
            │ upstream-owned fields    │
            │ inherit vLLM defaults    │
            │ (dtype="auto",           │
            │  enforce_eager=False,    │
            │  tp=1, etc.)             │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ create_engine_config()   │
            │ super().create_model_    │
            │ config()                 │
            └────────────┬─────────────┘
                         │
                         ▼
            ┌──────────────────────────┐
            │ final vLLM ModelConfig   │
            │ / VllmConfig             │
            └──────────────────────────┘
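
The flow above can be condensed into a small sketch. `EngineArgsSketch` is a hypothetical stand-in for OmniEngineArgs whose defaults mirror the values named in the diagram, and `resolve_engine_kwargs` is an illustrative helper rather than a function from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class EngineArgsSketch:
    """Hypothetical stand-in for OmniEngineArgs; defaults mirror the diagram."""

    dtype: str = "auto"
    enforce_eager: bool = False
    tensor_parallel_size: int = 1


def resolve_engine_kwargs(
    cli_overrides: dict[str, Any], yaml_engine_args: dict[str, Any]
) -> dict[str, Any]:
    # None means "not explicitly set", so it is dropped before merging.
    explicit_overrides = {k: v for k, v in cli_overrides.items() if v is not None}
    # YAML values come first, explicit overrides win, untouched fields stay absent.
    return {**yaml_engine_args, **explicit_overrides}


cli = {"dtype": None, "enforce_eager": True, "tensor_parallel_size": None}
yaml_args = {"dtype": "bfloat16"}

# Omitted fields fall back to the dataclass defaults at construction time.
args = EngineArgsSketch(**resolve_engine_kwargs(cli, yaml_args))
print(args)
```

Here dtype comes from the YAML, enforce_eager from the explicit override, and tensor_parallel_size from the dataclass default, which is the three-level precedence stated earlier in this comment.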

@gcanlin
Collaborator

gcanlin commented Apr 28, 2026

The same bug seems to exist in this PR. I pulled it and tested the bad case:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    devices: "0"
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni
    default_sampling_params:
      temperature: 0.4
      top_p: 0.9
      top_k: 1
      max_tokens: 2048
      seed: 42
      repetition_penalty: 1.05
(APIServer pid=1102818) INFO 04-28 01:25:40 [serving.py:45] OpenAIServingRealtime initialized for task: realtime
(APIServer pid=1102818) INFO 04-28 01:25:40 [api_server.py:416] Starting vLLM API server 0 on http://0.0.0.0:8091
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:37] Available routes are:
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/speech, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/speech/batch, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/images/generations, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/images/edits, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/sync, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/omni/sleep, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:46] Route: /v1/omni/wakeup, Methods: POST
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/video/chat/stream, Endpoint: streaming_video_chat
(APIServer pid=1102818) INFO 04-28 01:25:40 [launcher.py:57] Route: /v1/realtime, Endpoint: realtime_websocket
(APIServer pid=1102818) INFO:     Started server process [1102818]
(APIServer pid=1102818) INFO:     Waiting for application startup.
(APIServer pid=1102818) INFO:     Application startup complete.
(APIServer pid=1102818) INFO:     127.0.0.1:41786 - "POST /start_profile HTTP/1.1" 404 Not Found

@xiaohajiayou
Contributor Author

xiaohajiayou commented Apr 28, 2026

The same bug seems to exist in this PR. I pulled it and tested the bad case:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    devices: "0"
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni
  • Root cause
    profiler_config was not included in StageDeployConfig / DeployConfig, so it was missing from the whitelist used by nullify_stage_engine_defaults(). As a result, in the online serving path, the default value (ProfilerConfig()) coming from vLLM via serve_parser = make_arg_parser(serve_parser) was not cleared and incorrectly overwrote the deploy YAML, so /start_profile returned 404.
  • Fix in 20890dc
    • Add missing fields (including profiler_config) to StageDeployConfig / DeployConfig
    • Normalize several fields to None (instead of concrete defaults)

This ensures that only values explicitly provided in the deploy YAML are included in the resulting StageConfig, while missing fields are left unset and handled by downstream default resolution. CLI defaults (both online and offline) no longer leak into the final config. Defaults are resolved downstream as follows:

  • For LLM stages, missing fields fall back to the default values defined in vLLM EngineArgs:

    if engine_args_dict is None:
        engine_args_dict = build_engine_args_dict(
            stage_config,
            model,
            stage_connector_spec=stage_connector_spec,
        )
    filtered_engine_args_dict = filter_dataclass_kwargs(OmniEngineArgs, engine_args_dict)
    omni_engine_args = OmniEngineArgs(**filtered_engine_args_dict)

  • For diffusion stages, missing fields are handled by _create_default_diffusion_stage_cfg, which provides safe defaults:

    stage_engine_args = {
        "max_num_seqs": 1,
        "parallel_config": parallel_config,
        "model_class_name": kwargs.get("model_class_name", None),
        "step_execution": kwargs.get("step_execution", False),
        "vae_use_slicing": kwargs.get("vae_use_slicing", False),
        "vae_use_tiling": kwargs.get("vae_use_tiling", False),
        "cache_backend": cache_backend,

This unifies the override behavior while delegating default resolution to the appropriate downstream layer.
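
The root cause described in this comment can be reproduced in miniature. In the sketch below, `nullify_defaults` is a hypothetical simplification of nullify_stage_engine_defaults(), `merge` is an illustrative helper, and the string "parser-default" stands in for the ProfilerConfig() value injected by the serve parser.

```python
from __future__ import annotations

from typing import Any


def nullify_defaults(
    parsed: dict[str, Any], explicit: set[str], whitelist: set[str]
) -> dict[str, Any]:
    """Reset whitelisted fields the user did not pass explicitly to None."""
    return {
        k: (None if k in whitelist and k not in explicit else v)
        for k, v in parsed.items()
    }


def merge(yaml_args: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Non-None overrides win over YAML values."""
    return {**yaml_args, **{k: v for k, v in overrides.items() if v is not None}}


# Values as they arrive from the parser; nothing was passed explicitly.
parsed = {"gpu_memory_utilization": 0.9, "profiler_config": "parser-default"}
yaml_args = {"profiler_config": {"profiler": "torch"}}

# Before the fix: profiler_config is not whitelisted, so the parser default survives.
before = merge(yaml_args, nullify_defaults(parsed, set(), {"gpu_memory_utilization"}))
# After the fix: profiler_config is whitelisted, nullified, and the YAML value is kept.
after = merge(
    yaml_args,
    nullify_defaults(parsed, set(), {"gpu_memory_utilization", "profiler_config"}),
)
print(before, after)
```

In the "before" case the YAML profiler settings are silently replaced, which matches the 404 on /start_profile reported earlier in the thread.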

@xiaohajiayou
Contributor Author

xiaohajiayou commented Apr 28, 2026

Related context

Follow-up question

This leads to a design question:

Should deploy YAML fields be fully defined via StageDeployConfig / DeployConfig,
or should users be allowed to extend them freely?

Option 1: Strict schema

Keep StageDeployConfig / DeployConfig as the full schema, but set all defaults to None.
Final defaults are resolved only through:

OmniEngineArgs(**filtered_engine_args_dict)

Implications:

Option 2: Allow YAML extensibility

Allow users to extend deploy YAML beyond the predefined schema.

Current issue:

  • profiler_config is not defined in StageDeployConfig, but can still be parsed into stage_config

  • However, in the online path,

    serve_parser = make_arg_parser(serve_parser)

    injects vLLM defaults

  • Since profiler_config is not part of the schema, its CLI default takes highest precedence and overrides the user-provided YAML value

In this case, if we want to support extensible YAML with correct override semantics:

  • We must ensure that all inputs before stage_config construction are explicit user inputs only

  • In the online path, this likely requires:

    • removing make_arg_parser(serve_parser)
    • and applying default completion after stage_config construction via EngineArgs
  • Alternatively, we could maintain a whitelist of diffusion-specific fields, and set all other fields to None before constructing stage_config, so that only explicitly supported fields participate in override resolution

This approach is more flexible, but may be more complex to maintain.

Could you please take a look when you have time and advise how to resolve these issues?
@lishunyang12 @gcanlin @hsliuustc0106

Comment thread vllm_omni/config/stage_config.py Outdated
Comment on lines +452 to +456
data_parallel_size: int | None = None
pipeline_parallel_size: int | None = None
config_format: str | None = None
load_format: str | None = None
tokenizer_mode: str | None = None
Collaborator

This change leads to the same issue: it effectively sets a whitelist for vLLM's engine args.

Contributor Author

@xiaohajiayou xiaohajiayou Apr 28, 2026

Yes, I agree this points to a more fundamental issue: how to determine which fields are truly intended to be user-overridable.

The complexity comes from the fact that we currently have three different input sources into Omni(), while also mixing LLM and diffusion configurations in the same flow.

In the current override mechanism, any field with a non-None value when constructing StageConfig is treated as an explicit user input.

To correctly distinguish explicit user inputs, we have two relatively straightforward options:

  • Maintain a complete field set in StageDeployConfig / DeployConfig (i.e., a whitelist), and set all defaults to None, so that only explicitly provided values participate in override

  • Alternatively, we would need to maintain a diffusion-specific blocklist, and set all other fields to None.

Otherwise, we would need to introduce a more complex mechanism, such as:

  • removing make_arg_parser(serve_parser) in the online path
  • and applying default completion only after stage_config construction via EngineArgs
  • The offline path is still unclear under this approach and would require further design.

Collaborator

@lishunyang12 lishunyang12 left a comment

Re: design question — I'd lean Option 1. The whitelist drift is what just bit us with profiler_config, and Option 2 has the same shape — every new vLLM EngineArg silently breaks override semantics until someone hits it. Strict schema forces the failure at PR time, which is cheaper than runtime. The sync tax per vLLM bump is real but small.

Comment thread vllm_omni/config/stage_config.py Outdated
gpu_memory_utilization: float | None = None
tensor_parallel_size: int | None = None
enforce_eager: bool | None = None
max_num_batched_tokens: int = 32768
Collaborator

Intentional to keep max_num_seqs, max_num_batched_tokens, and trust_remote_code (L446) concrete? Same leakage path as the others if the user omits them — these would still override upstream defaults.

Contributor Author

@xiaohajiayou xiaohajiayou Apr 28, 2026

I have not changed these defaults to None in this PR yet because their current values are not the same as vLLM defaults, and #3128's description explicitly listed some of them as retained vLLM-Omni-specific behavior, e.g. max_num_seqs (64 vs vLLM's 256).

That said, I noticed that #3128's actual diff also removes max_num_seqs. So if the intended direction is to delegate these defaults back to vLLM, then I agree these fields can likely be changed to None as well. I updated this in 418c850.

@lishunyang12
Collaborator

How should we choose? What do you think?

Tbh I'd go Option 1. The profiler_config bug is exactly the failure mode Option 2 perpetuates — a field silently mis-overridden because the schema doesn't know about it, surfaced only at user runtime. Option 1 makes adding a vLLM field an explicit PR-time decision instead of "did anyone notice?".

The maintenance tax is overstated IMO — this PR added 8 fields at once to clear backlog, but steady-state is more like 1-2 per vLLM bump. test_nullify_stage_engine_defaults_resets_inherited_defaults already catches drift.

engine_extras already exists on StageDeployConfig as the forward-compat escape hatch — if a user needs a knob not in the schema yet, they drop it there. So Option 1 + engine_extras covers most of Option 2's flexibility without the override-semantics complexity.

Option 2 also has a hidden coupling: making it work cleanly requires either removing make_arg_parser(serve_parser) (large surface area, risks CLI parity regressions), or relying on detect_explicit_cli_keys everywhere — which only works in from_cli_args, not programmatic Omni(...). The "diffusion-specific whitelist" fallback is just Option 1 with extra steps.

Concrete suggestion: land this PR as Option 1, then in a follow-up (a) make max_num_seqs / max_num_batched_tokens / trust_remote_code also default to None if those concrete defaults aren't load-bearing, and (b) document engine_extras as the escape hatch. Revisit #3128 only after this lands — may become unnecessary.

@reidliu41
Contributor

Option 1 looks like the most maintainable path for the current code structure.

The main value is making override semantics fail early: new upstream engine fields have to be represented intentionally in the deploy schema, instead of silently flowing through or being masked by parser/dataclass defaults. That gives reviewers and tests a concrete place to catch drift, while keeping the current None-means-omitted resolution model small and predictable.

@alex-jw-brooks
Contributor

I think we need a few different pieces for long-term sustainability.

  1. Generic utils for distinguishing between unset defaults and passed values for
  • dataclasses
  • stuff parsed by argparse

Setting all defaults to None / having lots of sentinels hurts readability, plus being able to distinguish between default values the user passed vs default values that were set because they weren't passed is useful. These would be useful in a lot of places, because vLLM Omni has the recurring issue of having to tell whether values were actually set by the user before deciding whether they should override stuff in vLLM.

  2. Minimize white/black lists as much as possible. Having too many string lists that correspond to things like signatures is dangerous, since things can be renamed etc. While the signatures are hopefully stable, we should pull as much from vLLM as we can for this.

  3. In cases where white/black lists are actually needed, we should enforce that they are aligned correctly in our CI, e.g., by comparing to the corresponding objects in vLLM and ensuring that they are actually valid in the context in which they are used. This is still not ideal, but at least we can catch things that are renamed, removed, etc.
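
For the CI-alignment idea in the last point, a check along these lines might be enough to catch renames and removals. `UpstreamEngineArgs` is a hypothetical stand-in for the vLLM dataclass (a real test would import the upstream class instead), `OVERRIDE_FIELDS` is an invented hand-maintained list, and treating async_chunk and devices as local-only names is an assumption based on the PR description.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class UpstreamEngineArgs:
    """Hypothetical stand-in for the upstream engine-args dataclass."""

    dtype: str = "auto"
    max_num_seqs: int = 256
    enforce_eager: bool = False


# Hypothetical hand-maintained list; "max_seqs" simulates a stale or renamed entry.
OVERRIDE_FIELDS = {"dtype", "max_seqs", "enforce_eager", "devices"}
# Assumption: names that are intentionally not upstream fields.
LOCAL_ONLY_FIELDS = {"async_chunk", "devices"}


def find_stale_fields(names: set[str], upstream: type, local_only: set[str]) -> set[str]:
    """Return names that exist neither upstream nor in the local-only set."""
    upstream_names = {f.name for f in fields(upstream)}
    return names - upstream_names - local_only


stale = find_stale_fields(OVERRIDE_FIELDS, UpstreamEngineArgs, LOCAL_ONLY_FIELDS)
print(stale)
```

A CI test would then assert that the returned set is empty, so a renamed upstream field fails the build instead of silently dropping out of the list.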

I'll look into some of these threads as well @xiaohajiayou, but happy to collaborate however 🙂

@alceops

alceops commented Apr 28, 2026

I reviewed this against #3220 and the schema-derived override direction looks right. One small review-readiness gap: profiler_config itself is not covered end-to-end by a deploy YAML load + merge test, and the newly declared top-level config_format / load_format / tokenizer_mode fields are in _PIPELINE_WIDE_ENGINE_FIELDS but not yet copied in load_deploy_config(). I have a tiny delta with two focused tests plus the loader-list addition if useful; happy to send it as a follow-up patch rather than opening a competing implementation.

@xiaohajiayou
Contributor Author

I reviewed this against #3220 and the schema-derived override direction looks right. One small review-readiness gap: profiler_config itself is not covered end-to-end by a deploy YAML load + merge test, and the newly declared top-level config_format / load_format / tokenizer_mode fields are in _PIPELINE_WIDE_ENGINE_FIELDS but not yet copied in load_deploy_config(). I have a tiny delta with two focused tests plus the loader-list addition if useful; happy to send it as a follow-up patch rather than opening a competing implementation.

Based on the existing YAMLs, these fields are currently configured per stage rather than at the top level. So I think a better first step is to put config_format, load_format, and tokenizer_mode into StageDeployConfig.

This keeps the existing YAML syntax unchanged, while allowing this PR’s schema-derived whitelist to recognize these fields explicitly and preserve the correct override semantics. I fixed this in 9ad134e.

@xiaohajiayou xiaohajiayou force-pushed the whitelist-optimization branch from f437427 to 418c850 Compare April 28, 2026 17:33
@xiaohajiayou
Contributor Author

I think we need a few different pieces for long-term sustainability.

  1. Generic utils for being able to distinguish between unset defaults vs passed values for
  • dataclasses
  • stuff parsed by argparse

Thanks, I agree with this direction.

I think there are two related but separate layers here:

  1. For deploy YAML, if we choose Option 1 / strict schema, then StageDeployConfig and DeployConfig become the full deploy schema. Under that model, they naturally also act as the whitelist for deploy override semantics: only fields intentionally represented in the deploy schema can participate in override resolution. Based on that, this PR’s current None sentinel based filtering can land first to preserve the correct override semantics.

  2. Separately, I agree the current override/default handling is not ideal as a long-term mechanism. Using None as the unset sentinel makes the schema less readable, and a generic way to distinguish “user explicitly passed this value” from “this value came from a default” would be cleaner for both dataclasses and argparse inputs.

So I see this PR as the smaller correctness step: keep the strict deploy schema as the override whitelist and make the current precedence behavior correct. Then we can follow up with a more general explicit-value tracking utility / CI alignment checks to make the mechanism more maintainable.
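
One possible shape for such an explicit-value utility on the argparse side is sketched below. It relies on argparse filling in a default only when the namespace does not already have that attribute, so pre-seeding a sentinel reveals which options were actually passed. `explicit_cli_values` is a hypothetical helper, not the repository's detect_explicit_cli_keys, and the sketch assumes a parser with no required arguments and no append-style actions.

```python
from __future__ import annotations

import argparse

_UNSET = object()  # sentinel marking "not passed on the command line"


def explicit_cli_values(parser: argparse.ArgumentParser, argv: list[str]) -> dict[str, object]:
    """Return only the options that appear explicitly in argv."""
    # Parse an empty command line once to learn every destination name.
    dests = vars(parser.parse_args([])).keys()
    # Pre-seed each destination so argparse skips its own default.
    seeded = argparse.Namespace(**{dest: _UNSET for dest in dests})
    parsed = parser.parse_args(argv, namespace=seeded)
    return {k: v for k, v in vars(parsed).items() if v is not _UNSET}


parser = argparse.ArgumentParser()
parser.add_argument("--dtype", default="auto")
parser.add_argument("--max-num-seqs", type=int, default=256)
parser.add_argument("--enforce-eager", action="store_true")

# "--dtype auto" equals the default but still counts as explicit.
explicit = explicit_cli_values(parser, ["--dtype", "auto", "--enforce-eager"])
print(explicit)
```

Unlike a None sentinel in the schema, this keeps the parser's declared defaults readable and still separates "the user passed the default value" from "the user passed nothing".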

@alex-jw-brooks
Contributor

alex-jw-brooks commented Apr 28, 2026

Cool, I think we are on the same page. I agree, I think there is a lot of nuance to this, but it's best to decouple discussions about the implementation from the user experience, and it shouldn't block this PR, more just discussion for the future 🙂

After thinking about this a bit more, maybe it's confusing to have the schema set top-level keys that are applied to all stages without letting those stages also specify them themselves and just validating against bad behavior. As a minimal example with one stage:

If you can do something like this:

dtype: float32
stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32

You should be able to do something like this too:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float32

But if that's true, this is natural feeling as well:

stages:
  - stage_id: 0
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float32
  - stage_id: 1
    gpu_memory_utilization: 0.9
    max_num_seqs: 32
    dtype: float16

and you'd just expect it to throw an error since dtype needs consistent values for now. Having to know about which keys are allowed where and treating everything as separate adds another layer of complexity, when the resolved stage config should be directly translatable to engine args of that corresponding stage type (although Diffusion is kind of bundling everything into the OmniDiffusionConfig at the moment).

Being able to map it more or less directly to an engine-args-like object, and pushing the validation further down after initial merging and resolution, also makes more sense for extensibility. E.g., if we add a reconstruction engine for world models later on, the stage config object will probably get even more confusing because it will need to be compatible with those stages too.
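
The behaviour described in this comment (top-level keys acting as defaults for every stage, per-stage values allowed, and an error when a key that must agree has conflicting values) could be sketched as follows. `resolve_stages` and `MUST_MATCH` are hypothetical names, and nothing here reflects the actual deploy loader.

```python
from __future__ import annotations

from typing import Any

# Assumption: keys that currently need one consistent value across stages.
MUST_MATCH = {"dtype"}


def resolve_stages(deploy: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply top-level keys as per-stage defaults, then validate consistency."""
    top_level = {k: v for k, v in deploy.items() if k != "stages"}
    # Per-stage values win over top-level defaults.
    resolved = [{**top_level, **stage} for stage in deploy["stages"]]
    for key in MUST_MATCH:
        values = {stage[key] for stage in resolved if key in stage}
        if len(values) > 1:
            raise ValueError(f"{key!r} must be consistent across stages, got {sorted(values)}")
    return resolved


# Top-level dtype is pushed down into the single stage.
ok = resolve_stages({"dtype": "float32", "stages": [{"stage_id": 0, "max_num_seqs": 32}]})
print(ok)
```

Validation runs only after merging, so each resolved stage dict is already in the flat shape that could be handed to that stage's engine args.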

@Gaohan123 Gaohan123 added this to the v0.20.0 milestone Apr 30, 2026
@xiaohajiayou xiaohajiayou force-pushed the whitelist-optimization branch 3 times, most recently from a8410ed to b05cb02 Compare April 30, 2026 13:58
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the whitelist-optimization branch 2 times, most recently from 2ea08fc to 176246d Compare April 30, 2026 14:30
@xiaohajiayou xiaohajiayou force-pushed the whitelist-optimization branch from 176246d to e2aa6aa Compare April 30, 2026 14:44
@xiaohajiayou
Contributor Author

xiaohajiayou commented Apr 30, 2026

I was able to reproduce the reported stage-0 torch-profiler case locally using Qwen3-TTS-12Hz-1.7B-Base.

For reference, the stage-0 configuration I used is:

stages:
  - stage_id: 0
    max_num_seqs: 10
    gpu_memory_utilization: 0.3
    async_scheduling: true
    max_num_batched_tokens: 512
    max_model_len: 4096
    devices: "0"
    output_connectors:
      to_stage_1: connector_of_shared_memory
    profiler_config:
      profiler: torch
      torch_profiler_dir: ./thinker-omni

Results:

  • POST /start_profile succeeded
  • The TTS request completed successfully
  • POST /stop_profile completed successfully
  • Profiler artifacts were correctly generated under thinker-omni/...

Based on this, the issue reported in #3220 appears to be resolved.

Relevant logs:

(APIServer pid=10835) INFO 04-30 23:25:49 [api_server.py:2834] Starting profiler for stages: [0]
(APIServer pid=10835) INFO 04-30 23:25:49 [api_server.py:2837] Profiler started.
(APIServer pid=10835) INFO:     127.0.0.1:37682 - "POST /start_profile HTTP/1.1" 200 OK

(APIServer pid=10835) INFO 04-30 23:26:04 [serving_speech.py:1795] TTS speech request speech-ab3370b6070557b8: text='Hello, this is a profiler validation request.', model=Base
(APIServer pid=10835) INFO 04-30 23:26:04 [orchestrator.py:894] [Orchestrator] _handle_add_request: stage=0 req=speech-ab3370b6070557b8 prompt_type=OmniEngineCoreRequest original_prompt_type=dict final_stage=1 num_sampling_params=2
(APIServer pid=10835) INFO 04-30 23:26:04 [stage_engine_core_client.py:230] [StageEngineCoreClient] Stage-0 adding request: speech-ab3370b6070557b8
(APIServer pid=10835) INFO 04-30 23:26:04 [stage_engine_core_client.py:230] [StageEngineCoreClient] Stage-1 adding request: speech-ab3370b6070557b8
(StageEngineCoreProc pid=11234) INFO 04-30 23:26:17 [qwen3_tts_code2wav.py:288] Code2Wav codec: frames=2 q=16 uniq=26 range=[38,1960] batch=1
(StageEngineCoreProc pid=11234) WARNING 04-30 23:26:20 [qwen3_tts_code2wav.py:260] Code2Wav input_ids length 1 not divisible by num_quantizers 16; skipping malformed request.
(APIServer pid=10835) INFO:     127.0.0.1:41142 - "POST /v1/audio/speech HTTP/1.1" 200 OK

(APIServer pid=10835) INFO 04-30 23:26:30 [api_server.py:2860] Stopping profiler for stages: [0]
(StageEngineCoreProc pid=10998) INFO 04-30 23:27:37 [omni_torch_profiler.py:174] [Rank 0] Trace exported to /root/vllm-omni/thinker-omni/20260430-232549_stage0_rank0_1777562749/trace_rank0.json
(StageEngineCoreProc pid=10998) INFO 04-30 23:27:37 [omni_torch_profiler.py:179] [Rank 0] Triggered background compression for /root/vllm-omni/thinker-omni/20260430-232549_stage0_rank0_1777562749/trace_rank0.json
(StageEngineCoreProc pid=10998) WARNING 04-30 23:28:26 [omni_torch_profiler.py:349] [Rank 0] pandas not available, skip Excel export: No module named 'pandas'
(StageEngineCoreProc pid=10998) INFO 04-30 23:28:40 [wrapper.py:66] Profiler stopped successfully.
(APIServer pid=10835) INFO 04-30 23:28:40 [api_server.py:2863] Profiler stopped.
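For anyone repeating the check, the request sequence can be sketched with the standard library as below. The three endpoint paths are the ones in the logs; the base URL and the JSON body fields (`model`, `input`) are assumptions, since only the text and model name show up in the log line:

```python
import json
import urllib.request

# Assumption: host and port are not stated above; adjust to the running server.
BASE_URL = "http://127.0.0.1:8000"


def build_requests(base_url: str = BASE_URL):
    """Return the start-profile, speech, and stop-profile requests in order."""
    start = urllib.request.Request(f"{base_url}/start_profile", data=b"", method="POST")
    # Assumption: OpenAI-style speech payload; field names are not shown in the logs.
    body = json.dumps(
        {"model": "Base", "input": "Hello, this is a profiler validation request."}
    ).encode("utf-8")
    speech = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    stop = urllib.request.Request(f"{base_url}/stop_profile", data=b"", method="POST")
    return [start, speech, stop]


def run(base_url: str = BASE_URL) -> None:
    """Send the three requests sequentially and print each HTTP status."""
    for req in build_requests(base_url):
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)
```

Note that in the logs above the stop call took about two minutes to return while the trace was exported, so a client-side timeout would need to allow for that.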

@xiaohajiayou
Contributor Author

xiaohajiayou commented Apr 30, 2026

The current CI issues seem to be caused by missing fields in some model YAMLs. Previously, these were implicitly handled by default fallbacks. However, after setting these defaults to None in the stage config, the fallback behavior from vLLM no longer matches the expected values.

To address this, I’ve added the required default values back into the affected model YAMLs in e2aa6aa.
Please let me know if there’s anything I might have missed. @lishunyang12 @gcanlin @hsliuustc0106
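To illustrate the mechanism, here is a toy version of the fallback chain. It is a sketch only: `resolve`, the key, and the numbers are hypothetical and chosen just to make the mismatch visible, not taken from vLLM or from the affected YAMLs:

```python
from typing import Any, Dict

# Hypothetical values, not real defaults.
OLD_STAGE_DEFAULTS = {"max_num_seqs": 1}     # stage config used to supply this
NEW_STAGE_DEFAULTS = {"max_num_seqs": None}  # now None, so it falls through
UPSTREAM_DEFAULTS = {"max_num_seqs": 256}    # what the engine picks on its own


def resolve(
    yaml_stage: Dict[str, Any],
    stage_defaults: Dict[str, Any],
    upstream_defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Later sources win, but None means "not set" and never overrides."""
    merged = dict(upstream_defaults)
    for source in (stage_defaults, yaml_stage):
        merged.update({k: v for k, v in source.items() if v is not None})
    return merged


before = resolve({}, OLD_STAGE_DEFAULTS, UPSTREAM_DEFAULTS)
after = resolve({}, NEW_STAGE_DEFAULTS, UPSTREAM_DEFAULTS)
fixed = resolve({"max_num_seqs": 1}, NEW_STAGE_DEFAULTS, UPSTREAM_DEFAULTS)
```

Here `before` and `fixed` agree on the value the model expects, while `after` silently picks up the upstream default, which is why the value has to be spelled out in the YAML.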

@lishunyang12 lishunyang12 merged commit 01ebc0c into vllm-project:main Apr 30, 2026
8 checks passed
@lishunyang12
Collaborator

lishunyang12 commented Apr 30, 2026

Please leave a Todo issue for follow-ups. @xiaohajiayou

Copilot AI added a commit to Gaohan123/vllm-omni that referenced this pull request Apr 30, 2026
…nfig (vllm-project#3162)"

This reverts commit 01ebc0c.

Co-authored-by: Gaohan123 <20148503+Gaohan123@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request Apr 30, 2026
…nfig (#3162)"

This reverts commit 01ebc0c.

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: Gaohan123 <20148503+Gaohan123@users.noreply.github.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
sphinxkkkbc pushed a commit to sphinxkkkbc/vllm-omni that referenced this pull request May 4, 2026
…lm-project#3162)

Signed-off-by: xiaohajiayou <923390377@qq.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI

Development

Successfully merging this pull request may close these issues.

[Bug]: The field profiler_config in deploy yaml can't be passed correctly

8 participants