[Refactor] Replace args whitelist with explicit CLI key detection by gcanlin · Pull Request #3160 · vllm-project/vllm-omni

gcanlin · 2026-04-26T14:38:25Z

Summary

Remove nullify_stage_engine_defaults and its hardcoded whitelist. The old design required manually maintaining _DEPLOY_ENGINE_ARG_OVERRIDE_FIELDS for every vLLM parameter that should be YAML-configurable, and unlisted fields (like profiler_config) silently broke.
Introduce a detect_explicit_cli_keys mechanism that parses sys.argv to determine which flags the user actually typed, so only those become overrides. Argparse defaults never shadow deploy YAML values.
Add dict→ProfilerConfig conversion in OmniEngineArgs.__post_init__ so that profiler_config: {profiler: torch, ...} from YAML is correctly deserialized.
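
For readers who have not opened the diff, the explicit-key detection described in the summary can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: the function body, the reliance on the private parser._actions attribute, and the example flags are all assumptions, and abbreviated long options are not handled.

```python
import argparse


def detect_explicit_cli_keys(argv, parser):
    """Return the argparse dest names of flags that literally appear in argv.

    Illustrative sketch only: handles `--flag value` and `--flag=value`,
    but not abbreviated long options.
    """
    # Map every option string (e.g. "--max-num-seqs") to its dest name.
    option_to_dest = {}
    for action in parser._actions:  # private attribute, used here for brevity
        for option in action.option_strings:
            option_to_dest[option] = action.dest

    explicit = set()
    for token in argv:
        if token == "--":  # everything after "--" is positional
            break
        flag = token.split("=", 1)[0]
        if flag in option_to_dest:
            explicit.add(option_to_dest[flag])
    return explicit


# Hypothetical parser with three flags, all of which have non-None defaults.
parser = argparse.ArgumentParser()
parser.add_argument("--max-num-seqs", type=int, default=256)
parser.add_argument("--dtype", default="auto")
parser.add_argument("--seed", type=int, default=0)

argv = ["--max-num-seqs", "8", "--dtype=bfloat16"]
args = parser.parse_args(argv)
explicit_keys = detect_explicit_cli_keys(argv, parser)
```

Here args.seed still carries its argparse default, but seed is absent from explicit_keys, so it would not be treated as an override.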

Config merge chain (after this PR)

                    ┌─────────────────────┐
                    │   sys.argv[1:]      │
                    └────────┬────────────┘
                             │
                  detect_explicit_cli_keys(argv, parser)
                             │
                             ▼
                    ┌─────────────────────┐
                    │ _explicit_cli_keys  │  ← set of dest names user actually typed
                    │  e.g. {"max_num_seqs", "dtype"}
                    └────────┬────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │ serve.py         │ api_server.py     │ omni_base.py
          │ (vllm omni serve)│ (__main__)        │ (from_cli_args)
          └──────┬───────────┴──────┬────────────┴──────┐
                 │                  │                    │
                 ▼                  ▼                    ▼
          omni_run_server    omni_run_server       OmniBase.__init__
          (hasattr guard)    (fallback compute)         │
                 │                  │                    │
                 └──────────┬───────┘                    │
                            ▼                            ▼
                  build_async_omni_from_stage_config     │
                  kwargs.setdefault("_explicit_cli_keys")│
                            │                            │
                            └────────────┬───────────────┘
                                         ▼
                              AsyncOmniEngine.__init__
                                         │
                                         ▼
                              _resolve_stage_configs
                              (keeps _explicit_cli_keys in kwargs)
                                         │
                                         ▼
                           load_stage_configs_from_model
                                         │
                    ┌────────────────────┼────────────────────┐
                    │                    │                    │
                    ▼                    ▼                    ▼
             pop _explicit_cli_keys   cli_overrides      deploy YAML
                    │              = all argparse args    (per-stage)
                    │                    │                    │
                    ▼                    ▼                    │
              filter cli_overrides:     │                    │
              keep only keys in         │                    │
              _explicit_cli_keys        │                    │
                    │                    │                    │
                    └────────┬───────────┘                    │
                             ▼                               │
                     filtered overrides                      │
                     (only user-typed)                       │
                             │                               │
                             └───────────┬───────────────────┘
                                         ▼
                              build_stage_runtime_overrides
                              YAML ← merge ← filtered CLI overrides
                                         │
                                         ▼
                              OmniEngineArgs(**merged)
                              __post_init__: dict → ProfilerConfig
                                         │
                                         ▼
                              create_engine_config() → VllmConfig → Worker
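
The filter-then-merge step near the bottom of the chain can be sketched as below. The helper names filter_cli_overrides and merge_stage_overrides are hypothetical (the PR's own function is build_stage_runtime_overrides, whose body is not shown here), and the fallback for a missing key set mirrors the behavior the reviews describe, where every non-None argparse value is treated as an override.

```python
def filter_cli_overrides(cli_args, explicit_keys):
    """Keep only the CLI values the user actually typed."""
    if explicit_keys is None:
        # No explicit-key information: every non-None value counts as an override.
        return {k: v for k, v in cli_args.items() if v is not None}
    return {k: v for k, v in cli_args.items() if k in explicit_keys}


def merge_stage_overrides(stage_yaml, cli_args, explicit_keys):
    """Start from the per-stage YAML and lay the filtered CLI values on top."""
    merged = dict(stage_yaml)
    merged.update(filter_cli_overrides(cli_args, explicit_keys))
    return merged


stage_yaml = {"dtype": "bfloat16", "max_num_seqs": 4}
cli_args = {"dtype": "auto", "max_num_seqs": 16, "seed": 0}  # includes argparse defaults

with_filter = merge_stage_overrides(stage_yaml, cli_args, {"max_num_seqs"})
without_filter = merge_stage_overrides(stage_yaml, cli_args, None)
```

With the filter, the YAML dtype survives and only max_num_seqs is overridden; without it, the argparse default for dtype replaces the YAML value.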

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin · 2026-04-26T14:39:30Z

@amy-why-3459 Did you find that profiler_config can't be passed via the deploy config? This PR should fix it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95ae0221a0


chatgpt-codex-connector · 2026-04-26T14:44:23Z

    args_dict = vars(args).copy()
-    args_dict.pop("_cli_explicit_keys", None)
    config_path, stage_configs = load_and_resolve_stage_configs(
        model,
        args_dict.get("stage_configs_path"),


Propagate explicit CLI keys in headless mode

The headless path now passes raw vars(args) into load_and_resolve_stage_configs without attaching _explicit_cli_keys, so load_stage_configs_from_model() does not run the new explicit-key filter and treats all non-None argparse defaults as real CLI overrides. In headless runs this silently overwrites deploy YAML values (for example dtype, parallelism, batching knobs) even when the user did not provide those flags, which is the exact precedence regression this refactor is meant to prevent.
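
A small reproduction of why raw vars(args) is not enough, assuming a toy parser with two flags. The flag names and the hand-written key set are illustrative; only the _explicit_cli_keys key name comes from the PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dtype", default="auto")
parser.add_argument("--max-num-seqs", type=int, default=256)

# The user typed only --max-num-seqs; dtype is filled in by argparse.
args = parser.parse_args(["--max-num-seqs", "8"])
args_dict = vars(args).copy()

# A "non-None means override" check cannot tell the two apart.
looks_like_override = {k for k, v in args_dict.items() if v is not None}

# Attaching the explicit-key set (assumed to be computed from argv upstream)
# gives the loader something to filter on.
args_dict["_explicit_cli_keys"] = {"max_num_seqs"}
explicit = args_dict.pop("_explicit_cli_keys")
real_overrides = {k: v for k, v in args_dict.items() if k in explicit}
```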


chatgpt-codex-connector · 2026-04-26T14:44:23Z

 def deploy_override_field_names() -> frozenset[str]:
-    """Return kwargs whose parser defaults must not override deploy YAML."""
-    return _DEPLOY_ENGINE_ARG_OVERRIDE_FIELDS | _DEPLOY_RUNTIME_OVERRIDE_FIELDS
+    """Return kwargs whose parser defaults must not override deploy YAML.
+
+    Dynamically computed from all ``OmniEngineArgs`` fields so that newly
+    added engine args (e.g. ``profiler_config``) are automatically covered
+    without maintaining a manual whitelist.
+    """
+    engine_fields = frozenset(f.name for f in fields(OmniEngineArgs))
+    # Orchestrator-only keys are handled separately; shared keys need to
+    # flow to both orchestrator and engine, so exclude them here.
+    return (engine_fields - orchestrator_field_names()) | _DEPLOY_RUNTIME_OVERRIDE_FIELDS


Keep backward-compatible nullify helper export

This change deletes nullify_stage_engine_defaults from vllm_omni.engine.arg_utils without migrating remaining callers, but multiple shipped offline examples and tests still import that symbol (some at module import time), so those entrypoints now fail immediately with import-time errors. If this helper is being retired, the same commit should either update all in-repo call sites or provide a compatibility shim.


lishunyang12 · 2026-04-26T15:20:08Z

+    engine_fields = frozenset(f.name for f in fields(OmniEngineArgs))
+    # Orchestrator-only keys are handled separately; shared keys need to
+    # flow to both orchestrator and engine, so exclude them here.
+    return (engine_fields - orchestrator_field_names()) | _DEPLOY_RUNTIME_OVERRIDE_FIELDS


Tbh this scope feels too large for what was flagged on #3078. Why not just add profiler_config to _DEPLOY_ENGINE_ARG_OVERRIDE_FIELDS and a dict→ProfilerConfig conversion in OmniEngineArgs.__post_init__? Removing nullify_stage_engine_defaults also breaks ~12 example imports (examples/offline_inference/qwen3_omni/end2end.py:24, etc.) and 3 tests in tests/test_arg_utils.py.

Or if you want to drop the manual list entirely, invert the direction: a small _DIFFUSION_REQUIRED_PARSER_DEFAULTS denylist (the few fields pure-diffusion needs argparse defaults for) and nullify engine_fields - orchestrator_fields - denylist. Same size as today's whitelist but adding a new engine field auto-gets deploy-override semantics — only diffusion-only required fields need listing.
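
A minimal sketch of the inverted (denylist) direction, using a stand-in dataclass rather than the real OmniEngineArgs. Every field name below, including the choice of num_inference_steps as a diffusion-required default and stage_configs_path as orchestrator-only, is an assumption made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EngineArgs:  # stand-in for OmniEngineArgs; fields are illustrative
    dtype: str = "auto"
    max_num_seqs: int = 256
    profiler_config: Optional[dict] = None
    num_inference_steps: int = 50              # pretend pure diffusion needs this default
    stage_configs_path: Optional[str] = None   # pretend this is orchestrator-only


ORCHESTRATOR_FIELDS = frozenset({"stage_configs_path"})
DIFFUSION_REQUIRED_PARSER_DEFAULTS = frozenset({"num_inference_steps"})


def nullable_field_names():
    """Fields whose parser defaults get nullified: everything not excluded."""
    engine_fields = frozenset(f.name for f in fields(EngineArgs))
    return engine_fields - ORCHESTRATOR_FIELDS - DIFFUSION_REQUIRED_PARSER_DEFAULTS


names = nullable_field_names()
```

Adding a new field to the dataclass puts it in names automatically; only fields that must keep their argparse default need an entry in the denylist.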

lishunyang12 · 2026-04-26T15:20:08Z

@@ -542,7 +539,6 @@ def run_headless(args: argparse.Namespace) -> None:
        raise ValueError("headless mode requires worker_backend=multi_process")

    args_dict = vars(args).copy()


Even keeping this approach, _explicit_cli_keys is never set on args for the headless path — omni_run_server sets it but run_headless doesn't, so load_stage_configs_from_model skips the filter and argparse defaults still override deploy YAML.

lishunyang12 · 2026-04-26T15:25:08Z

@@ -496,8 +475,16 @@ def orchestrator_field_names() -> frozenset[str]:


 def deploy_override_field_names() -> frozenset[str]:


Seems unused after this PR — nullify_stage_engine_defaults and from_cli_args were the only callers. _DEPLOY_RUNTIME_OVERRIDE_FIELDS too.

xiaohajiayou · 2026-04-26T17:20:25Z

I think there is still a subtle difference here.

As discussed in #3078, the requirement is not only about CLI/argv-based paths. Users can provide explicit overrides through at least three entry paths:

Omni(..., engine_args=...)
parser / CLI-based paths
direct Omni(**kwargs) paths

The explicit CLI-key approach works well for parser / argv-based paths, but it cannot distinguish user intent for the other two paths, because those values do not necessarily come from sys.argv.

That is why the nullification approach was introduced: before parse_args(), we set defaults for deploy-overridable fields to None. After parsing, any non-None value for those fields means it was explicitly provided by the user, regardless of whether the caller later constructs Omni via from_cli_args, direct Omni(**kwargs), or another kwargs-based path.

This gives the same precedence behavior across the three input scenarios:

explicit user override > deploy YAML > dataclass/default value
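
The nullification idea and the resulting precedence can be sketched like this. The helper names, the field list, and the built-in defaults are hypothetical; only argparse's set_defaults is a real API here.

```python
import argparse

DEPLOY_OVERRIDABLE = ("dtype", "max_num_seqs")  # hypothetical field list


def nullify_defaults(parser, names):
    """Reset parser defaults to None so that non-None means 'user supplied it'."""
    parser.set_defaults(**{name: None for name in names})


def resolve(name, user_value, yaml_cfg, builtin_defaults):
    """explicit user override > deploy YAML > dataclass/default value"""
    if user_value is not None:
        return user_value
    if name in yaml_cfg:
        return yaml_cfg[name]
    return builtin_defaults[name]


parser = argparse.ArgumentParser()
parser.add_argument("--dtype", default="auto")
parser.add_argument("--max-num-seqs", type=int, default=256)
nullify_defaults(parser, DEPLOY_OVERRIDABLE)

args = parser.parse_args(["--max-num-seqs", "8"])
yaml_cfg = {"dtype": "bfloat16", "max_num_seqs": 4}
builtin = {"dtype": "auto", "max_num_seqs": 256}
resolved = {
    name: resolve(name, getattr(args, name), yaml_cfg, builtin)
    for name in DEPLOY_OVERRIDABLE
}
```

Because the check is on the value rather than on sys.argv, the same resolve step works for kwargs passed directly, where None simply means the caller did not set the field.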

If the main concern of this PR is removing the duplicated hardcoded allowlist from arg_utils.py, #3162 addresses that by deriving the nullified field set from the already-maintained deploy schema in stage_config.py (DeployConfig, StageDeployConfig, and special deploy/runtime fields such as async_chunk and devices).

Could you take a look at #3162 and see whether that direction resolves the maintainability concern while preserving consistent override semantics for non-argv construction paths?

hsliuustc0106

Cross-PR Config Refactor Review

This is one of several coordinated config refactor PRs. Reviewed together with #3162, #3154, #3144, #3128, #3120, #3139.

What this PR does

Replaces the hardcoded engine arg override whitelist with dynamic field detection from OmniEngineArgs -- new engine args are automatically covered.
Removes nullify_stage_engine_defaults from production entrypoints, replacing it with explicit_cli_keys filtering in load_stage_configs_from_model.
Adds dict-to-dataclass conversion for profiler_config in OmniEngineArgs.
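
A rough sketch of what the profiler_config conversion amounts to, using stand-in classes. The real ProfilerConfig lives in vLLM and its fields are not reproduced here; profiler is taken from the YAML example in the PR description and output_dir is a hypothetical field.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ProfilerConfig:  # stand-in; not the real vLLM class
    profiler: Optional[str] = None
    output_dir: Optional[str] = None  # hypothetical field


@dataclass
class EngineArgs:  # stand-in for OmniEngineArgs
    profiler_config: Union[ProfilerConfig, dict, None] = None

    def __post_init__(self):
        # YAML hands us a plain dict; promote it to the typed config.
        if isinstance(self.profiler_config, dict):
            self.profiler_config = ProfilerConfig(**self.profiler_config)


engine_args = EngineArgs(profiler_config={"profiler": "torch"})
```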

One issue: merge order dependency with #3144

This PR removes nullify_stage_engine_defaults from arg_utils.py. But #3144 adds callers of that same function in 12+ offline example scripts, importing it from arg_utils. This creates a hard dependency:

If #3160 merges first, #3144's imports will fail.
If #3144 merges first, this PR will have a file-level conflict on arg_utils.py.

Recommendation: #3160 should merge last in the chain. Either keep nullify_stage_engine_defaults as a thin shim in arg_utils.py that warns and delegates, or coordinate the merge so all of #3144's callers land first.
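
One possible shape for such a shim, as a sketch only. The signature of nullify_stage_engine_defaults is not shown in this thread, so the parameters and the _nullify_defaults_impl helper below are assumptions.

```python
import warnings


def _nullify_defaults_impl(parser_defaults, field_names):
    """Hypothetical stand-in for wherever the real logic ends up."""
    return {k: (None if k in field_names else v) for k, v in parser_defaults.items()}


def nullify_stage_engine_defaults(parser_defaults, field_names):
    """Deprecated alias kept so that existing imports keep working."""
    warnings.warn(
        "nullify_stage_engine_defaults is deprecated and will be removed",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at the shim
    )
    return _nullify_defaults_impl(parser_defaults, field_names)
```

Example scripts that import the old name would then keep running and emit a DeprecationWarning until they are migrated.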


zhumingjue138 · 2026-04-27T07:44:54Z

Please add a unit test case if necessary.


gcanlin · 2026-04-28T12:40:15Z

@hsliuustc0106 @lishunyang12 @xiaohajiayou I will close this PR because I have no more bandwidth to handle it. But please continue to fix the bug. Thanks


[Refactor] Replace args whitelist with explicit CLI key detection

95ae022

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin requested a review from hsliuustc0106 as a code owner April 26, 2026 14:38

gcanlin requested a review from lishunyang12 April 26, 2026 14:42

chatgpt-codex-connector Bot reviewed Apr 26, 2026


lishunyang12 reviewed Apr 26, 2026


hsliuustc0106 mentioned this pull request Apr 26, 2026

[Config Refactor] Derive deploy override fields from stage config #3162

Merged

5 tasks

hsliuustc0106 reviewed Apr 26, 2026


fhfuih mentioned this pull request Apr 27, 2026

[CI Failure]: Diffusion X2I(&A&T) · Doc Test, test_text_to_image.py, pydantic_core._pydantic_core.ValidationError: 1 validation error for DiffusionParallelConfig #3123

Closed

1 task

fix

2ed7228

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

ut

bf2965a

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin closed this Apr 28, 2026
