[Refactor] Replace args whitelist with explicit CLI key detection#3160
[Refactor] Replace args whitelist with explicit CLI key detection#3160gcanlin wants to merge 3 commits into
Conversation
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
@amy-why-3459 Do you find that |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 95ae0221a0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| args_dict = vars(args).copy() | ||
| args_dict.pop("_cli_explicit_keys", None) | ||
| config_path, stage_configs = load_and_resolve_stage_configs( | ||
| model, | ||
| args_dict.get("stage_configs_path"), |
There was a problem hiding this comment.
Propagate explicit CLI keys in headless mode
The headless path now passes raw vars(args) into load_and_resolve_stage_configs without attaching _explicit_cli_keys, so load_stage_configs_from_model() does not run the new explicit-key filter and treats all non-None argparse defaults as real CLI overrides. In headless runs this silently overwrites deploy YAML values (for example dtype, parallelism, batching knobs) even when the user did not provide those flags, which is the exact precedence regression this refactor is meant to prevent.
Useful? React with 👍 / 👎.
| def deploy_override_field_names() -> frozenset[str]: | ||
| """Return kwargs whose parser defaults must not override deploy YAML.""" | ||
| return _DEPLOY_ENGINE_ARG_OVERRIDE_FIELDS | _DEPLOY_RUNTIME_OVERRIDE_FIELDS | ||
| """Return kwargs whose parser defaults must not override deploy YAML. | ||
|
|
||
| Dynamically computed from all ``OmniEngineArgs`` fields so that newly | ||
| added engine args (e.g. ``profiler_config``) are automatically covered | ||
| without maintaining a manual whitelist. | ||
| """ | ||
| engine_fields = frozenset(f.name for f in fields(OmniEngineArgs)) | ||
| # Orchestrator-only keys are handled separately; shared keys need to | ||
| # flow to both orchestrator and engine, so exclude them here. | ||
| return (engine_fields - orchestrator_field_names()) | _DEPLOY_RUNTIME_OVERRIDE_FIELDS |
There was a problem hiding this comment.
Keep backward-compatible nullify helper export
This change deletes nullify_stage_engine_defaults from vllm_omni.engine.arg_utils without migrating remaining callers, but multiple shipped offline examples and tests still import that symbol (some at module import time), so those entrypoints now fail immediately with import-time errors. If this helper is being retired, the same commit should either update all in-repo call sites or provide a compatibility shim.
Useful? React with 👍 / 👎.
| engine_fields = frozenset(f.name for f in fields(OmniEngineArgs)) | ||
| # Orchestrator-only keys are handled separately; shared keys need to | ||
| # flow to both orchestrator and engine, so exclude them here. | ||
| return (engine_fields - orchestrator_field_names()) | _DEPLOY_RUNTIME_OVERRIDE_FIELDS |
There was a problem hiding this comment.
Tbh this scope feels too large for what was flagged on #3078. Why not just add profiler_config to _DEPLOY_ENGINE_ARG_OVERRIDE_FIELDS and a dict→ProfilerConfig conversion in OmniEngineArgs.__post_init__? Removing nullify_stage_engine_defaults also breaks ~12 example imports (examples/offline_inference/qwen3_omni/end2end.py:24, etc.) and 3 tests in tests/test_arg_utils.py.
There was a problem hiding this comment.
Or if you want to drop the manual list entirely, invert the direction: a small _DIFFUSION_REQUIRED_PARSER_DEFAULTS denylist (the few fields pure-diffusion needs argparse defaults for) and nullify engine_fields - orchestrator_fields - denylist. Same size as today's whitelist but adding a new engine field auto-gets deploy-override semantics — only diffusion-only required fields need listing.
| @@ -542,7 +539,6 @@ def run_headless(args: argparse.Namespace) -> None: | |||
| raise ValueError("headless mode requires worker_backend=multi_process") | |||
|
|
|||
| args_dict = vars(args).copy() | |||
There was a problem hiding this comment.
Even keeping this approach, _explicit_cli_keys is never set on args for the headless path — omni_run_server sets it but run_headless doesn't, so load_stage_configs_from_model skips the filter and argparse defaults still override deploy YAML.
| @@ -496,8 +475,16 @@ def orchestrator_field_names() -> frozenset[str]: | |||
|
|
|||
|
|
|||
| def deploy_override_field_names() -> frozenset[str]: | |||
There was a problem hiding this comment.
Seems unused after this PR — nullify_stage_engine_defaults and from_cli_args were the only callers. _DEPLOY_RUNTIME_OVERRIDE_FIELDS too.
|
I think there is still a subtle difference here. As discussed in #3078, the requirement is not only about CLI/argv-based paths. Users can provide explicit overrides through at least three entry paths:
The explicit CLI-key approach works well for parser / argv-based paths, but it cannot distinguish user intent for the other two paths, because those values do not necessarily come from That is why the nullification approach was introduced: before This gives the same precedence behavior across the three input scenarios: If the main concern of this PR is removing the duplicated hardcoded allowlist from Could you take a look at #3162 and see whether that direction resolves the maintainability concern while preserving consistent override semantics for non-argv construction paths? |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Cross-PR Config Refactor Review
This is one of several coordinated config refactor PRs. Reviewed together with #3162, #3154, #3144, #3128, #3120, #3139.
What this PR does
- Replaces the hardcoded engine arg override whitelist with dynamic field detection from OmniEngineArgs -- new engine args are automatically covered.
- Removes nullify_stage_engine_defaults from production entrypoints, replacing it with explicit_cli_keys filtering in load_stage_configs_from_model.
- Adds profiler_config dict to dataclass conversion in OmniEngineArgs.
One issue: merge order dependency with #3144
This PR removes nullify_stage_engine_defaults from arg_utils.py. But #3144 adds callers of that same function in 12+ offline example scripts, importing it from arg_utils. This creates a hard dependency:
- If #3160 merges first, #3144's imports will fail.
- If #3144 merges first, this PR will have a file-level conflict on arg_utils.py.
Recommendation: #3160 should merge last in the chain. Either keep nullify_stage_engine_defaults as a thin shim in arg_utils.py that warns and delegates, or coordinate the merge so all of #3144's callers land first.
|
please add ut test case if it is necessary |
|
@hsliuustc0106 @lishunyang12 @xiaohajiayou I will close this PR because I have no more bandwidth to handle it. But please continue to fix the bug. Thanks |
Copy |
Summary
nullify_stage_engine_defaultsand its hardcoded whitelist — the old designrequired manually maintaining
_DEPLOY_ENGINE_ARG_OVERRIDE_FIELDSfor every vLLM parameterthat should be YAML-configurable. Unlisted fields (like
profiler_config) silently broke.detect_explicit_cli_keysmechanism — parsessys.argvto determine whichflags the user actually typed, so only those become overrides. Argparse defaults never
shadow deploy YAML values.
OmniEngineArgs.__post_init__so thatprofiler_config: {profiler: torch, ...}from YAML is correctly deserialized.Config merge chain (after this PR)