[Bugfix] Fix CLI defaults overriding stage configs#2656
[Bugfix] Fix CLI defaults overriding stage configs#2656xiaohajiayou wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Signed-off-by: xiaohajiayou <923390377@qq.com>
136b153 to
bc95df2
Compare
|
we will remove cli related args from yaml very recently cc @lishunyang12 @wuhang2014 |
lishunyang12
left a comment
There was a problem hiding this comment.
Review of PR #2656: Fix CLI defaults overriding stage configs
Overall
The PR correctly identifies a real problem: argparse default values were overriding model-specific stage config defaults in the legacy YAML path, which is the wrong precedence. The fix introduces parse_args_with_explicit_overrides to distinguish user-supplied CLI args from parser defaults, then threads explicit_overrides through the merge logic so that only explicitly typed CLI args can override stage YAML defaults.
The precedence model is clear and well-documented in the PR description. The test coverage is good and verifies the four key scenarios (explicit YAML wins over explicit overrides, explicit overrides win over default YAML, plain kwargs do not override default YAML, plain kwargs fill missing keys).
Issues and suggestions
1. explicit_overrides leaks into the new-path (StageConfigFactory) as an extraneous engine arg (minor)
In AsyncOmniEngine._resolve_stage_configs, _cli_explicit_keys is popped from kwargs (line 1299), but explicit_overrides is not. It flows through base_kwargs -> load_and_resolve_stage_configs -> load_stage_configs_from_model -> cli_overrides -> StageConfigFactory.create_from_model. There, it becomes a key in the overrides dict with a dict value, which is harmless today (silently ignored by _merge_cli_overrides) but is conceptually wrong and could cause confusing debug output or breakage if future code validates override keys.
Consider popping explicit_overrides from kwargs in _resolve_stage_configs (alongside _cli_explicit_keys), or stripping it in load_stage_configs_from_model before building cli_overrides.
2. Two parallel explicit-override mechanisms doing the same thing (design concern)
The new path already has detect_explicit_cli_keys -> _cli_explicit_keys -> cli_explicit_keys. This PR introduces a second mechanism: parse_args_with_explicit_overrides -> _explicit_overrides -> explicit_overrides in kwargs. Both solve the same problem (distinguish user-typed CLI args from defaults) but take different approaches (set of dest names vs. dict of dest->value).
For the legacy YAML path this is fine since cli_explicit_keys wasn't wired there. But having two mechanisms increases cognitive load. A follow-up could unify them -- e.g., parse_args_with_explicit_overrides could populate _cli_explicit_keys (as a set) and the legacy path could extract explicit values from that set + the full kwargs.
3. getattr(args, "_explicit_overrides", None) repeated ~25 times across examples (nit)
Every example script does getattr(args, "_explicit_overrides", None). Since parse_args_with_explicit_overrides always sets this attribute, args._explicit_overrides would suffice. But more importantly, this boilerplate could be avoided if Omni.__init__ extracted _explicit_overrides from _cli_explicit_keys-style info, or if a helper extracted the overrides dict. The current approach puts the burden on every caller to remember the getattr pattern.
4. Subparser traversal in parse_args_with_explicit_overrides is shallow (minor)
The function iterates parser._actions and checks one level of subparser choices. If there are nested subparsers (unlikely but possible), deeper options would be missed. The current depth is probably sufficient for this codebase.
5. _strip_single_engine_args filtering (question)
In _resolve_stage_configs, when stage_configs_path is not None, kwargs go through _strip_single_engine_args. Does that strip explicit_overrides? If not, same leak issue as point 1.
Correctness of the merge logic change
The key change in load_stage_configs_from_yaml looks correct:
- Before:
merge_configs(base_engine_args_tmp, stage_arg.engine_args)-- stage YAML wins over ALL caller kwargs - After: First merge plain kwargs under stage YAML (stage wins), then merge explicit overrides on top (explicit wins over stage when
prefer_stage_engine_args=True, stage wins whenprefer_stage_engine_args=False)
This correctly implements the stated precedence: explicit YAML > explicit_overrides > default YAML > plain kwargs.
Tests
Good coverage of the four precedence cases. The mocker.patch approach for resolve_model_config_path and load_yaml_config is clean. The lambda path: __import__(...) pattern in side_effect is a bit unusual -- a local import at the top of the test would be cleaner -- but it works.
Summary
The fix is correct for the legacy YAML path and the tests cover the important cases. The main concern is the explicit_overrides key leaking into the new config path as an extraneous kwarg. That should be addressed (either in this PR or a quick follow-up) to avoid subtle issues down the road.
|
This PR is being closed due to the ongoing YAML refactor in #2383. |
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Fix #2655
Summary
This PR clarifies the precedence relationship between caller args, CLI defaults, and stage configs.
Previously, when a model had a built-in default stage config, overlapping caller / CLI args were always overridden by the stage config, so those user-supplied settings could not take effect.
In
#2076, we introduced explicit precedence handling between caller args and stage configs, but that change did not distinguish:As a result, parser defaults could incorrectly override default stage configs, while the intended override behavior was still ambiguous in some direct-call paths.
Current precedence
This PR makes the precedence rules explicit:
stage_configs_pathYAMLexplicit_overridesThat means:
What changed
explicit_overridesthrough online/offline parser-based pathsexplicit_overridesparticipate in conflict resolution with default stage configsEssential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)