Skip to content

[Bugfix] Fix CLI defaults overriding stage configs#2656

Closed
xiaohajiayou wants to merge 1 commit intovllm-project:mainfrom
xiaohajiayou:fix-explicit-overrides-stage-config
Closed

[Bugfix] Fix CLI defaults overriding stage configs#2656
xiaohajiayou wants to merge 1 commit intovllm-project:mainfrom
xiaohajiayou:fix-explicit-overrides-stage-config

Conversation

@xiaohajiayou
Copy link
Copy Markdown
Contributor

@xiaohajiayou xiaohajiayou commented Apr 9, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Fix #2655
Summary

This PR clarifies the precedence relationship between caller args, CLI defaults, and stage configs.

Previously, when a model had a built-in default stage config, overlapping caller / CLI args were always overridden by the stage config, so those user-supplied settings could not take effect.

In #2076, we introduced explicit precedence handling between caller args and stage configs, but that change did not distinguish:

  • explicit CLI overrides vs parser-provided CLI defaults
  • explicit overrides vs plain direct Python kwargs used in tests and scripts

As a result, parser defaults could incorrectly override default stage configs, while the intended override behavior was still ambiguous in some direct-call paths.

Current precedence

This PR makes the precedence rules explicit:

  1. Explicit stage_configs_path YAML
  2. explicit_overrides
  3. Default model stage config
  4. Plain caller kwargs / CLI defaults

That means:

  • explicit YAML remains authoritative over caller-provided overrides
  • explicit CLI overrides can override default model stage configs
  • CLI default values and plain kwargs only fill fields missing from the default stage config, and no longer override its defaults

What changed

  • Added explicit CLI override extraction for parser-based entrypoints
  • Passed explicit_overrides through online/offline parser-based paths
  • Updated stage config merge logic so only explicit_overrides participate in conflict resolution with default stage configs
  • Kept plain kwargs as fallback values for fields not defined in stage configs
  • Added/updated tests to cover these precedence rules

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@xiaohajiayou xiaohajiayou marked this pull request as draft April 9, 2026 18:13
Signed-off-by: xiaohajiayou <923390377@qq.com>
@xiaohajiayou xiaohajiayou force-pushed the fix-explicit-overrides-stage-config branch from 136b153 to bc95df2 Compare April 9, 2026 18:16
@lishunyang12
Copy link
Copy Markdown
Collaborator

lishunyang12 commented Apr 9, 2026

Please check #2072 and #2383, happy to receive any feedback, we are about to implement a large scale config refractoring and this feature will be covered.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

we will remove cli related args from yaml very recently cc @lishunyang12 @wuhang2014

Copy link
Copy Markdown
Collaborator

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #2656: Fix CLI defaults overriding stage configs

Overall

The PR correctly identifies a real problem: argparse default values were overriding model-specific stage config defaults in the legacy YAML path, which is the wrong precedence. The fix introduces parse_args_with_explicit_overrides to distinguish user-supplied CLI args from parser defaults, then threads explicit_overrides through the merge logic so that only explicitly typed CLI args can override stage YAML defaults.

The precedence model is clear and well-documented in the PR description. The test coverage is good and verifies the four key scenarios (explicit YAML wins over explicit overrides, explicit overrides win over default YAML, plain kwargs do not override default YAML, plain kwargs fill missing keys).

Issues and suggestions

1. explicit_overrides leaks into the new-path (StageConfigFactory) as an extraneous engine arg (minor)

In AsyncOmniEngine._resolve_stage_configs, _cli_explicit_keys is popped from kwargs (line 1299), but explicit_overrides is not. It flows through base_kwargs -> load_and_resolve_stage_configs -> load_stage_configs_from_model -> cli_overrides -> StageConfigFactory.create_from_model. There, it becomes a key in the overrides dict with a dict value, which is harmless today (silently ignored by _merge_cli_overrides) but is conceptually wrong and could cause confusing debug output or breakage if future code validates override keys.

Consider popping explicit_overrides from kwargs in _resolve_stage_configs (alongside _cli_explicit_keys), or stripping it in load_stage_configs_from_model before building cli_overrides.

2. Two parallel explicit-override mechanisms doing the same thing (design concern)

The new path already has detect_explicit_cli_keys -> _cli_explicit_keys -> cli_explicit_keys. This PR introduces a second mechanism: parse_args_with_explicit_overrides -> _explicit_overrides -> explicit_overrides in kwargs. Both solve the same problem (distinguish user-typed CLI args from defaults) but take different approaches (set of dest names vs. dict of dest->value).

For the legacy YAML path this is fine since cli_explicit_keys wasn't wired there. But having two mechanisms increases cognitive load. A follow-up could unify them -- e.g., parse_args_with_explicit_overrides could populate _cli_explicit_keys (as a set) and the legacy path could extract explicit values from that set + the full kwargs.

3. getattr(args, "_explicit_overrides", None) repeated ~25 times across examples (nit)

Every example script does getattr(args, "_explicit_overrides", None). Since parse_args_with_explicit_overrides always sets this attribute, args._explicit_overrides would suffice. But more importantly, this boilerplate could be avoided if Omni.__init__ extracted _explicit_overrides from _cli_explicit_keys-style info, or if a helper extracted the overrides dict. The current approach puts the burden on every caller to remember the getattr pattern.

4. Subparser traversal in parse_args_with_explicit_overrides is shallow (minor)

The function iterates parser._actions and checks one level of subparser choices. If there are nested subparsers (unlikely but possible), deeper options would be missed. The current depth is probably sufficient for this codebase.

5. _strip_single_engine_args filtering (question)

In _resolve_stage_configs, when stage_configs_path is not None, kwargs go through _strip_single_engine_args. Does that strip explicit_overrides? If not, same leak issue as point 1.

Correctness of the merge logic change

The key change in load_stage_configs_from_yaml looks correct:

  • Before: merge_configs(base_engine_args_tmp, stage_arg.engine_args) -- stage YAML wins over ALL caller kwargs
  • After: First merge plain kwargs under stage YAML (stage wins), then merge explicit overrides on top (explicit wins over stage when prefer_stage_engine_args=True, stage wins when prefer_stage_engine_args=False)

This correctly implements the stated precedence: explicit YAML > explicit_overrides > default YAML > plain kwargs.

Tests

Good coverage of the four precedence cases. The mocker.patch approach for resolve_model_config_path and load_yaml_config is clean. The lambda path: __import__(...) pattern in side_effect is a bit unusual -- a local import at the top of the test would be cleaner -- but it works.

Summary

The fix is correct for the legacy YAML path and the tests cover the important cases. The main concern is the explicit_overrides key leaking into the new config path as an extraneous kwarg. That should be addressed (either in this PR or a quick follow-up) to avoid subtle issues down the road.

@xiaohajiayou
Copy link
Copy Markdown
Contributor Author

This PR is being closed due to the ongoing YAML refactor in #2383.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Fix CLI defaults overriding stage configs

3 participants