Skip to content

[Refactor] Remove redundant StageDeployConfig fields, delegate to vLLM defaults#3128

Open
gcanlin wants to merge 18 commits into
vllm-project:mainfrom
gcanlin:pipeline-clean
Open

[Refactor] Remove redundant StageDeployConfig fields, delegate to vLLM defaults#3128
gcanlin wants to merge 18 commits into
vllm-project:mainfrom
gcanlin:pipeline-clean

Conversation

@gcanlin
Copy link
Copy Markdown
Collaborator

@gcanlin gcanlin commented Apr 25, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Remove fields from StageDeployConfig that duplicate vLLM EngineArgs defaults:

  • gpu_memory_utilization
  • tensor_parallel_size
  • enforce_eager
  • max_num_batched_tokens
  • max_model_len
  • async_scheduling

These fields now flow through engine_extras and inherit vLLM's hardware-specific defaults. Existing YAML configs remain compatible as unrecognized fields automatically enter engine_extras.

Retained fields with vllm-omni specific behavior:

  • max_num_seqs (default 64 vs vLLM's 256)
  • devices (stage device assignment)
  • default_sampling_params / subtalker_sampling_params (per-stage sampling)

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

…M defaults

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner April 25, 2026 02:58
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@gcanlin gcanlin requested review from hsliuustc0106, linyueqian, lishunyang12 and princepride and removed request for hsliuustc0106 April 25, 2026 02:58
@gcanlin gcanlin changed the title [Refactor] Remove redundant StageDeployConfig fields, delegate to vLL… [Refactor] Remove redundant StageDeployConfig fields, delegate to vLLM defaults Apr 25, 2026
@gcanlin gcanlin added the ready label to trigger buildkite CI label Apr 25, 2026
Copy link
Copy Markdown
Collaborator Author

@gcanlin gcanlin Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this field is changing the default value from vllm. And it produces the different behavior before stage config refactoring. So I think we don't need to set the default on vllm-omni side. vLLM has the selector method for this field:

https://github.com/vllm-project/vllm/blob/e54894fc85a9861fb38a49701b5844462c3d77e4/vllm/engine/arg_utils.py#L2179-L2260

@gcanlin
Copy link
Copy Markdown
Collaborator Author

gcanlin commented Apr 25, 2026

cc @amy-why-3459

@amy-why-3459
Copy link
Copy Markdown
Contributor

cc @amy-why-3459

I completely agree with your point of view. I believe that the default parameters should be consistent with vLLM; otherwise, performance will be affected by the inconsistency in default parameters.

@amy-why-3459
Copy link
Copy Markdown
Contributor

If it involves modifying deployment parameters, I suggest testing with nightly-test.

@gcanlin gcanlin added omni-test label to trigger buildkite omni model test in nightly CI nightly-test label to trigger buildkite nightly test CI and removed omni-test label to trigger buildkite omni model test in nightly CI labels Apr 25, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin added omni-test label to trigger buildkite omni model test in nightly CI and removed nightly-test label to trigger buildkite nightly test CI ready label to trigger buildkite CI labels Apr 25, 2026
gcanlin added 2 commits April 25, 2026 07:27
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Copy link
Copy Markdown
Collaborator Author

gcanlin commented Apr 25, 2026

cc @yenuo26

Comment thread tests/dfx/perf/scripts/run_benchmark.py Outdated


DEPLOY_CONFIGS_DIR = Path(__file__).parent.parent / "deploy"
DEPLOY_CONFIGS_DIR = Path(__file__).resolve().parents[4] / "vllm_omni" / "deploy"
Copy link
Copy Markdown
Collaborator

@yenuo26 yenuo26 Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can use get_deploy_config_path in tests/helpers/stage_config.py

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
"stage_args": {
0: {"engine_args.enable_prefix_caching": True},
"stages": {
0: {"enable_prefix_caching": True},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not support passing arguments from the command line?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But would be better to unify it in a follow-up PR.

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Apr 25, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
stage_id: int
max_num_seqs: int = 64
gpu_memory_utilization: float = 0.9
tensor_parallel_size: int = 1
Copy link
Copy Markdown
Contributor

@xiaohajiayou xiaohajiayou Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the requirement where fields defined in deploy YAML are selectively overridden,
these fields still need to be retained as a whitelist to preserve the override semantics.

The core issue here is handling default values. A more consistent approach would be to treat
all values as optional (i.e., default to None), and introduce a filtering
step in deploy_create_from_registry(), such as:

explicit_overrides = {k: v for k, v in override_fields.items() if v is not None}

This ensures that only values explicitly provided in the deploy YAML are included in the resulting StageConfig, while missing fields are left unset and handled by downstream default resolution.

With this design, as discussed in #3162:

  • For LLM stages, missing fields will fall back to the default values defined in vLLM's EngineArgs.
  • For diffusion stages, missing fields will be handled by _create_default_diffusion_stage_cfg,
    which provides safe defaults.

This unifies the override behavior while delegating default resolution to the appropriate
downstream layer.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

lgtm, the CI failure may not relate to this PR, resolve conflicts please

@gcanlin gcanlin enabled auto-merge (squash) April 28, 2026 12:26
@gcanlin gcanlin disabled auto-merge April 28, 2026 12:26
@hsliuustc0106 hsliuustc0106 removed ready label to trigger buildkite CI omni-test label to trigger buildkite omni model test in nightly CI labels Apr 29, 2026
@hsliuustc0106 hsliuustc0106 added the high priority high priority issue, needs to be done asap label Apr 30, 2026
gcanlin added 2 commits May 2, 2026 06:58
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin added the omni-test label to trigger buildkite omni model test in nightly CI label May 2, 2026
gcanlin and others added 5 commits May 3, 2026 11:32
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label May 3, 2026
gcanlin added 2 commits May 4, 2026 01:36
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@Gaohan123
Copy link
Copy Markdown
Collaborator

Fix conflicts and pre-commit problems please

@Gaohan123 Gaohan123 removed ready label to trigger buildkite CI high priority high priority issue, needs to be done asap omni-test label to trigger buildkite omni model test in nightly CI labels May 6, 2026
@Gaohan123 Gaohan123 modified the milestones: v0.20.0, v0.22.0 May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants