[Bugfix] Remove duplicated config keyword max_batch_size (#1851)
Conversation
…cross multiple files

This commit updates the documentation and configuration files to replace instances of `max_batch_size` with `max_num_seqs`, reflecting a change in how concurrent processing is managed in the system. The updates include changes in stage configurations, user guides, and examples to ensure consistency and clarity in the usage of the new parameter.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2021d3e276
```python
if hasattr(stage_arg, "runtime") and stage_arg.runtime is not None and stage_type != "diffusion":
    runtime_cfg = stage_arg.runtime
    max_batch_size = int(runtime_cfg.get("max_batch_size", 1) or 1)
    base_engine_args_tmp["max_num_seqs"] = max_batch_size
    base_engine_args_tmp.async_chunk = global_async_chunk
    stage_arg.engine_args = base_engine_args_tmp
```
Keep runtime.max_batch_size compatibility in YAML loader
This change removes the runtime.max_batch_size -> engine_args.max_num_seqs mapping, but there are still shipped stage configs that only set the runtime key (for example tests/perf/stage_configs/qwen3_tts.yaml:12 and benchmarks/qwen3-tts/vllm_omni/configs/qwen3_tts_bs4.yaml:10). After this commit those configs load without max_num_seqs, so _stage_worker falls back to batch size 1 and serializes requests, which materially skews perf/benchmark results.
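If compatibility were kept here, a minimal loader-side shim could map the legacy key onto the new one and warn users to migrate. This is only a sketch; the helper name and surrounding structure are hypothetical, not the repo's actual code:

```python
import warnings
from typing import Any


def apply_legacy_max_batch_size(stage_arg: Any, engine_args: dict) -> None:
    """Hypothetical shim: map deprecated runtime.max_batch_size onto
    engine_args.max_num_seqs so old stage YAMLs keep working."""
    runtime_cfg = getattr(stage_arg, "runtime", None)
    if not runtime_cfg or "max_batch_size" not in runtime_cfg:
        return
    warnings.warn(
        "runtime.max_batch_size is deprecated; set engine_args.max_num_seqs instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # An explicit max_num_seqs in engine_args wins over the legacy key.
    engine_args.setdefault(
        "max_num_seqs", int(runtime_cfg.get("max_batch_size", 1) or 1)
    )
```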
```diff
      pass

- max_batch_size = int(runtime_cfg.get("max_batch_size", 1) or 1)
+ max_batch_size = int(engine_args.get("max_num_seqs", 1))
```
Read max_num_seqs before filtering diffusion args
For diffusion stages, engine_args is filtered by OmniDiffusionConfig before this line runs, and that dataclass has no max_num_seqs field, so the value is dropped and this always resolves to the default of 1. As a result, any diffusion stage config that sets engine_args.max_num_seqs cannot actually increase batching anymore.
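One way to avoid dropping the value is to read it from the raw dict before the dataclass filter runs. A sketch under the assumption that the loader filters `engine_args` down to the config dataclass's fields (function and parameter names are illustrative):

```python
from dataclasses import fields
from typing import Any


def split_max_num_seqs(
    engine_args: dict[str, Any], config_cls: type
) -> tuple[int, dict[str, Any]]:
    """Capture max_num_seqs from the raw engine_args, then filter the dict
    down to the fields config_cls (e.g. OmniDiffusionConfig) actually knows."""
    max_num_seqs = int(engine_args.get("max_num_seqs", 1))
    allowed = {f.name for f in fields(config_cls)}
    filtered = {k: v for k, v in engine_args.items() if k in allowed}
    return max_num_seqs, filtered
```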
hsliuustc0106 left a comment
Do we use this to control max_concurrency?
Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
hsliuustc0106 left a comment
Summary
Comprehensive migration from runtime.max_batch_size to engine_args.max_num_seqs across the codebase. This standardizes concurrency control under engine_args and aligns stage config semantics with vLLM scheduler behavior.
Validated
- ✅ DCO signed
- ✅ All CI checks passed (build, pre-commit, buildkite, docs)
- ✅ Runtime code paths updated in `omni.py`, `async_omni.py`, `omni_stage.py`
- ✅ Legacy conversion logic removed from `load_stage_configs_from_yaml`
- ✅ All docs and examples updated consistently
- ✅ Test result shows Qwen3-Omni online serving works
Scope
51 files touched covering:
- Stage config docs and quickstart
- All model/executor stage YAMLs
- Platform-specific configs (NPU/ROCm/XPU)
- E2E/perf test configs
- Offline inference example READMEs
Clean refactor with clear motivation (issue #695).
Yes, as discussed before: vLLM uses max_num_seqs to control the max concurrency, so the max_batch_size we defined ourselves is duplicated and misleading.
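For illustration, the stage-config migration looks roughly like this; the surrounding YAML layout is abbreviated, and only the moved key reflects the PR:

```yaml
# Before (deprecated): concurrency lived under runtime
runtime:
  max_batch_size: 4

# After: concurrency lives under engine_args, matching vLLM's scheduler knob
engine_args:
  max_num_seqs: 4
```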
```diff
      pass

- max_batch_size = int(runtime_cfg.get("max_batch_size", 1) or 1)
+ max_batch_size = int(engine_args.get("max_num_seqs", 1))
```
Should we consider backward compatibility for max_batch_size?
Same question: if users have already put it into production, how do we inform them of this YAML arg change?
Added warnings in both the runtime code and the docs.
Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Resolve conflicts, please.
…ross multiple YAML files and update related documentation. This change enhances compatibility and prepares for future deprecations of max_batch_size. Additionally, added tests to ensure proper migration and handling of the new parameter.
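The migration tests mentioned in that commit might look something like the sketch below; `migrate_stage_config` is a hypothetical stand-in for the real loader logic:

```python
import warnings

import pytest


def migrate_stage_config(cfg: dict) -> dict:
    """Toy stand-in for the loader's migration step."""
    runtime = cfg.get("runtime", {})
    engine_args = dict(cfg.get("engine_args", {}))
    if "max_batch_size" in runtime and "max_num_seqs" not in engine_args:
        warnings.warn(
            "runtime.max_batch_size is deprecated; use engine_args.max_num_seqs.",
            DeprecationWarning,
        )
        engine_args["max_num_seqs"] = int(runtime["max_batch_size"])
    return engine_args


def test_legacy_key_migrates_with_warning():
    with pytest.warns(DeprecationWarning):
        engine_args = migrate_stage_config({"runtime": {"max_batch_size": 4}})
    assert engine_args["max_num_seqs"] == 4


def test_explicit_new_key_wins():
    engine_args = migrate_stage_config(
        {"runtime": {"max_batch_size": 4}, "engine_args": {"max_num_seqs": 8}}
    )
    assert engine_args["max_num_seqs"] == 8
```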
…tageConfigFactory. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…g_reqs and add max_num_seqs to config aligned with changes introduced in vllm-project#1851 Signed-off-by: jader <yjader@foxmail.com>
…t#1851) Signed-off-by: tzhouam <tzhouam@connect.ust.hk> Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Purpose
This PR migrates stage concurrency configuration from `runtime.max_batch_size` to `engine_args.max_num_seqs` across code, docs, examples, and test configs, as described in #695. The change standardizes concurrency control under `engine_args`, aligns stage config semantics with vLLM scheduler behavior, and removes the legacy runtime-to-engine-arg mapping.

- Replaces `runtime.max_batch_size` with `engine_args.max_num_seqs` in stage configs, docs, examples, and test configs.
- Reads concurrency from `engine_args.max_num_seqs` in `omni.py`, `async_omni.py`, and `omni_stage.py`.
- Removes the logic in `load_stage_configs_from_yaml` that copied `runtime.max_batch_size` into `engine_args.max_num_seqs`.
- Renames `max_batch_size` to `max_num_seqs` for consistency.
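For reference, `max_num_seqs` is the stock vLLM engine argument that caps how many sequences the scheduler batches per step, which is what stage concurrency now maps onto. A minimal standalone sketch (the model name is just a placeholder):

```python
from vllm import LLM, SamplingParams

# max_num_seqs caps concurrent sequences per scheduler step.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", max_num_seqs=4)  # placeholder model

outputs = llm.generate(
    ["Hello", "Bonjour", "Hola", "Ciao"],
    SamplingParams(max_tokens=8),
)
for out in outputs:
    print(out.outputs[0].text)
```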
Test Plan

Tested with Qwen3-Omni online serving.

Test Result

Qwen3-Omni online serving works as expected.