[Config Refactor][2/N] Pipeline + Deploy Config Schema #2383
Merged: hsliuustc0106 merged 112 commits into vllm-project:main from lishunyang12:config-refactor-2a on Apr 19, 2026.
Commits
- `b81b882` [Config Refactor][2/N] PipelineConfig + DeployConfig dataclasses + qw… (lishunyang12)
- `be79bf6` Wire pipeline registry + deploy config for qwen3_omni, delete legacy … (lishunyang12)
- `6845cd0` Fix stale qwen3_omni doc references (lishunyang12)
- `0101625` Add --deploy-config, --stage-overrides CLI args and wire e2e (lishunyang12)
- `df7adff` Align to final RFC design: model_arch→pipeline, connectors→deploy, sa… (lishunyang12)
- `59102e3` Clean up: trim docstrings/comments, merge tests into test_config_factory (lishunyang12)
- `a9f1c57` Remove RFC/name references from comments (lishunyang12)
- `1195d35` Add TODOs for follow-up migration work (lishunyang12)
- `eac199f` Add @lishunyang12 to TODOs (lishunyang12)
- `5956f0d` Strip remaining comments from stage_config.py (lishunyang12)
- `1f9e080` Strip remaining comments (lishunyang12)
- `e368a7b` Refactor merge_pipeline_deploy: use asdict loop instead of field-by-f… (lishunyang12)
- `20020c8` Simplify deploy config parsing and platform overrides (lishunyang12)
- `d3f1833` Move deploy format detection to engine layer, minimize conftest changes (lishunyang12)
- `3e7e22b` Add base_config inheritance for deploy YAMLs, slim CI configs to over… (lishunyang12)
- `3c3417e` Remove default-value fields from main deploy YAML (lishunyang12)
- `2456286` Add tests for base_config inheritance, platform overrides, CLI flow, … (lishunyang12)
- `047655f` Fix base_config relative paths in dfx deploy configs (lishunyang12)
- `8043433` Fix pre-existing test: make MockParallelConfig a dataclass (lishunyang12)
- `00729ea` Fix: import pipeline module before checking registry (lishunyang12)
- `bc9442b` Fix device assignment: fall back to flat devices when engine_args is … (lishunyang12)
- `0726182` Fix OmegaConf serialization: convert dataclasses in cli_overrides (lishunyang12)
- `52fc87d` Fix: filter argparse defaults from CLI overrides in registry path (lishunyang12)
- `57e83cf` Fix: only apply per-stage CLI overrides in registry path (lishunyang12)
- `df29048` Add repro script for online serving config path (lishunyang12)
- `1f00d04` Address review feedback (lishunyang12)
- `7ef989e` Move qwen3_omni CI deploy YAMLs to vllm_omni/deploy/ci/ (lishunyang12)
- `cd5d06e` Unify CI deploy YAMLs and fix CLI/YAML precedence (lishunyang12)
- `8fa180f` Fix pre-commit: collapse wrapped strings in qwen3_omni/pipeline.py (lishunyang12)
- `3f6fd2d` Add CPU-only smoke scripts for deploy schema and CLI explicit-key par… (lishunyang12)
- `e2305b3` Add e2e serve smoke script for PR #2383 (lishunyang12)
- `70bc4c8` e2e_serve_smoke: print log snippets during readiness wait (lishunyang12)
- `3c17d12` Migrate qwen2_5_omni and qwen3_tts to pipeline+deploy schema (lishunyang12)
- `d277f9d` Update docs and examples to point at new deploy paths (lishunyang12)
- `f805cd1` Complete refactoring for qwen3_omni_moe / qwen2_5_omni / qwen3_tts (lishunyang12)
- `95d66c2` Address amy-why-3459 review feedback (lishunyang12)
- `8eae5b4` Make prefix_caching and async_chunk CLI-overridable (lishunyang12)
- `5f9abb8` Loosen qwen2_5_omni shared-cuda:0 budgets for flashinfer warmup (lishunyang12)
- `6e5dd59` Drop qwen2_5_omni stage 0 to 0.48 for token2wav warmup spike (lishunyang12)
- `a2b34f0` Disable flashinfer autotune on qwen2_5_omni token2wav stage (lishunyang12)
- `dbb2ba2` Auto-set VLLM_ALLOW_LONG_MAX_MODEL_LEN for stages with explicit max_m… (lishunyang12)
- `e986a16` Drop multiconnector deploy yamls in favor of base_config overlays (lishunyang12)
- `1a7246c` Add --pipeline CLI flag to override deploy yaml pipeline selector (lishunyang12)
- `52a4a5d` Drop qwen3_tts variant yamls in favor of CLI flag composition (lishunyang12)
- `73c9ab3` Address #2383 review feedback (lishunyang12)
- `dff7278` Move CI overlays from deploy/ci/*.yaml into tests/utils.py Python dicts (lishunyang12)
- `b6733e6` Unify qwen3_tts pipeline: dispatch processors from async_chunk bool (lishunyang12)
- `29c1816` Trim verbose comments (lishunyang12)
- `466e828` Restore DFX deploy overlays and wire scripts to --deploy-config (lishunyang12)
- `4b755b0` Restore DeployConfig.pipeline for variant topologies; drop dead qwen3… (lishunyang12)
- `85aebc3` Collapse DFX tests; drop deploy overlays and obsolete update blocks (lishunyang12)
- `d5c9fa3` Incorporate #2740 helpers: build_stage_runtime_overrides, strip_paren… (lishunyang12)
- `e0e7955` Collapse qwen3_tts_bs{1,4,16}.yaml into BATCH_SIZE env + --stage-over… (lishunyang12)
- `5b5bddb` Move thinker-only test yaml into _CI_OVERLAYS; drop orphan docs example (lishunyang12)
- `5389122` Trim verbose comments and doc prose (lishunyang12)
- `8772afa` Restore StageType re-export from vllm_omni.config (lishunyang12)
- `3654c57` Replace stage CLI blacklists with OrchestratorArgs dataclass (lishunyang12)
- `792ee10` Auto-discover pipelines; extract parent-args contracts (lishunyang12)
- `91ad09c` Address reviews: tokenizer guard, direct setattr, qwen3_tts alias (lishunyang12)
- `7e7e1be` Enable async_chunk by default for qwen2_5_omni (lishunyang12)
- `7e782c7` Address reviews: JSON parse error, README wording (lishunyang12)
- `ccb517e` Move lazy imports to top of file (lishunyang12)
- `3eb75af` Address reviews: docs schema, override example, benchmark cleanup (lishunyang12)
- `b1c9d1c` Rename qwen3_omni_moe.yaml to qwen3_omni.yaml for consistency (lishunyang12)
- `ac0bd4e` Revert "Rename qwen3_omni_moe.yaml to qwen3_omni.yaml for consistency" (lishunyang12)
- `3a3791b` Refactor merge_pipeline_deploy; add BVA tests (lishunyang12)
- `404c1ed` Share detect_explicit_cli_keys between online and offline entry points (lishunyang12)
- `6241b68` Pass _cli_explicit_keys in offline examples for the 3 migrated models (lishunyang12)
- `095d1ea` Add Omni.from_args / AsyncOmni.from_args for error-proof argparse entry (lishunyang12)
- `71f73c1` Rename from_args to from_cli_args to align with OmniEngineArgs (lishunyang12)
- `4eb6395` Drop redundant default values from deploy YAMLs (lishunyang12)
- `f7aab26` Rename arg_classification to arg_routing (lishunyang12)
- `0e5e9d7` Merge arg_routing into arg_utils (lishunyang12)
- `baac75e` Remove redundant edges: section from deploy YAMLs (lishunyang12)
- `ac563e0` Drop more redundant default values from deploy YAMLs (lishunyang12)
- `e1e6bc6` Keep devices/gpu_memory_utilization/shm_threshold_bytes for deploymen… (lishunyang12)
- `18e82b6` Promote pipeline-wide settings from per-stage to top-level DeployConfig (lishunyang12)
- `59a678d` Document implicit defaults and enforce_eager choices in deploy YAMLs (lishunyang12)
- `78c8c5e` Drop redundant 'Parse X' section comments in initialization.py (lishunyang12)
- `bf883c3` Revert "Drop redundant 'Parse X' section comments in initialization.py" (lishunyang12)
- `f048711` Merge branch 'main' into config-refactor-2a (lishunyang12)
- `20ab452` Fix OmniServerStageCli to recognize new deploy YAML 'stages' key (lishunyang12)
- `9ede944` Fix advanced_model fixture to strip load_format from new-schema yamls (lishunyang12)
- `59804d4` Capture stage subprocess logs for debugging failing startups (lishunyang12)
- `5cfacb7` Apply ruff format to stage subprocess error raise (lishunyang12)
- `5e1feec` docs: clarify runtime.devices are logical indices (lishunyang12)
- `56b07e6` test(dfx): clarify --deploy-config and --stage-overrides compose (lishunyang12)
- `89ed8bb` test(dfx): clarify --deploy-config and --stage-overrides compose in s… (lishunyang12)
- `077d08d` test(qwen3_tts): drop dead get_stage_config wrapper; call get_deploy_… (lishunyang12)
- `648fb0f` [qwen3_tts] remove redundant no_async_chunk pipeline alias (async_chu… (lishunyang12)
- `386a903` cli: flag --stage-configs-path as deprecated in its help string (lishunyang12)
- `fd2b847` entrypoints: make detect_explicit_cli_keys parser-aware to resolve re… (lishunyang12)
- `d450e3e` entrypoints: raise on --stage-configs-path + --deploy-config both set (lishunyang12)
- `93342ff` config: map StageExecutionType to scheduler class (not dotted-path st… (lishunyang12)
- `0f6063d` config: raise on invalid stage_id / unmapped execution_type in get_sc… (lishunyang12)
- `dc51960` config: warn on silent clobber in _deep_merge_stage when types mismatch (lishunyang12)
- `2077d35` config: raise if async_chunk=True but no stage declares an async handler (lishunyang12)
- `a43e29e` Merge branch 'main' into config-refactor-2a (lishunyang12)
- `aed3671` cli: stash parser via type(self) so docs hook doesn't hit NameError (lishunyang12)
- `acba8f4` config: accept custom_process_next_stage_input_func as valid async_ch… (lishunyang12)
- `4e11fdd` Config CI Fixes (#53) (alex-jw-brooks)
- `bbb0fb9` test(qwen3_omni): drop stage_args engine_args override; rely on pipel… (lishunyang12)
- `24aa5e3` Merge branch 'main' into config-refactor-2a (lishunyang12)
- `fb426c7` test(qwen3_omni_realtime): migrate from deleted legacy yaml to ci ove… (lishunyang12)
- `963df1d` Merge branch 'main' into config-refactor-2a (lishunyang12)
- `ca9e775` test(qwen3_omni): restore async_chunk variant in expansion test; log … (lishunyang12)
- `11a16c1` test(dfx): add extra_cli_args passthrough; restore qwen3_omni async_c… (lishunyang12)
- `22fb84d` deploy(qwen2_5_omni): default async_chunk to false; feature not yet s… (lishunyang12)
- `b53967d` test(dfx): scope qwen3_omni json diffs to yaml-path-only; preserve ma… (lishunyang12)
- `8c6b746` Merge branch 'main' into config-refactor-2a (hsliuustc0106)
- `93106e2` test(conftest): tail OmniServerStageCli per-stage logs to stdout on t… (lishunyang12)
- `bfaccfd` serve(headless): forward cli_explicit_keys so argparse defaults don't… (lishunyang12)
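Several commits in this list ("Fix: filter argparse defaults from CLI overrides in registry path", "Unify CI deploy YAMLs and fix CLI/YAML precedence", "Share detect_explicit_cli_keys between online and offline entry points") revolve around one precedence rule: a value from the deploy YAML should be overridden only by a flag the user actually typed, never by an argparse default. The sketch below shows one common way to detect explicitly passed flags, using a probe parser whose defaults are `argparse.SUPPRESS`. The flag subset, the default of 256, and the helpers `add_serve_args`, `detect_explicit_keys` and `resolve` are hypothetical; this is not the vllm_omni `detect_explicit_cli_keys` implementation.

```python
import argparse


def add_serve_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical subset of serve flags, registered without explicit defaults.
    parser.add_argument("--deploy-config")
    parser.add_argument("--max-num-seqs", type=int)
    parser.add_argument("--trust-remote-code", action="store_true")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serve")
    add_serve_args(parser)
    # Hypothetical argparse default: exactly the kind of value that must NOT clobber the YAML.
    parser.set_defaults(max_num_seqs=256)
    return parser


def detect_explicit_keys(argv: list[str]) -> set[str]:
    # Probe parser: every option defaults to SUPPRESS, so a dest only appears
    # in the namespace when the corresponding flag is present in argv.
    probe = argparse.ArgumentParser(prog="serve", argument_default=argparse.SUPPRESS)
    add_serve_args(probe)
    return set(vars(probe.parse_args(argv)))


def resolve(yaml_values: dict, argv: list[str]) -> dict:
    args = vars(build_parser().parse_args(argv))
    explicit = detect_explicit_keys(argv)
    merged = dict(yaml_values)
    # Only flags the user typed override the deploy YAML; argparse defaults are ignored.
    merged.update({key: value for key, value in args.items() if key in explicit})
    return merged
```

With a YAML value of `max_num_seqs: 4`, running with no flags keeps 4 (the argparse default of 256 is ignored), while `--max-num-seqs 16` yields 16. Building the probe from the same `add_serve_args` function keeps the two parsers in sync.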
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -35,8 +35,8 @@ MODEL=Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice bash run_benchmark.sh --async-only |
| 35 | 35 | # Use a Voice Clone model |
| 36 | 36 | MODEL=Qwen/Qwen3-TTS-12Hz-1.7B-Base TASK_TYPE=Base bash run_benchmark.sh --async-only |
| 37 | 37 | |
| 38 | | - # Use bs16 config for higher throughput |
| 39 | | - STAGE_CONFIG=vllm_omni/configs/qwen3_tts_bs16.yaml bash run_benchmark.sh --async-only |
| | 38 | + # Use batch size 16 for higher throughput |
| | 39 | + BATCH_SIZE=16 bash run_benchmark.sh --async-only |
| 40 | 40 | |
| 41 | 41 | # Custom GPU, prompt count, concurrency levels |
| 42 | 42 | GPU_DEVICE=1 NUM_PROMPTS=20 CONCURRENCY="1 4" bash run_benchmark.sh |
| | | @@ -50,7 +50,8 @@ GPU_DEVICE=1 NUM_PROMPTS=20 CONCURRENCY="1 4" bash run_benchmark.sh |
| 50 | 50 | CUDA_VISIBLE_DEVICES=0 python -m vllm_omni.entrypoints.cli.main serve \ |
| 51 | 51 | "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice" \ |
| 52 | 52 | --omni --host 127.0.0.1 --port 8000 \ |
| 53 | | - --stage-configs-path benchmarks/qwen3-tts/vllm_omni/configs/qwen3_tts_bs1.yaml \ |
| | 53 | + --deploy-config vllm_omni/deploy/qwen3_tts.yaml \ |
| | 54 | + --stage-overrides '{"0":{"max_num_seqs":1,"gpu_memory_utilization":0.3,"max_num_batched_tokens":512},"1":{"max_num_seqs":1,"gpu_memory_utilization":0.3,"max_num_batched_tokens":8192}}' \ |
| 54 | 55 | --trust-remote-code |
| 55 | 56 | ``` |
| 56 | 57 | |
| | | @@ -84,16 +85,19 @@ python benchmarks/qwen3-tts/plot_results.py \ |
| 84 | 85 | --output results/comparison.png |
| 85 | 86 | ``` |
| 86 | 87 | |
| 87 | | - ## Stage Configs |
| | 88 | + ## Batch-size presets |
| 88 | 89 | |
| 89 | | - \| Config \| max_num_seqs \| Description \| |
| 90 | | - \|--------\|:------------:\|-------------\| |
| 91 | | - \| `vllm_omni/configs/qwen3_tts_bs1.yaml` \| 1 \| Single-request processing (lowest latency) \| |
| 92 | | - \| `vllm_omni/configs/qwen3_tts_bs16.yaml` \| 16 \| High-throughput concurrent processing \| |
| | 90 | + The bench script loads the bundled production deploy (`vllm_omni/deploy/qwen3_tts.yaml`) and layers per-stage budgets on top via `--stage-overrides`, driven by the `BATCH_SIZE` env var. Each batch size picks compatible per-stage `max_num_seqs`, `max_num_batched_tokens`, and `gpu_memory_utilization` defaults: |
| 93 | 91 | |
| 94 | | - All configs use a 2-stage pipeline (Talker -> Code2Wav) with `async_chunk` streaming enabled. The `SharedMemoryConnector` streams codec frames (25-frame chunks with 25-frame context overlap) between stages. |
| | 92 | + \| `BATCH_SIZE` \| Description \| |
| | 93 | + \|:--:\|-------------\| |
| | 94 | + \| `1` (default) \| Single-request processing (lowest latency) \| |
| | 95 | + \| `4` \| Moderate-throughput concurrent processing \| |
| | 96 | + \| `16` \| High-throughput concurrent processing \| |
| 95 | 97 | |
| 96 | | - The model is specified via the CLI `--model` flag (or `MODEL` env var), so the same configs work for both the 0.6B and 1.7B model variants. |
| | 98 | + The 2-stage pipeline (Talker -> Code2Wav) runs with `async_chunk` streaming enabled via the prod deploy; the `SharedMemoryConnector` streams codec frames (25-frame chunks with 25-frame context overlap) between stages. |
| | 99 | + |
| | 100 | + The model is specified via the CLI `--model` flag (or `MODEL` env var), so the same bench script works for both the 0.6B and 1.7B model variants. |
| 97 | 101 | |
| 98 | 102 | ## Metrics |
| 99 | 103 | |

Collaborator review comment on the added `BATCH_SIZE` table header row: same here
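The README now says the bench script "layers per-stage budgets on top via `--stage-overrides`" of the bundled deploy YAML. A minimal sketch of what that layering could look like follows, assuming the overrides are a JSON object keyed by stage id (as in the serve example in the diff) and that each stage's settings behave like a nested dict. `deep_merge`, `apply_stage_overrides` and the base stage values in the usage note are hypothetical and not vllm_omni's actual merge code.

```python
import copy
import json

# The --stage-overrides string from the serve example in the README diff.
SERVE_EXAMPLE_OVERRIDES = (
    '{"0":{"max_num_seqs":1,"gpu_memory_utilization":0.3,"max_num_batched_tokens":512},'
    '"1":{"max_num_seqs":1,"gpu_memory_utilization":0.3,"max_num_batched_tokens":8192}}'
)


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def apply_stage_overrides(stages: dict[int, dict], overrides_json: str) -> dict[int, dict]:
    """Layer per-stage overrides (JSON keyed by stage id) onto a copy of the base stages."""
    overrides = json.loads(overrides_json)
    if not isinstance(overrides, dict):
        raise ValueError("stage overrides must be a JSON object keyed by stage id")
    result = copy.deepcopy(stages)
    for raw_id, stage_override in overrides.items():
        stage_id = int(raw_id)  # JSON object keys are always strings
        if stage_id not in result:
            raise ValueError(f"unknown stage id {stage_id}")
        result[stage_id] = deep_merge(result[stage_id], stage_override)
    return result
```

For example, with a hypothetical base of `{0: {"name": "talker", "max_num_seqs": 8}, 1: {"name": "code2wav", "max_num_seqs": 8}}`, applying `SERVE_EXAMPLE_OVERRIDES` sets `max_num_seqs` to 1 on both stages and adds the 512 and 8192 token budgets, while keys the override does not mention (such as `name`) survive untouched. Copying instead of mutating keeps the loaded deploy config reusable across runs.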
(file name not shown): This file was deleted.

benchmarks/qwen3-tts/vllm_omni/configs/qwen3_tts_bs16.yaml (89 changes: 0 additions & 89 deletions): This file was deleted.
Review comment: is this `BATCH_SIZE` actually about concurrency? @linyueqian