[Config Refactor]: Remove bagel yaml#2936
Conversation
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
hsliuustc0106
left a comment
BLOCKER scan: PASS
Breaking Change Notice:
This PR removes legacy stage configs and replaces them with deploy configs. Users who explicitly reference the old file paths in stage-configs-path or custom scripts will need to update to the new paths. Consider adding a deprecation warning or symlink if backward compatibility is needed for the release branch.
princepride
left a comment
Code Review Summary
The migration from legacy stage_configs/ YAML to the pipeline-driven framework is clean overall. CI passes. I found 3 high-severity issues that should be addressed before merge, plus several medium/low items.
| # | Severity | Issue |
|---|---|---|
| 1 | High | Think mode broken in two-stage (wrong prompt_expand_func + kv_transfer_criteria) |
| 2 | High | Mooncake test silently tests SharedMemory (missing connector bindings on stages) |
| 3 | High | XPU platform override incomplete (missing FP8, GPU assignment, memory settings) |
| 4 | Medium | hf_architectures collision between two pipelines |
| 5 | Medium | README documents --tensor-parallel-size which doesn't exist in end2end.py |
| 6 | Medium | --deploy-config vs --stage-configs-path inconsistency in docs |
| 7 | Low | LoRA threshold too generous (80 → 160) |
| 8 | Low | Fragile single-stage detection heuristic |
…ides
- Add BAGEL_THINK_PIPELINE with expand_cfg_prompts_think and no kv_transfer_criteria so Thinker decodes think tokens before KV transfer
- Create bagel_think.yaml (inherits bagel.yaml, sets pipeline: bagel_think)
- Restore --think auto-select in end2end.py
- Fix Mooncake CI overlay: add output_connectors/input_connectors on stages
- Complete XPU platform overrides (fp8, max_num_batched_tokens, stage 1 GPU)
- Set hf_architectures=() on single-stage and think pipelines
- Remove non-existent --tensor-parallel-size from offline READMEs
- Clarify --deploy-config vs deprecated --stage-configs-path in online READMEs
- Remove enforce_eager from all stage 0 in bagel deploy YAMLs
- Tighten LoRA diff_2x threshold from 160 to 120
- Add explanatory comment for single-stage detection heuristic

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Made-with: Cursor
Can we quickly approve this PR? The use of vllm-omni as a rollout in Verl may also depend on this change.
Signed-off-by: princepride <wangzhipeng628@gmail.com>
memory_pool_device: "cpu"

platforms:
  xpu:
Only verified on GPU, can't verify XPU.
seed: 52

connectors:
  shared_memory_connector:
Which connector will users get?
The default is shm (the shared-memory connector).
# vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
#   --deploy-config vllm_omni/deploy/bagel_think.yaml

base_config: bagel.yaml
What's this used for? Think-only?
Why does think mode need a separate YAML file instead of switching automatically per request?
There are exactly two differences between BAGEL_PIPELINE and BAGEL_THINK_PIPELINE:

| | Standard | Think |
|---|---|---|
| `prompt_expand_func` | `expand_cfg_prompts` — companions run normally | `expand_cfg_prompts_think` — companions get `max_tokens=1` (stop after prefill) |
| `kv_transfer_criteria` | `{"type": "prefill_finished"}` — KV is transferred to DiT right after prefill, then the Thinker is stopped | (absent) — no early transfer; the Thinker keeps decoding `<think>...</think>` tokens until EOS, then KV is transferred on completion |
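To make the two rows concrete, here is a minimal sketch; `MiniPipeline` is a hypothetical stand-in, since the real `PipelineConfig` dataclass has more fields and may nest these settings differently:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiniPipeline:
    prompt_expand_func: str
    kv_transfer_criteria: Optional[dict]


# Standard pipeline: companions decode normally, KV moves right after prefill.
BAGEL_PIPELINE_LIKE = MiniPipeline(
    prompt_expand_func="expand_cfg_prompts",
    kv_transfer_criteria={"type": "prefill_finished"},
)

# Think pipeline: companions stop after prefill (max_tokens=1),
# KV moves only once the Thinker finishes decoding <think> tokens.
BAGEL_THINK_PIPELINE_LIKE = MiniPipeline(
    prompt_expand_func="expand_cfg_prompts_think",
    kv_transfer_criteria=None,
)
```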
Why can't this be switched per request?
The root cause is that kv_transfer_criteria is a scheduler-level configuration, not a per-request property:
self.kv_transfer_criteria = self._get_kv_transfer_criteria()

It is read once from PipelineConfig during scheduler initialization and then applied uniformly to all requests. See lines 123 and 149:
if not self.kv_transfer_criteria:
    return False
# ...
if criteria_type == "prefill_finished":
    if request.num_computed_tokens >= request.num_prompt_tokens:
        self.transfer_triggered_requests.add(request.request_id)
        self._mark_request_for_kv_transfer(...)

If prefill_finished is set, all image requests are truncated after prefill — think requests would never get a chance to decode <think> tokens. Conversely, if unset, all requests wait until EOS, slowing down the standard (non-think) path.
Similarly, prompt_expand_func is also pipeline-level — it is called uniformly by the orchestrator during request expansion, with no per-request dispatch.
What would be needed to support per-request switching (see the sketch below):
- Move `kv_transfer_criteria` from a scheduler-level attribute to a per-request attribute — each request would carry its own criteria.
- Modify `_check_kv_transfer_criteria` to read from the `request` object rather than `self.kv_transfer_criteria`.
- Have `prompt_expand_func` branch based on a per-request flag (e.g., `think=True` in `sampling_params.extra_args`).
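As a rough illustration of the first two items, a hedged sketch of what a per-request check could look like; the `OmniRequest` fields mirror the scheduler snippet above, and none of this exists in vllm-omni today:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OmniRequest:
    # Illustrative request object; field names follow the scheduler snippet above.
    request_id: str
    num_computed_tokens: int
    num_prompt_tokens: int
    finished: bool
    # Carried per request, e.g. derived from sampling_params.extra_args["think"].
    kv_transfer_criteria: Optional[dict] = None


def should_transfer_kv(request: OmniRequest) -> bool:
    """Decide when this request's KV cache may be handed to the next stage."""
    criteria = request.kv_transfer_criteria
    if criteria is None:
        # Think request: keep decoding <think>...</think>, transfer only at EOS.
        return request.finished
    if criteria.get("type") == "prefill_finished":
        # Standard request: transfer as soon as the prompt has been prefilled.
        return request.num_computed_tokens >= request.num_prompt_tokens
    return False
```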
This is a reasonable architectural improvement but is not yet implemented. In the meantime, a separate deploy YAML is required. In practice, bagel_think.yaml is minimal — it only contains two effective lines (base_config: bagel.yaml and pipeline: bagel_think), which tells the system to use a different PipelineConfig that changes the above two fields.
@natureofnature Maybe we need to move kv_transfer_criteria from a scheduler-level attribute to a per-request attribute?
Can we use only one YAML for deploy? You can refer to #2383.
No, this feature currently relies on the scheduler, so we would need to refactor the scheduler strategy to allow users to pass --think as a request argument.
Can you add a recipe as well? Later, we will delete all model-related examples and only keep the model-agnostic t_2_i.py...
It may be difficult for Bagel, because Bagel doesn't have a chat template; in Bagel's example I need to create the prompts manually. Maybe in the next PR I can add a custom Jinja file and remove the end2end example.
Fix CI
…g merge

merge_pipeline_deploy silently dropped omni_kv_config, prompt_expand_func, and cfg_kv_collect_func when building StageConfig from the pipeline registry, breaking multi-stage KV transfer for Bagel img2img.
- stage_config: propagate omni_kv_config/prompt_expand_func/cfg_kv_collect_func
- bagel.yaml: declare input_connectors so transfer config discovers the edge
- stage_init_utils: resolve base_config inheritance before parsing connectors
- utils: add deploy/ as fallback in resolve_model_config_path

Made-with: Cursor
Signed-off-by: princepride <wangzhipeng628@gmail.com>
The CI failed, but it doesn't seem to be caused by my code. Can you help approve it? @lishunyang12
PTAL @xiaohajiayou Can you help check whether this PR has the override-precedence issue you mentioned?
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Please also fix this bug: https://buildkite.com/vllm/vllm-omni/builds/7509/steps/canvas?sid=019db039-ebbf-4c0a-aa44-9f3c64e165a1&tab=output
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
resolved_config_path = config_path or resolve_model_config_path(model)
return load_omni_transfer_config(resolved_config_path)
if resolved_config_path is None:
    return None
This base_config inheritance change applies to every model, not just bagel. Could it be split into its own infra PR (or at minimum called out in the PR description)? Reviewers scanning a "remove bagel yaml" PR won't expect a generic config-loader change.
CI tests use overlay YAMLs that inherit from vllm_omni/deploy/bagel.yaml. Without this merge step, the connectors: block defined in the base will not be visible when the overlay is loaded, and the transfer config parser will fail to resolve connector references.
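For context, a simplified sketch of what that merge step has to do; `load_deploy_config` and `_deep_merge` are illustrative names, not the actual vllm_omni helpers:

```python
from pathlib import Path

import yaml


def _deep_merge(base: dict, overlay: dict) -> dict:
    # Recursively overlay values onto the base mapping.
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_deploy_config(path: str) -> dict:
    # Resolve base_config inheritance so blocks like `connectors:` defined in
    # bagel.yaml remain visible when an overlay YAML is loaded.
    cfg = yaml.safe_load(Path(path).read_text()) or {}
    base_name = cfg.pop("base_config", None)
    if base_name:
        base_cfg = load_deploy_config(str(Path(path).parent / base_name))
        cfg = _deep_merge(base_cfg, cfg)
    return cfg
```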
if os.path.exists(complete_config_path):
    return str(complete_config_path)

deploy_config_path = PROJECT_ROOT / "vllm_omni" / "deploy" / model_type_str
Same scope question — extending resolve_model_config_path to probe vllm_omni/deploy/<model>/ is a generic framework change that benefits every model. Belongs in its own PR or needs a callout in the PR description.
Without this change, auto-detect cannot find the new location and will fall all the way back to the legacy path.
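A hedged sketch of the intended lookup order; the directory layout, file naming, and return type are assumptions for illustration, not the exact resolve_model_config_path implementation:

```python
from pathlib import Path
from typing import Optional

# Assumed repository root; the real code derives PROJECT_ROOT elsewhere.
PROJECT_ROOT = Path(__file__).resolve().parents[1]


def resolve_config_path_sketch(model_type_str: str) -> Optional[str]:
    candidates = [
        # New pipeline-driven location added by this PR.
        PROJECT_ROOT / "vllm_omni" / "deploy" / f"{model_type_str}.yaml",
        # Legacy fallback for configs not yet migrated.
        PROJECT_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / f"{model_type_str}.yaml",
    ]
    for candidate in candidates:
        if candidate.exists():
            return str(candidate)
    return None
```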
engine_args.update(ds.engine_extras)
if deploy.async_chunk:
    engine_args["async_chunk"] = True
if ps.omni_kv_config:
This omni_kv_config / prompt_expand_func / cfg_kv_collect_func forwarding from StagePipelineConfig to engine_args is needed by Bagel.
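A rough sketch of that forwarding; `StagePipelineConfigSketch` and the engine_args keys are illustrative, and the real plumbing in stage_config.py is more involved:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StagePipelineConfigSketch:
    # Only the three fields relevant to Bagel KV transfer are modeled here.
    omni_kv_config: Optional[dict] = None
    prompt_expand_func: Optional[Callable[..., Any]] = None
    cfg_kv_collect_func: Optional[Callable[..., Any]] = None


def forward_pipeline_fields(ps: StagePipelineConfigSketch, engine_args: dict) -> dict:
    # Copy the pipeline-level hooks into the per-stage engine args so they are
    # not silently dropped when the StageConfig is built from the registry.
    for name in ("omni_kv_config", "prompt_expand_func", "cfg_kv_collect_func"):
        value = getattr(ps, name)
        if value is not None:
            engine_args[name] = value
    return engine_args
```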
@pytest.mark.omni
@pytest.mark.parametrize("tp_size", [1, 2])
@hardware_test(res={"cuda": "H100", "rocm": "MI325"}, num_cards=2)
@pytest.mark.parametrize("tp_size", [1])
Why was tp_size parametrize reduced from [1, 2] to [1] and num_cards from 2 to 1? This isn't a bagel migration change — looks like a CI-resource tweak that snuck in. Either justify or split out.
Because it causes merge-test/nightly-test errors; I believe the current sleep mode implementation has a bug.
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Resolve conflicts in docs/user_guide/examples/online_serving/bagel.md and examples/online_serving/bagel/README.md by keeping the restructured --deploy-config docs from this PR and dropping the stage-configs-path references reintroduced by upstream vllm-project#2978.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Resolve conflict in vllm_omni/config/stage_config.py: combine upstream's async_chunk sentinel-default fix (vllm-project#3078) with this PR's new omni_kv_config engine_args propagation.

Signed-off-by: princepride <wangzhipeng628@gmail.com>

Purpose
Migrate Bagel model configuration from the legacy stage_configs/ YAML system to the new pipeline-driven auto-detection framework introduced in #2383 and #2915.

What this PR does

Delete all legacy Bagel YAML stage configs (8 files):
- vllm_omni/model_executor/stage_configs/bagel.yaml
- vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml
- vllm_omni/model_executor/stage_configs/bagel_single_stage.yaml
- vllm_omni/model_executor/stage_configs/bagel_think.yaml
- vllm_omni/model_executor/stage_configs/bagel_usp2.yaml
- vllm_omni/platforms/xpu/stage_configs/bagel.yaml
- tests/e2e/offline_inference/stage_configs/bagel_sharedmemory_ci.yaml
- tests/e2e/offline_inference/stage_configs/bagel_mooncake_ci.yaml

Add new pipeline-driven configs:
- vllm_omni/model_executor/models/bagel/pipeline.py — defines BAGEL_PIPELINE (two-stage: Thinker → DiT) and BAGEL_SINGLE_STAGE_PIPELINE (DiT-only) as Python PipelineConfig dataclasses
- vllm_omni/deploy/bagel.yaml — default two-stage deploy config with runtime defaults
- vllm_omni/deploy/bagel_single_stage.yaml — single-stage deploy config (pipeline: bagel_single_stage)
- vllm_omni/config/pipeline_registry.py

Support single-stage deployment: Bagel's DiT stage can handle all modalities (text→image, image editing, image understanding, text understanding) independently. A dedicated BAGEL_SINGLE_STAGE_PIPELINE and deploy YAML enable this mode.

Migrate CI tests to Python-based overlays:
- Add bagel, bagel_single_stage, and bagel_mooncake entries to _CI_OVERLAYS in tests/helpers/stage_config.py
- Update the Bagel e2e tests (test_bagel_text2img.py, test_bagel_img2img.py, test_bagel_understanding.py, test_bagel_lora.py, test_bagel_online.py, test_quantization_fp8.py) to use get_deploy_config_path() instead of the legacy stage_configs_path

Update example scripts and documentation:
- Update examples/offline_inference/bagel/end2end.py to derive is_single_stage dynamically from the loaded pipeline

Bug fixes:
- Set async_chunk: false in deploy YAMLs to fix a ValueError when no async-chunk processor is defined
- Add the missing from pathlib import Path import in test_bagel_lora.py
- Relax the LoRA threshold (diff_2x < 160) to accommodate expected larger differences at scale=2.0

Test Plan
- python3 tests/e2e/offline_inference/test_bagel_text2img.py
- python3 tests/e2e/offline_inference/test_bagel_img2img.py
- python3 tests/e2e/offline_inference/test_bagel_understanding.py
- python3 tests/e2e/offline_inference/test_bagel_lora.py
- stage_configs_path
- bagel_single_stage deploy config

Test Result

All Bagel e2e tests pass with load_format: dummy CI overlays. The pipeline auto-detection correctly resolves both two-stage and single-stage topologies without any legacy YAML files.

Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.