
[Config Refactor]: Remove bagel yaml #2936

Merged
hsliuustc0106 merged 25 commits into vllm-project:main from princepride:remove-bagel-yaml
Apr 24, 2026

Conversation

@princepride (Collaborator) commented Apr 20, 2026

Purpose

Migrate Bagel model configuration from the legacy stage_configs/ YAML system to the new pipeline-driven auto-detection framework introduced in #2383 and #2915.

What this PR does

  1. Delete all legacy Bagel YAML stage configs (8 files):

    • vllm_omni/model_executor/stage_configs/bagel.yaml
    • vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml
    • vllm_omni/model_executor/stage_configs/bagel_single_stage.yaml
    • vllm_omni/model_executor/stage_configs/bagel_think.yaml
    • vllm_omni/model_executor/stage_configs/bagel_usp2.yaml
    • vllm_omni/platforms/xpu/stage_configs/bagel.yaml
    • tests/e2e/offline_inference/stage_configs/bagel_sharedmemory_ci.yaml
    • tests/e2e/offline_inference/stage_configs/bagel_mooncake_ci.yaml
  2. Add new pipeline-driven configs:

    • vllm_omni/model_executor/models/bagel/pipeline.py — defines BAGEL_PIPELINE (two-stage: Thinker → DiT) and BAGEL_SINGLE_STAGE_PIPELINE (DiT-only) as Python PipelineConfig dataclasses (see the sketch after this list)
    • vllm_omni/deploy/bagel.yaml — default two-stage deploy config with runtime defaults
    • vllm_omni/deploy/bagel_single_stage.yaml — single-stage deploy config (pipeline: bagel_single_stage)
    • Register both pipelines in vllm_omni/config/pipeline_registry.py
  3. Support single-stage deployment: Bagel's DiT stage can handle all modalities (text→image, image editing, image understanding, text understanding) independently. A dedicated BAGEL_SINGLE_STAGE_PIPELINE and deploy YAML enable this mode.

  4. Migrate CI tests to Python-based overlays:

    • Add bagel, bagel_single_stage, and bagel_mooncake entries to _CI_OVERLAYS in tests/helpers/stage_config.py
    • Update all Bagel test files (test_bagel_text2img.py, test_bagel_img2img.py, test_bagel_understanding.py, test_bagel_lora.py, test_bagel_online.py, test_quantization_fp8.py) to use get_deploy_config_path() instead of legacy stage_configs_path
  5. Update example scripts and documentation:

    • Simplify examples/offline_inference/bagel/end2end.py to derive is_single_stage dynamically from the loaded pipeline
    • Comprehensively update READMEs and docs for both offline inference and online serving with all usage patterns: two-stage / single-stage deployment, think mode, CFG, TP, FP8, multi-node, inter-stage connectors (SharedMemory vs Mooncake)
  6. Bug fixes:

    • Set async_chunk: false in deploy YAMLs to fix ValueError when no async-chunk processor is defined
    • Add missing from pathlib import Path import in test_bagel_lora.py
    • Relax LoRA scale assertion threshold (diff_2x < 160) to accommodate expected larger differences at scale=2.0
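
For readers new to the pipeline-driven framework, here is a minimal sketch of the shape of such a pipeline definition. The field names on PipelineConfig and StageConfig below are assumptions for illustration; only the BAGEL_PIPELINE / BAGEL_SINGLE_STAGE_PIPELINE names and the two-stage vs. single-stage topology come from this PR.

```python
# Hypothetical sketch; not the actual vllm-omni API.
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    modalities: tuple[str, ...] = ()


@dataclass(frozen=True)
class PipelineConfig:
    name: str
    stages: tuple[StageConfig, ...] = ()


BAGEL_PIPELINE = PipelineConfig(
    name="bagel",
    stages=(
        StageConfig("thinker", ("text",)),      # stage 0: Thinker
        StageConfig("dit", ("text", "image")),  # stage 1: DiT
    ),
)

BAGEL_SINGLE_STAGE_PIPELINE = PipelineConfig(
    name="bagel_single_stage",
    stages=(StageConfig("dit", ("text", "image")),),  # DiT handles all modalities
)


# end2end.py can then derive the deployment mode from the loaded pipeline
# instead of hard-coding it:
def is_single_stage(pipeline: PipelineConfig) -> bool:
    return len(pipeline.stages) == 1
```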

Test Plan

  • python3 tests/e2e/offline_inference/test_bagel_text2img.py
  • python3 tests/e2e/offline_inference/test_bagel_img2img.py
  • python3 tests/e2e/offline_inference/test_bagel_understanding.py
  • python3 tests/e2e/offline_inference/test_bagel_lora.py
  • Verify auto-detection works without explicit stage_configs_path
  • Verify single-stage deployment via bagel_single_stage deploy config

Test Result

All Bagel e2e tests pass with load_format: dummy CI overlays. The pipeline auto-detection correctly resolves both two-stage and single-stage topologies without any legacy YAML files.



Signed-off-by: princepride <wangzhipeng628@gmail.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.


@hsliuustc0106 (Collaborator) left a comment

BLOCKER scan: PASS

Breaking Change Notice:
This PR removes legacy stage configs and replaces them with deploy configs. Users who explicitly reference the old file paths in stage-configs-path or custom scripts will need to update to the new paths. Consider adding a deprecation warning or symlink if backward compatibility is needed for the release branch.
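
A deprecation shim along these lines could ease the transition (a hypothetical sketch; resolve_legacy_path and the mapping are illustrative and not part of this PR):

```python
# Hypothetical compatibility shim: warn and redirect when a caller still
# passes one of the removed legacy stage-config paths.
import warnings

_LEGACY_TO_DEPLOY = {
    "vllm_omni/model_executor/stage_configs/bagel.yaml":
        "vllm_omni/deploy/bagel.yaml",
    "vllm_omni/model_executor/stage_configs/bagel_single_stage.yaml":
        "vllm_omni/deploy/bagel_single_stage.yaml",
}


def resolve_legacy_path(path: str) -> str:
    new_path = _LEGACY_TO_DEPLOY.get(path)
    if new_path is None:
        return path
    warnings.warn(
        f"{path} was removed; use {new_path} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_path
```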

@princepride (Collaborator, Author) left a comment

Code Review Summary

The migration from legacy stage_configs/ YAML to the pipeline-driven framework is clean overall. CI passes. I found 3 high-severity issues that should be addressed before merge, plus several medium/low items.

| # | Severity | Issue |
|---|----------|-------|
| 1 | High | Think mode broken in two-stage (wrong `prompt_expand_func` + `kv_transfer_criteria`) |
| 2 | High | Mooncake test silently tests SharedMemory (missing connector bindings on stages) |
| 3 | High | XPU platform override incomplete (missing FP8, GPU assignment, memory settings) |
| 4 | Medium | `hf_architectures` collision between two pipelines |
| 5 | Medium | README documents `--tensor-parallel-size`, which doesn't exist in `end2end.py` |
| 6 | Medium | `--deploy-config` vs `--stage-configs-path` inconsistency in docs |
| 7 | Low | LoRA threshold too generous (80 → 160) |
| 8 | Low | Fragile single-stage detection heuristic |

…ides

- Add BAGEL_THINK_PIPELINE with expand_cfg_prompts_think and no
  kv_transfer_criteria so Thinker decodes think tokens before KV transfer
- Create bagel_think.yaml (inherits bagel.yaml, sets pipeline: bagel_think)
- Restore --think auto-select in end2end.py
- Fix Mooncake CI overlay: add output_connectors/input_connectors on stages
- Complete XPU platform overrides (fp8, max_num_batched_tokens, stage 1 GPU)
- Set hf_architectures=() on single-stage and think pipelines
- Remove non-existent --tensor-parallel-size from offline READMEs
- Clarify --deploy-config vs deprecated --stage-configs-path in online READMEs
- Remove enforce_eager from all stage 0 in bagel deploy YAMLs
- Tighten LoRA diff_2x threshold from 160 to 120
- Add explanatory comment for single-stage detection heuristic

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Made-with: Cursor
@princepride (Collaborator, Author) commented
@hsliuustc0106 @lishunyang12 PTAL

@princepride (Collaborator, Author) commented

@hsliuustc0106 @lishunyang12 PTAL

Can we quickly approve this PR? The use of vllm-omni as rollout in Verl may also depend on this change.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@hsliuustc0106 added the ready (trigger buildkite CI), merge-test (trigger buildkite merge test CI), and omni-test (trigger buildkite omni model test in nightly CI) labels on Apr 20, 2026
Comment thread: vllm_omni/deploy/bagel.yaml (outdated)
```yaml
memory_pool_device: "cpu"

platforms:
  xpu:
```
Collaborator:

Is this verified?

Collaborator (Author):

I only verified GPU; I can't verify XPU.

```yaml
seed: 52

connectors:
  shared_memory_connector:
```
Collaborator:

Which connector will users use?

Collaborator (Author):

By default, the shared-memory connector (shm).

```yaml
# vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
#   --deploy-config vllm_omni/deploy/bagel_think.yaml

base_config: bagel.yaml
```
Collaborator:

What's this used for? Think mode only?

Collaborator (Author):

Why does think mode need a separate YAML file instead of switching automatically per request?

There are exactly two differences between BAGEL_PIPELINE and BAGEL_THINK_PIPELINE:

|  | Standard | Think |
|---|----------|-------|
| `prompt_expand_func` | `expand_cfg_prompts` — companions run normally | `expand_cfg_prompts_think` — companions get `max_tokens=1` (stop after prefill) |
| `kv_transfer_criteria` | `{"type": "prefill_finished"}` — KV is transferred to DiT right after prefill, then the Thinker is stopped | (absent) — no early transfer; the Thinker keeps decoding `<think>...</think>` tokens until EOS, then KV is transferred on completion |

Why can't this be switched per request?

The root cause is that kv_transfer_criteria is a scheduler-level configuration, not a per-request property:

```python
self.kv_transfer_criteria = self._get_kv_transfer_criteria()
```

It is read once from PipelineConfig during scheduler initialization and then applied uniformly to all requests. See lines 123 and 149:

```python
if not self.kv_transfer_criteria:
    return False
# ...
if criteria_type == "prefill_finished":
    if request.num_computed_tokens >= request.num_prompt_tokens:
        self.transfer_triggered_requests.add(request.request_id)
        self._mark_request_for_kv_transfer(...)
```

If prefill_finished is set, all image requests are truncated after prefill — think requests would never get a chance to decode <think> tokens. Conversely, if unset, all requests wait until EOS, slowing down the standard (non-think) path.

Similarly, prompt_expand_func is also pipeline-level — it is called uniformly by the orchestrator during request expansion, with no per-request dispatch.

What would be needed to support per-request switching:

  1. Move kv_transfer_criteria from a scheduler-level attribute to a per-request attribute — each request would carry its own criteria.
  2. Modify _check_kv_transfer_criteria to read from the request object rather than self.kv_transfer_criteria.
  3. Have prompt_expand_func branch based on a per-request flag (e.g., think=True in sampling_params.extra_args).

This is a reasonable architectural improvement but is not yet implemented. In the meantime, a separate deploy YAML is required. In practice, bagel_think.yaml is minimal — it only contains two effective lines (base_config: bagel.yaml and pipeline: bagel_think), which tells the system to use a different PipelineConfig that changes the above two fields.
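
As a rough illustration of steps 1 and 2, per-request criteria could look like this (a hypothetical sketch with illustrative names, not the current scheduler code):

```python
# Hypothetical sketch: each request carries its own KV-transfer criteria
# instead of the scheduler reading a single pipeline-level value.
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    num_prompt_tokens: int
    num_computed_tokens: int = 0
    finished: bool = False
    # None means "transfer on completion" (think mode).
    kv_transfer_criteria: dict | None = field(
        default_factory=lambda: {"type": "prefill_finished"}
    )


def should_transfer_kv(request: Request) -> bool:
    criteria = request.kv_transfer_criteria
    if criteria is None:
        # Think mode: wait until the Thinker finishes decoding <think> tokens.
        return request.finished
    if criteria["type"] == "prefill_finished":
        # Standard mode: transfer to DiT as soon as prefill completes.
        return request.num_computed_tokens >= request.num_prompt_tokens
    return False
```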

Collaborator (Author):

@natureofnature Maybe we need to move kv_transfer_criteria from a scheduler-level attribute to a per-request attribute?

Collaborator:

Can we use only one YAML for deploy? You can refer to #2383.

Collaborator (Author):

No. This feature currently relies on the scheduler, so we would need to refactor the scheduler strategy to let users pass --think as a request argument.

@hsliuustc0106 (Collaborator) commented

Can you add a recipe as well? Later, we will delete all model-related examples and only keep the model-agnostic t_2_i.py...

@princepride (Collaborator, Author) commented

> Can you add a recipe as well? Later, we will delete all model-related examples and only keep the model-agnostic t_2_i.py...

That may be difficult for Bagel, because Bagel doesn't have a chat template; in Bagel's example I need to create the prompts manually. Maybe in the next PR I can add a custom Jinja file and remove the end2end example.

@lishunyang12 (Collaborator) commented

Fix CI

…g merge

merge_pipeline_deploy silently dropped omni_kv_config, prompt_expand_func,
and cfg_kv_collect_func when building StageConfig from the pipeline registry,
breaking multi-stage KV transfer for Bagel img2img.

- stage_config: propagate omni_kv_config/prompt_expand_func/cfg_kv_collect_func
- bagel.yaml: declare input_connectors so transfer config discovers the edge
- stage_init_utils: resolve base_config inheritance before parsing connectors
- utils: add deploy/ as fallback in resolve_model_config_path

Made-with: Cursor

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride mentioned this pull request on Apr 21, 2026
@princepride (Collaborator, Author) commented

The CI failed, but it doesn't seem to be caused by my code. Can you help approve it? @lishunyang12

@princepride (Collaborator, Author) commented

[screenshot: local unit-test results]

I tested it on my local device and the unit tests passed.

@lishunyang12 (Collaborator) commented

PTAL @xiaohajiayou. Can you help check whether this PR has the override-precedence issue you mentioned?

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@hsliuustc0106 (Collaborator) commented

Please also fix this bug: https://buildkite.com/vllm/vllm-omni/builds/7509/steps/canvas?sid=019db039-ebbf-4c0a-aa44-9f3c64e165a1&tab=output

```text
INFO 04-21 13:29:55 [weight_utils.py:50] Using model weights format ['*']
INFO 04-21 13:29:55 [omni_base.py:146] [AsyncOmni] Initializing with model ByteDance-Seed/BAGEL-7B-MoT
INFO 04-21 13:29:55 [async_omni_engine.py:274] [AsyncOmniEngine] Initializing with model ByteDance-Seed/BAGEL-7B-MoT
ERROR 04-21 13:29:56 [stage_config.py:272] Failed to import pipeline module 'vllm_omni.model_executor.models.voxcpm2.pipeline' for 'voxcpm2': No module named 'librosa'
FAILED
Post-test GPU status:
```
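
One possible direction for the fix (a sketch only; the actual registry code in stage_config.py may differ) is to degrade gracefully when a pipeline module's optional dependency is missing:

```python
# Sketch: skip pipeline modules whose optional dependencies (e.g. librosa for
# voxcpm2) are not installed, instead of failing the whole registry scan.
import importlib
import logging

logger = logging.getLogger(__name__)


def try_import_pipeline(module_name: str, model: str):
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        logger.warning("Skipping pipeline for %r: %s", model, exc)
        return None
```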

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Comment thread: vllm_omni/deploy/bagel.yaml (outdated)
```python
resolved_config_path = config_path or resolve_model_config_path(model)
return load_omni_transfer_config(resolved_config_path)
if resolved_config_path is None:
    return None
```
Collaborator:

This base_config inheritance change applies to every model, not just bagel. Could it be split into its own infra PR (or at minimum called out in the PR description)? Reviewers scanning a "remove bagel yaml" PR won't expect a generic config-loader change.

Collaborator (Author):

CI tests use overlay YAMLs that inherit from vllm_omni/deploy/bagel.yaml. Without this merge step, the connectors: block defined in the base will not be visible when the overlay is loaded, and the transfer config parser will fail to resolve connector references.
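
A minimal sketch of that merge step, assuming illustrative helper names (the real stage_init_utils code may differ):

```python
# Sketch: resolve base_config inheritance before parsing connectors, so blocks
# like `connectors:` defined in the base YAML stay visible in the overlay.
from pathlib import Path

import yaml


def deep_merge(base: dict, overlay: dict) -> dict:
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_with_base(path: str) -> dict:
    config = yaml.safe_load(Path(path).read_text())
    base = config.pop("base_config", None)
    if base:
        config = deep_merge(load_with_base(str(Path(path).parent / base)), config)
    return config
```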

```python
if os.path.exists(complete_config_path):
    return str(complete_config_path)

deploy_config_path = PROJECT_ROOT / "vllm_omni" / "deploy" / model_type_str
```
Collaborator:

Same scope question — extending resolve_model_config_path to probe vllm_omni/deploy/<model>/ is a generic framework change that benefits every model. Belongs in its own PR or needs a callout in the PR description.

Collaborator (Author):

Without this change, auto-detect cannot find the new location and will fall all the way back to the legacy path.
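
The resulting lookup order is roughly the following (a sketch with assumed paths; the `.yaml` suffix handling in particular is an assumption):

```python
# Sketch: probe the new deploy/ location before falling back to the legacy
# stage_configs path.
import os
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[2]  # illustrative


def resolve_model_config_path(model_type_str: str) -> str | None:
    candidates = [
        PROJECT_ROOT / "vllm_omni" / "deploy" / f"{model_type_str}.yaml",
        PROJECT_ROOT / "vllm_omni" / "model_executor" / "stage_configs"
        / f"{model_type_str}.yaml",
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return str(candidate)
    return None
```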

```python
engine_args.update(ds.engine_extras)
if deploy.async_chunk:
    engine_args["async_chunk"] = True
if ps.omni_kv_config:
```
@lishunyang12 (Collaborator) commented Apr 22, 2026

This omni_kv_config / prompt_expand_func / cfg_kv_collect_func forwarding from StagePipelineConfig to engine_args is needed by Bagel.
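
Concretely, the forwarding amounts to something like this (a sketch; the field names come from this thread, the helper itself is illustrative):

```python
def forward_pipeline_fields(ps, engine_args: dict) -> None:
    # Copy the optional StagePipelineConfig fields into engine_args so the
    # Bagel stages receive them.
    for name in ("omni_kv_config", "prompt_expand_func", "cfg_kv_collect_func"):
        value = getattr(ps, name, None)
        if value:
            engine_args[name] = value
```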

```python
@pytest.mark.omni
@pytest.mark.parametrize("tp_size", [1, 2])
@hardware_test(res={"cuda": "H100", "rocm": "MI325"}, num_cards=2)
@pytest.mark.parametrize("tp_size", [1])
```
Collaborator:

Why was tp_size parametrize reduced from [1, 2] to [1] and num_cards from 2 to 1? This isn't a bagel migration change — looks like a CI-resource tweak that snuck in. Either justify or split out.

Collaborator (Author):

Because it causes merge-test/nightly-test errors; I believe the current sleep-mode implementation has a bug.

@princepride removed the merge-test and omni-test labels on Apr 23, 2026
@hsliuustc0106 added the omni-test label and removed the ready label on Apr 23, 2026
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride princepride enabled auto-merge (squash) April 23, 2026 12:45
princepride and others added 4 commits April 23, 2026 20:45
Resolve conflicts in docs/user_guide/examples/online_serving/bagel.md and
examples/online_serving/bagel/README.md by keeping the restructured
--deploy-config docs from this PR and dropping the stage-configs-path
references reintroduced by upstream vllm-project#2978.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Resolve conflict in vllm_omni/config/stage_config.py: combine upstream's
async_chunk sentinel-default fix (vllm-project#3078) with this PR's new
omni_kv_config engine_args propagation.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@hsliuustc0106 hsliuustc0106 disabled auto-merge April 24, 2026 13:20
@hsliuustc0106 hsliuustc0106 merged commit 37e5954 into vllm-project:main Apr 24, 2026
5 of 6 checks passed
Labels: omni-test (label to trigger buildkite omni model test in nightly CI)

5 participants