[Config Refactor]: Remove bagel yaml#2936
Conversation
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
hsliuustc0106
left a comment
BLOCKER scan: PASS
Breaking Change Notice:
This PR removes legacy stage configs and replaces them with deploy configs. Users who explicitly reference the old file paths in stage-configs-path or custom scripts will need to update to the new paths. Consider adding a deprecation warning or symlink if backward compatibility is needed for the release branch.
princepride
left a comment
Code Review Summary
The migration from legacy stage_configs/ YAML to the pipeline-driven framework is clean overall. CI passes. I found 3 high-severity issues that should be addressed before merge, plus several medium/low items.
| # | Severity | Issue |
|---|---|---|
| 1 | High | Think mode broken in two-stage (wrong prompt_expand_func + kv_transfer_criteria) |
| 2 | High | Mooncake test silently tests SharedMemory (missing connector bindings on stages) |
| 3 | High | XPU platform override incomplete (missing FP8, GPU assignment, memory settings) |
| 4 | Medium | hf_architectures collision between two pipelines |
| 5 | Medium | README documents --tensor-parallel-size which doesn't exist in end2end.py |
| 6 | Medium | --deploy-config vs --stage-configs-path inconsistency in docs |
| 7 | Low | LoRA threshold too generous (80 → 160) |
| 8 | Low | Fragile single-stage detection heuristic |
…ides
- Add BAGEL_THINK_PIPELINE with expand_cfg_prompts_think and no kv_transfer_criteria so Thinker decodes think tokens before KV transfer
- Create bagel_think.yaml (inherits bagel.yaml, sets pipeline: bagel_think)
- Restore --think auto-select in end2end.py
- Fix Mooncake CI overlay: add output_connectors/input_connectors on stages
- Complete XPU platform overrides (fp8, max_num_batched_tokens, stage 1 GPU)
- Set hf_architectures=() on single-stage and think pipelines
- Remove non-existent --tensor-parallel-size from offline READMEs
- Clarify --deploy-config vs deprecated --stage-configs-path in online READMEs
- Remove enforce_eager from all stage 0 in bagel deploy YAMLs
- Tighten LoRA diff_2x threshold from 160 to 120
- Add explanatory comment for single-stage detection heuristic

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Made-with: Cursor
Can we quickly approve this PR? The use of vllm-omni as a rollout in Verl may also depend on this change.
Signed-off-by: princepride <wangzhipeng628@gmail.com>
memory_pool_device: "cpu"

platforms:
  xpu:
Only verified on GPU, can't verify XPU.
seed: 52

connectors:
  shared_memory_connector:
Which connector will users get?
The default is shm (the shared-memory connector).
# vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni \
#   --deploy-config vllm_omni/deploy/bagel_think.yaml

base_config: bagel.yaml
What's this used for? Think-only?
Why does think mode need a separate YAML file instead of switching automatically per request?
There are exactly two differences between BAGEL_PIPELINE and BAGEL_THINK_PIPELINE:

| | Standard | Think |
|---|---|---|
| `prompt_expand_func` | `expand_cfg_prompts` — companions run normally | `expand_cfg_prompts_think` — companions get `max_tokens=1` (stop after prefill) |
| `kv_transfer_criteria` | `{"type": "prefill_finished"}` — KV is transferred to DiT right after prefill, then the Thinker is stopped | (absent) — no early transfer; the Thinker keeps decoding `<think>...</think>` tokens until EOS, then KV is transferred on completion |
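To make the two rows concrete, here is a minimal sketch; `MiniPipeline` is a hypothetical stand-in, since the real `PipelineConfig` dataclass has more fields and may nest these settings differently:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiniPipeline:
    prompt_expand_func: str
    kv_transfer_criteria: Optional[dict]


# Standard pipeline: companions decode normally, KV moves right after prefill.
BAGEL_PIPELINE_LIKE = MiniPipeline(
    prompt_expand_func="expand_cfg_prompts",
    kv_transfer_criteria={"type": "prefill_finished"},
)

# Think pipeline: companions stop after prefill (max_tokens=1),
# KV moves only once the Thinker finishes decoding <think> tokens.
BAGEL_THINK_PIPELINE_LIKE = MiniPipeline(
    prompt_expand_func="expand_cfg_prompts_think",
    kv_transfer_criteria=None,
)
```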
Why can't this be switched per request?
The root cause is that kv_transfer_criteria is a scheduler-level configuration, not a per-request property:
self.kv_transfer_criteria = self._get_kv_transfer_criteria()

It is read once from PipelineConfig during scheduler initialization and then applied uniformly to all requests. See lines 123 and 149:
if not self.kv_transfer_criteria:
    return False
# ...
if criteria_type == "prefill_finished":
    if request.num_computed_tokens >= request.num_prompt_tokens:
        self.transfer_triggered_requests.add(request.request_id)
        self._mark_request_for_kv_transfer(...)

If prefill_finished is set, all image requests are truncated after prefill — think requests would never get a chance to decode <think> tokens. Conversely, if unset, all requests wait until EOS, slowing down the standard (non-think) path.
Similarly, prompt_expand_func is also pipeline-level — it is called uniformly by the orchestrator during request expansion, with no per-request dispatch.
What would be needed to support per-request switching (see the sketch below):
- Move `kv_transfer_criteria` from a scheduler-level attribute to a per-request attribute — each request would carry its own criteria.
- Modify `_check_kv_transfer_criteria` to read from the `request` object rather than `self.kv_transfer_criteria`.
- Have `prompt_expand_func` branch based on a per-request flag (e.g., `think=True` in `sampling_params.extra_args`).
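As a rough illustration of the first two items, a hedged sketch of what a per-request check could look like; the `OmniRequest` fields mirror the scheduler snippet above, and none of this exists in vllm-omni today:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OmniRequest:
    # Illustrative request object; field names follow the scheduler snippet above.
    request_id: str
    num_computed_tokens: int
    num_prompt_tokens: int
    finished: bool
    # Carried per request, e.g. derived from sampling_params.extra_args["think"].
    kv_transfer_criteria: Optional[dict] = None


def should_transfer_kv(request: OmniRequest) -> bool:
    """Decide when this request's KV cache may be handed to the next stage."""
    criteria = request.kv_transfer_criteria
    if criteria is None:
        # Think request: keep decoding <think>...</think>, transfer only at EOS.
        return request.finished
    if criteria.get("type") == "prefill_finished":
        # Standard request: transfer as soon as the prompt has been prefilled.
        return request.num_computed_tokens >= request.num_prompt_tokens
    return False
```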
This is a reasonable architectural improvement but is not yet implemented. In the meantime, a separate deploy YAML is required. In practice, bagel_think.yaml is minimal — it only contains two effective lines (base_config: bagel.yaml and pipeline: bagel_think), which tells the system to use a different PipelineConfig that changes the above two fields.
@natureofnature Maybe we need to move kv_transfer_criteria from a scheduler-level attribute to a per-request attribute?
Can we use only one YAML for deploy? You can refer to #2383.
No, this feature currently relies on the scheduler, so we would need to refactor the scheduler strategy to allow users to pass --think as a request argument.
Can you add a recipe as well? Later, we will delete all model-related examples and only keep the model-agnostic t_2_i.py...
It may be difficult for Bagel, because Bagel doesn't have a chat template; in Bagel's example I need to create the prompts manually. Maybe in the next PR I can add a custom Jinja file and remove the end2end example.
Fix CI
…g merge

merge_pipeline_deploy silently dropped omni_kv_config, prompt_expand_func, and cfg_kv_collect_func when building StageConfig from the pipeline registry, breaking multi-stage KV transfer for Bagel img2img.
- stage_config: propagate omni_kv_config/prompt_expand_func/cfg_kv_collect_func
- bagel.yaml: declare input_connectors so transfer config discovers the edge
- stage_init_utils: resolve base_config inheritance before parsing connectors
- utils: add deploy/ as fallback in resolve_model_config_path

Made-with: Cursor
Signed-off-by: princepride <wangzhipeng628@gmail.com>
The CI failed, but it doesn't seem to be caused by my code. Can you help approve it? @lishunyang12
PTAL @xiaohajiayou Can you help check whether this PR has the override-precedence issue you mentioned?
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Please also fix this bug: https://buildkite.com/vllm/vllm-omni/builds/7509/steps/canvas?sid=019db039-ebbf-4c0a-aa44-9f3c64e165a1&tab=output
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
resolved_config_path = config_path or resolve_model_config_path(model)
return load_omni_transfer_config(resolved_config_path)
if resolved_config_path is None:
    return None
This base_config inheritance change applies to every model, not just bagel. Could it be split into its own infra PR (or at minimum called out in the PR description)? Reviewers scanning a "remove bagel yaml" PR won't expect a generic config-loader change.
CI tests use overlay YAMLs that inherit from vllm_omni/deploy/bagel.yaml. Without this merge step, the connectors: block defined in the base will not be visible when the overlay is loaded, and the transfer config parser will fail to resolve connector references.
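For context, a simplified sketch of what that merge step has to do; `load_deploy_config` and `_deep_merge` are illustrative names, not the actual vllm_omni helpers:

```python
from pathlib import Path

import yaml


def _deep_merge(base: dict, overlay: dict) -> dict:
    # Recursively overlay values onto the base mapping.
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_deploy_config(path: str) -> dict:
    # Resolve base_config inheritance so blocks like `connectors:` defined in
    # bagel.yaml remain visible when an overlay YAML is loaded.
    cfg = yaml.safe_load(Path(path).read_text()) or {}
    base_name = cfg.pop("base_config", None)
    if base_name:
        base_cfg = load_deploy_config(str(Path(path).parent / base_name))
        cfg = _deep_merge(base_cfg, cfg)
    return cfg
```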
if os.path.exists(complete_config_path):
    return str(complete_config_path)

deploy_config_path = PROJECT_ROOT / "vllm_omni" / "deploy" / model_type_str
Same scope question — extending resolve_model_config_path to probe vllm_omni/deploy/<model>/ is a generic framework change that benefits every model. Belongs in its own PR or needs a callout in the PR description.
Without this change, auto-detect cannot find the new location and will fall all the way back to the legacy path.
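A hedged sketch of the intended lookup order; the directory layout, file naming, and return type are assumptions for illustration, not the exact resolve_model_config_path implementation:

```python
from pathlib import Path
from typing import Optional

# Assumed repository root; the real code derives PROJECT_ROOT elsewhere.
PROJECT_ROOT = Path(__file__).resolve().parents[1]


def resolve_config_path_sketch(model_type_str: str) -> Optional[str]:
    candidates = [
        # New pipeline-driven location added by this PR.
        PROJECT_ROOT / "vllm_omni" / "deploy" / f"{model_type_str}.yaml",
        # Legacy fallback for configs not yet migrated.
        PROJECT_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / f"{model_type_str}.yaml",
    ]
    for candidate in candidates:
        if candidate.exists():
            return str(candidate)
    return None
```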
engine_args.update(ds.engine_extras)
if deploy.async_chunk:
    engine_args["async_chunk"] = True
if ps.omni_kv_config:
This omni_kv_config / prompt_expand_func / cfg_kv_collect_func forwarding from StagePipelineConfig to engine_args is needed by Bagel.
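A rough sketch of that forwarding; `StagePipelineConfigSketch` and the engine_args keys are illustrative, and the real plumbing in stage_config.py is more involved:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StagePipelineConfigSketch:
    # Only the three fields relevant to Bagel KV transfer are modeled here.
    omni_kv_config: Optional[dict] = None
    prompt_expand_func: Optional[Callable[..., Any]] = None
    cfg_kv_collect_func: Optional[Callable[..., Any]] = None


def forward_pipeline_fields(ps: StagePipelineConfigSketch, engine_args: dict) -> dict:
    # Copy the pipeline-level hooks into the per-stage engine args so they are
    # not silently dropped when the StageConfig is built from the registry.
    for name in ("omni_kv_config", "prompt_expand_func", "cfg_kv_collect_func"):
        value = getattr(ps, name)
        if value is not None:
            engine_args[name] = value
    return engine_args
```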
@pytest.mark.omni
@pytest.mark.parametrize("tp_size", [1, 2])
@hardware_test(res={"cuda": "H100", "rocm": "MI325"}, num_cards=2)
@pytest.mark.parametrize("tp_size", [1])
Why was tp_size parametrize reduced from [1, 2] to [1] and num_cards from 2 to 1? This isn't a bagel migration change — looks like a CI-resource tweak that snuck in. Either justify or split out.
Because it causes merge-test/nightly-test errors; I believe the current sleep mode implementation has a bug.
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Resolve conflicts in docs/user_guide/examples/online_serving/bagel.md and examples/online_serving/bagel/README.md by keeping the restructured --deploy-config docs from this PR and dropping the stage-configs-path references reintroduced by upstream vllm-project#2978.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Resolve conflict in vllm_omni/config/stage_config.py: combine upstream's async_chunk sentinel-default fix (vllm-project#3078) with this PR's new omni_kv_config engine_args propagation.

Signed-off-by: princepride <wangzhipeng628@gmail.com>

Purpose
Migrate Bagel model configuration from the legacy stage_configs/ YAML system to the new pipeline-driven auto-detection framework introduced in #2383 and #2915.

What this PR does

Delete all legacy Bagel YAML stage configs (8 files):
- vllm_omni/model_executor/stage_configs/bagel.yaml
- vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml
- vllm_omni/model_executor/stage_configs/bagel_single_stage.yaml
- vllm_omni/model_executor/stage_configs/bagel_think.yaml
- vllm_omni/model_executor/stage_configs/bagel_usp2.yaml
- vllm_omni/platforms/xpu/stage_configs/bagel.yaml
- tests/e2e/offline_inference/stage_configs/bagel_sharedmemory_ci.yaml
- tests/e2e/offline_inference/stage_configs/bagel_mooncake_ci.yaml

Add new pipeline-driven configs:
- vllm_omni/model_executor/models/bagel/pipeline.py — defines BAGEL_PIPELINE (two-stage: Thinker → DiT) and BAGEL_SINGLE_STAGE_PIPELINE (DiT-only) as Python PipelineConfig dataclasses
- vllm_omni/deploy/bagel.yaml — default two-stage deploy config with runtime defaults
- vllm_omni/deploy/bagel_single_stage.yaml — single-stage deploy config (pipeline: bagel_single_stage)
- vllm_omni/config/pipeline_registry.py

Support single-stage deployment: Bagel's DiT stage can handle all modalities (text→image, image editing, image understanding, text understanding) independently. A dedicated BAGEL_SINGLE_STAGE_PIPELINE and deploy YAML enable this mode.

Migrate CI tests to Python-based overlays:
- Add bagel, bagel_single_stage, and bagel_mooncake entries to _CI_OVERLAYS in tests/helpers/stage_config.py
- Update the Bagel e2e tests (test_bagel_text2img.py, test_bagel_img2img.py, test_bagel_understanding.py, test_bagel_lora.py, test_bagel_online.py, test_quantization_fp8.py) to use get_deploy_config_path() instead of the legacy stage_configs_path

Update example scripts and documentation:
- Update examples/offline_inference/bagel/end2end.py to derive is_single_stage dynamically from the loaded pipeline

Bug fixes:
- Set async_chunk: false in deploy YAMLs to fix a ValueError when no async-chunk processor is defined
- Add the missing from pathlib import Path import in test_bagel_lora.py
- Relax the LoRA threshold (diff_2x < 160) to accommodate expected larger differences at scale=2.0

Test Plan
- python3 tests/e2e/offline_inference/test_bagel_text2img.py
- python3 tests/e2e/offline_inference/test_bagel_img2img.py
- python3 tests/e2e/offline_inference/test_bagel_understanding.py
- python3 tests/e2e/offline_inference/test_bagel_lora.py
- stage_configs_path
- bagel_single_stage deploy config

Test Result

All Bagel e2e tests pass with load_format: dummy CI overlays. The pipeline auto-detection correctly resolves both two-stage and single-stage topologies without any legacy YAML files.

Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.