[Bugfix] align Bagel diffusion parallel config docs and stage YAMLs #2636
xiaohajiayou wants to merge 2 commits into vllm-project:main
Conversation
Force-pushed from 04fb23d to be1c649.
Signed-off-by: xiaohajiayou <923390377@qq.com>
Force-pushed from be1c649 to e861d03.
ianliuy left a comment:
LGTM overall. The YAML fixes are correct, and the code path confirms that the top-level `tensor_parallel_size` was silently dropped by `OmniDiffusionConfig.from_kwargs()`. One minor nit below.
@@ -35,6 +35,25 @@ For larger models or multi-GPU environments, you can enable Tensor Parallelism (
1. **Modify Stage Config**: Create or modify a stage configuration yaml (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).
Nit: This line still says "Set `tensor_parallel_size` to `2` (or more)" without distinguishing the LLM vs diffusion stage, which is the whole point of this PR. Consider updating to:

> Set the appropriate TP config field for your stage type (see details below) and update `devices` to include multiple GPU IDs.
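For illustration, the diffusion-stage version of that step could look like the following sketch (the placement of `devices` next to `engine_args` is an assumption for illustration; only the `parallel_config.tensor_parallel_size` path is from this PR):

```yaml
# Hypothetical diffusion (DiT) stage snippet -- surrounding keys are illustrative.
devices: "0,1"                  # two GPU IDs, as the doc step suggests
engine_args:
  parallel_config:
    tensor_parallel_size: 2     # diffusion stages read TP from parallel_config
```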
lishunyang12 left a comment:
Review: [Bugfix] align Bagel diffusion parallel config docs and stage YAMLs
YAML changes (stage configs) -- looks good
The YAML changes across all four files are correct and consistent:
- `bagel.yaml`, `bagel_multiconnector.yaml`, and `xpu/bagel.yaml` all move `tensor_parallel_size` from a top-level `engine_args` field into `engine_args.parallel_config.tensor_parallel_size` for the diffusion (DiT) stage.
- `bagel_usp2.yaml` correctly moves `tensor_parallel_size: 1` inside the existing `parallel_config` block alongside `ulysses_degree`.
- LLM stage configs are left unchanged, which is the correct behavior.
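For reference, a minimal before/after sketch of that move; only the `engine_args` and `parallel_config` field paths are taken from this PR, and everything else is elided:

```yaml
# Before: top-level key, silently dropped by OmniDiffusionConfig.from_kwargs()
engine_args:
  tensor_parallel_size: 2

# After: nested under parallel_config, the path the diffusion runtime actually reads
engine_args:
  parallel_config:
    tensor_parallel_size: 2
```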
Documentation -- needs a fix in online_serving/bagel.md
Issue: stale text in online_serving/bagel.md (line 36)
The PR inserts the new LLM-vs-diffusion explanation block after the existing step-1 text, but that existing text was not updated. Currently line 36 still reads:
> **Modify Stage Config**: ... Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).
This gives the old undifferentiated advice (`tensor_parallel_size` at top level) and contradicts the new distinction introduced immediately below it. The code block on lines 38-44 (the pre-existing LLM-style snippet) also lacks any label like "Example for the LLM stage" to match the structure of the newly inserted diffusion example.
Suggestion: Either (a) rewrite step 1 to be a generic intro (e.g., "Modify Stage Config: Create or modify a stage configuration yaml ... See below for TP config details for each stage type.") and remove the now-unlabeled code block, or (b) replace the existing step 1 + code block entirely with the new structured text -- the same way the offline doc was cleaned up. The offline doc (offline_inference/bagel.md) handles this cleanly; the online doc should match.
Minor: offline_inference/bagel.md
The new block starting with "In multi-stage omni models..." is inserted right after the intro paragraph with no transition. Consider adding a brief connecting sentence or a blank line + heading to improve readability. This is a nit, not a blocker.
Summary
YAML changes are correct. The offline doc update is clean. The online serving doc has a leftover stale instruction that should be updated for consistency. Requesting a small fix there before merge.
What is the relationship with #2936?
Closing this since #2936 already completed this refactor. |
Purpose
Fix #2635
Bagel docs and several Bagel stage YAMLs were still using or describing diffusion-stage tensor parallelism with a top-level `tensor_parallel_size`, which is inconsistent with the current diffusion runtime path. This PR aligns Bagel multi-stage docs and stage configs with the current diffusion parallel config path.
In multi-stage omni models:

- LLM stages configure TP with the top-level `engine_args.tensor_parallel_size`.
- Diffusion stages configure parallelism under `engine_args.parallel_config.*`.
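A hedged sketch of the two paths side by side; the stage wrapper keys below are illustrative, not copied from the actual Bagel YAMLs:

```yaml
stages:
  - name: llm                     # hypothetical stage key, for illustration only
    engine_args:
      tensor_parallel_size: 2     # LLM stage: top-level field
  - name: dit
    engine_args:
      parallel_config:
        tensor_parallel_size: 2   # diffusion stage: nested under parallel_config
        # (bagel_usp2.yaml also keeps ulysses_degree in this block)
```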
Changes

- Move the diffusion-stage TP setting into `engine_args.parallel_config.tensor_parallel_size` (instead of the top-level `engine_args.tensor_parallel_size`).
- Update the docs to describe the `parallel_config` path for diffusion stages.

Files changed
- `vllm_omni/model_executor/stage_configs/bagel.yaml`
- `vllm_omni/model_executor/stage_configs/bagel_multiconnector.yaml`
- `vllm_omni/model_executor/stage_configs/bagel_usp2.yaml`
- `vllm_omni/platforms/xpu/stage_configs/bagel.yaml`
- `docs/user_guide/examples/online_serving/bagel.md`
- `docs/user_guide/examples/offline_inference/bagel.md`