[Perf][Bagel] Avoid per-step device syncs in Bagel img2img by natureofnature · Pull Request #3987 · vllm-project/vllm-omni

natureofnature · 2026-05-29T09:31:19Z


Purpose

The Bagel img2img path forces implicit GPU→CPU synchronizations in two hot loops, leaving the GPU idle while the CPU stalls:

AR stage (model_executor/models/bagel/bagel.py): the MoT routing calls vae_mask.any() / ~vae_mask.any() (each, when used as a Python condition, is an implicit .item()-style device sync), and _adjust_positions_for_img2img indexes the CUDA positions tensor element by element in a Python loop (one sync per token).
DiT stage (diffusion/models/bagel/bagel_transformer.py): the denoise loops iterate for t in timesteps where timesteps is a CUDA tensor, so t is a 0-d tensor and both building the timestep tensor and the cfg_interval comparison sync on each step; attention also uses Python max()/sum() over length tensors (one sync per element).

Fix

Cache the VAE-mask occupancy once per request as plain bools; the per-layer routing reads the bools instead of calling .any().
Copy positions to the host once before the boundary loop.
Iterate timesteps.tolist() so t is a Python float (timesteps[i] tensor is still used for the scheduler).
Use tensor reductions (query_lens.max(), query_lens.sum()) instead of Python max()/sum() over tensors.
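
The four changes above can be sketched in PyTorch as follows. This is a minimal illustration, not the code from the PR: the tensor contents, the route() helper, the "position reset" notion of a boundary, and the cfg_interval values are all hypothetical stand-ins. It runs on CPU as well, though the synchronizations being avoided only occur when the tensors live on a CUDA device.

```python
import torch

# Falls back to CPU for illustration; the syncs discussed only occur on CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Cache mask occupancy once per request (two host reads in total)
#    instead of evaluating .any() in a Python `if` inside every layer.
vae_mask = torch.tensor([False, True, True, False], device=device)
has_vae_tokens = bool(vae_mask.any())
has_non_vae_tokens = bool((~vae_mask).any())


def route(layer_idx):
    # Hypothetical per-layer routing: reads plain bools, no device access.
    if has_vae_tokens and has_non_vae_tokens:
        return "mixed"
    return "vae_only" if has_vae_tokens else "non_vae_only"


routes = [route(i) for i in range(3)]

# 2) Copy positions to the host once, then scan in pure Python.
#    Assumption: a "boundary" here is an index where the position resets.
positions = torch.tensor([0, 1, 2, 0, 1, 2, 3], device=device)
positions_host = positions.tolist()  # single device-to-host copy
boundaries = [
    i
    for i in range(1, len(positions_host))
    if positions_host[i] < positions_host[i - 1]
]

# 3) Iterate Python floats (one copy up front); index the tensor only when a
#    device-side value is needed, e.g. for the scheduler.
timesteps = torch.linspace(1.0, 0.0, 5, device=device)
cfg_interval = (0.4, 1.0)  # hypothetical values
cfg_flags = []
for i, t in enumerate(timesteps.tolist()):
    cfg_flags.append(cfg_interval[0] < t <= cfg_interval[1])  # host compare
    t_tensor = timesteps[i]  # 0-d tensor that stays on the device

# 4) Reduce on the device instead of Python max()/sum(), which would
#    iterate the tensor element by element.
query_lens = torch.tensor([3, 5, 2], device=device)
max_query_len = query_lens.max()
total_query_len = query_lens.sum()
```

Note that .tolist() and bool(...) still synchronize; the point is that each does so once per request rather than once per layer, token, or step.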

Test Plan

Correctness: ran i2i before/after on the same input + seed at 256² and 1024²; generated PNGs are md5-identical between baseline and patched (changes are numerically identical, not approximations).
Performance: split serving (stage-0 AR / stage-1 DiT on separate GPUs, RDMA connector), 1 warmup + 5 measured i2i requests per config; compared metrics.stage_durations (stage_0_gen_ms / stage_1_gen_ms) baseline vs patched.
Smoke: service boots and serves t2i + i2i (HTTP 200) with the patched code.
Prompt: Change the color to blue
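
The md5 comparison in the correctness check could be reproduced with a short script along these lines. It is only a sketch: the file names are placeholders written to a temporary directory, and md5_of_file / outputs_identical are hypothetical helpers, not part of the repository.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(baseline_path, patched_path):
    """True when both files have the same md5 digest."""
    return md5_of_file(baseline_path) == md5_of_file(patched_path)


# Self-contained demo with placeholder files standing in for generated PNGs.
with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "baseline.png"
    patched = Path(tmp) / "patched.png"
    different = Path(tmp) / "different.png"
    baseline.write_bytes(b"same image bytes")
    patched.write_bytes(b"same image bytes")
    different.write_bytes(b"other image bytes")

    same_result = outputs_identical(baseline, patched)
    diff_result = outputs_identical(baseline, different)
```

A matching digest is a stronger statement than a tolerance-based image comparison, which fits the claim that the change is not an approximation.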

Test Result

Split serving (RDMA connector, 15 steps, i2i); stage_0 = AR, stage_1 = DiT. All changes are numerically identical: outputs are byte-for-byte unchanged (md5 of generated images matches before/after at both 256² and 1024²).
Mean of 5 measured runs (H800, AR/DiT disaggregation):

resolution	stage	before	after	delta
256×256	AR	1064 ms	890 ms	−16.4%
256×256	DiT	755 ms	724 ms	−4.1%
1024×1024	AR	2265 ms	1969 ms	−13.1%
1024×1024	DiT	7714 ms	7362 ms	−4.6%
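
For reference, the delta column follows from the before/after means as (after − before) / before. A small sketch that recomputes it from the table values; the dictionary layout and the percent_delta helper are illustrative only:

```python
# Mean stage durations in ms, copied from the table above: (before, after).
measurements_ms = {
    ("256x256", "AR"): (1064, 890),
    ("256x256", "DiT"): (755, 724),
    ("1024x1024", "AR"): (2265, 1969),
    ("1024x1024", "DiT"): (7714, 7362),
}


def percent_delta(before, after):
    """Relative change in percent; negative means the patched run is faster."""
    return (after - before) / before * 100.0


deltas = {
    key: round(percent_delta(before, after), 1)
    for key, (before, after) in measurements_ms.items()
}

for (resolution, stage), delta in deltas.items():
    print(f"{resolution} {stage}: {delta:+.1f}%")
```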



natureofnature · 2026-05-29T09:34:58Z

@princepride PTAL


Gaohan123

LGTM. Thanks


…ect#3987) Signed-off-by: natureofnature <wzliu@connect.hku.hk> Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

natureofnature requested review from Isotr0py, RuixiangMa, SamitHuang, ZJY0516, ZeldaHuang, david6666666, linyueqian, princepride, wtomin and yuanheng-zhao as code owners May 29, 2026 09:31

Avoid per-step device syncs in Bagel img2img

8ddca2e

Cache the VAE-mask flags and iterate host-side values so the AR MoT routing and the DiT denoise loop stop forcing GPU->CPU syncs on every layer/step. Signed-off-by: natureofnature <wzliu@connect.hku.hk>

natureofnature force-pushed the bagel-i2i-device-sync branch from 1e68110 to 8ddca2e Compare May 29, 2026 09:52

hsliuustc0106 added the ready (label to trigger buildkite CI) and merge-test (label to trigger buildkite merge test CI) labels Jun 1, 2026

Merge branch 'main' into bagel-i2i-device-sync

7289862

Gaohan123 added this to the v0.22.0 milestone Jun 1, 2026

Merge branch 'main' into bagel-i2i-device-sync

7c23099

Gaohan123 approved these changes Jun 2, 2026


Gaohan123 merged commit 1fb423e into vllm-project:main Jun 2, 2026
7 of 8 checks passed

86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request Jun 4, 2026

[Perf][Bagel] Avoid per-step device syncs in Bagel img2img (vllm-proj…

dcc199f

…ect#3987) Signed-off-by: natureofnature <wzliu@connect.hku.hk>

Gaohan123 merged 3 commits into vllm-project:main from natureofnature:bagel-i2i-device-sync
