Support VAE parallel for Bagel by lsyyysky · Pull Request #3982 · vllm-project/vllm-omni

lsyyysky · 2026-05-29T09:10:47Z

Purpose

Add VAE Patch Parallelism support for the Bagel (BAGEL-7B-MoT) diffusion model.

This PR lets Bagel split the latent into spatial tiles and distribute them across the DiT process group, so each rank only materializes the activations for its own tiles instead of the whole image — lowering per-GPU peak memory at high resolution.

Key points:

Introduce DistributedAutoEncoder(AutoEncoder, DistributedVaeMixin) in vllm_omni/diffusion/models/bagel/autoencoder.py, implementing split / exec / merge for both decode and encode (with overlap blending to avoid seams).
BagelPipeline now instantiates DistributedAutoEncoder so the DiT stage can run the VAE in distributed mode.
Update the READMEs (scope, requirements, deploy YAML / CLI examples, verification via startup logs).
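
The split / exec / merge flow with overlap blending can be illustrated with a minimal 1-D sketch. This is not the `DistributedAutoEncoder` implementation: the helper names (`split_tiles`, `assign_round_robin`, `merge_tiles`), the round-robin rank assignment, and the linear blend ramp are assumptions chosen for illustration, and the "decode" step is an identity stand-in.

```python
from typing import Dict, List, Tuple


def split_tiles(length: int, tile: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans covering [0, length); neighbours overlap by tile - stride."""
    if not 0 < stride <= tile:
        raise ValueError("need 0 < stride <= tile")
    spans = []
    start = 0
    while True:
        end = min(start + tile, length)
        spans.append((start, end))
        if end == length:
            break
        start += stride
    return spans


def assign_round_robin(num_tiles: int, world_size: int) -> List[List[int]]:
    """Hypothetical scheduling: rank r processes tiles r, r + world_size, ..."""
    return [list(range(rank, num_tiles, world_size)) for rank in range(world_size)]


def merge_tiles(tiles: List[List[float]], spans: List[Tuple[int, int]],
                length: int, overlap: int) -> List[float]:
    """Weighted average of tiles; weights ramp linearly inside overlapped edges."""
    out = [0.0] * length
    weight = [0.0] * length
    for values, (start, end) in zip(tiles, spans):
        n = end - start
        for i in range(n):
            w = 1.0
            if start > 0 and i < overlap:            # left edge overlaps the previous tile
                w = min(w, (i + 1) / (overlap + 1))
            if end < length and i >= n - overlap:    # right edge overlaps the next tile
                w = min(w, (n - i) / (overlap + 1))
            out[start + i] += w * values[i]
            weight[start + i] += w
    return [o / w for o, w in zip(out, weight)]


latent = [float(i) for i in range(10)]
spans = split_tiles(len(latent), tile=4, stride=3)
plan = assign_round_robin(len(spans), world_size=2)

# Each "rank" only touches its own tiles; a real run would decode them on separate GPUs.
decoded: Dict[int, List[float]] = {}
for rank, tile_ids in enumerate(plan):
    for t in tile_ids:
        s, e = spans[t]
        decoded[t] = latent[s:e]  # identity stand-in for the VAE decode

merged = merge_tiles([decoded[t] for t in range(len(spans))], spans, len(latent), overlap=1)
```

Normalizing by the accumulated weight keeps the merged result equal to the input whenever neighbouring tiles agree, and turns a disagreement in the overlap into a gradual transition rather than a hard seam.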

Scope:

| Topology | VAE patch parallel |
| --- | --- |
| Single-stage (DiT only) | Supported on stage 0 (`BagelPipeline` + `DistributedAutoEncoder`) |
| Two-stage | Supported on stage 1 (DiT) only; stage 0 (Thinker) uses the encoder-only VAE and is unrelated |

Test Plan

Hardware: 2× GPU, BAGEL-7B-MoT at /data/Bagel/BAGEL-7B-MoT.

End-to-end (single-stage DiT, text2img, 1024×1024, 20 steps, seed=42): compare tensor_parallel_size=2 alone against tensor_parallel_size=2 + vae_patch_parallel_size=2 (vae_use_tiling=true). Metrics: per-request peak GPU memory (reported as "Peak GPU memory (this request)") and generation latency (stage_0_gen_ms).

Single-stage deploy used for both runs (only vae_patch_parallel_size / vae_use_tiling differ):
```
pipeline: bagel_single_stage
stages:
  - stage_id: 0
    enforce_eager: true
    trust_remote_code: true
    devices: "0,1"
    vae_use_tiling: true            # VAE PP run only
    parallel_config:
      tensor_parallel_size: 2
      vae_patch_parallel_size: 2    # 1 for the TP-only baseline
```
Correctness: confirm that the message "Bagel VAE decode running with distributed executor" appears in the logs when VAE patch parallelism is enabled, and that the generated images are valid.

Test Result

End-to-end (1024×1024, 20 steps), inference phase, rank 0 (A100)

| Config | TP | VAE PP | Latency (s) | Peak reserved | Peak allocated | vs TP-only |
| --- | --- | --- | --- | --- | --- | --- |
| TP=2 only | 2 | 1 (off) | 16.63 | 20.0 GB | 16.88 GB | baseline |
| TP=2 + VAE PP=2 | 2 | 2 (on) | 16.63 | 16.9 GB | 16.15 GB | -15% |

The benefit grows with resolution: it is negligible at 512×512 (Transformer-bound) and about 3 GB of peak reserved memory at 1024×1024 end-to-end.
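
The figures in the result table can be cross-checked with a short calculation. The sketch below only restates the reported numbers; the helper name `relative_change` is introduced here for illustration. It suggests that the -15% in the last column corresponds to peak reserved memory, while the change in peak allocated memory is smaller.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline` (negative means a reduction)."""
    return (value - baseline) / baseline * 100.0


# Values copied from the result table (GB, rank 0).
reserved_tp_only, reserved_vae_pp = 20.0, 16.9
allocated_tp_only, allocated_vae_pp = 16.88, 16.15

reserved_saved_gb = reserved_tp_only - reserved_vae_pp
reserved_pct = relative_change(reserved_tp_only, reserved_vae_pp)
allocated_pct = relative_change(allocated_tp_only, allocated_vae_pp)

print(f"peak reserved:  {reserved_saved_gb:.1f} GB saved ({reserved_pct:.1f}%)")
print(f"peak allocated: {allocated_pct:.1f}%")
```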


RuixiangMa · 2026-05-29T16:05:08Z

            id="parallel_hsdp_2",
            marks=HSDP_2_FEATURE_MARKS,
        ),
+        # Tensor Parallelism (TP) + VAE Patch Parallelism (size=2)


The new TP + VAE-PP setup is not stage-local in deploy-config mode, so `--tensor-parallel-size 2` also leaks into stage 0 while that stage is still pinned to `devices: "0"`.
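
To illustrate the concern, here is a hypothetical sketch of how a global parallelism override can land on a stage that is pinned to a single device. The stage layout, the helper names (`apply_global_override`, `apply_stage_override`, `oversubscribed_stages`), and the validation rule are assumptions for illustration only; they do not reproduce vllm-omni's actual config handling.

```python
from copy import deepcopy
from typing import Any, Dict, List

Stage = Dict[str, Any]

# Hypothetical two-stage layout: stage 0 is pinned to one device, stage 1 has two.
stages: List[Stage] = [
    {"stage_id": 0, "devices": "0", "parallel_config": {}},
    {"stage_id": 1, "devices": "0,1", "parallel_config": {"vae_patch_parallel_size": 2}},
]


def apply_global_override(stages: List[Stage], key: str, value: int) -> List[Stage]:
    """Apply one setting to every stage (the non-stage-local behaviour)."""
    updated = deepcopy(stages)
    for stage in updated:
        stage["parallel_config"][key] = value
    return updated


def apply_stage_override(stages: List[Stage], stage_id: int, key: str, value: int) -> List[Stage]:
    """Apply one setting to a single stage only."""
    updated = deepcopy(stages)
    for stage in updated:
        if stage["stage_id"] == stage_id:
            stage["parallel_config"][key] = value
    return updated


def oversubscribed_stages(stages: List[Stage]) -> List[int]:
    """Return ids of stages whose tensor_parallel_size exceeds their device count."""
    bad = []
    for stage in stages:
        num_devices = len(stage["devices"].split(","))
        tp = stage["parallel_config"].get("tensor_parallel_size", 1)
        if tp > num_devices:
            bad.append(stage["stage_id"])
    return bad


leaky = apply_global_override(stages, "tensor_parallel_size", 2)
local = apply_stage_override(stages, 1, "tensor_parallel_size", 2)

print("global override, oversubscribed stages:", oversubscribed_stages(leaky))
print("stage-local override, oversubscribed stages:", oversubscribed_stages(local))
```

In this sketch, scoping the override to the DiT stage leaves the single-device stage untouched.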

RuixiangMa · 2026-05-29T16:09:53Z

+                "tile_latent_stride_height": tile_latent_stride_height,
+                "tile_latent_stride_width": tile_latent_stride_width,
+            },
+            output_dtype=x.dtype,


The distributed encode() path uses x.dtype for its gather/broadcast buffers, so under autocast Bagel img2img can encode latents and still repack them into float32 buffers unnecessarily.

lsyyysky requested review from Gaohan123, Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, hsliuustc0106, princepride, wtomin, yenuo26 and ywang96 as code owners May 29, 2026 09:10

lsyyysky mentioned this pull request May 29, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

lsyyysky added 2 commits May 29, 2026 09:23

[Feat] support VAE parallel for Bagel

c1779e1

Signed-off-by: siyuan.lei <siyuanlei37@gmail.com>

fix tests for Bagel VAE Patch Parallelism

3e09f80

Signed-off-by: siyuan.lei <siyuanlei37@gmail.com>

lsyyysky force-pushed the main branch from bf56139 to 3e09f80 Compare May 29, 2026 09:47

RuixiangMa reviewed May 29, 2026


lsyyysky added 2 commits June 2, 2026 06:59

fix issue

0c5a44a

Signed-off-by: siyuan.lei <siyuanlei37@gmail.com>

Merge branch 'vllm-project:main' into main

793efd0

princepride approved these changes Jun 2, 2026


princepride enabled auto-merge (squash) June 2, 2026 10:24

princepride added the ready label to trigger buildkite CI label Jun 2, 2026

princepride merged commit bd37f3c into vllm-project:main Jun 2, 2026
8 checks passed

86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request Jun 4, 2026

Support VAE parallel for Bagel (vllm-project#3982)

2e35cdb

Signed-off-by: siyuan.lei <siyuanlei37@gmail.com>

akshatvishu pushed a commit to akshatvishu/vllm-omni that referenced this pull request Jun 13, 2026

Support VAE parallel for Bagel (vllm-project#3982)

7cc9162

Signed-off-by: siyuan.lei <siyuanlei37@gmail.com> Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

princepride merged 4 commits into vllm-project:main from lsyyysky:main
