[Feature] Support vae tiling parallel encode#2368

Merged
gcanlin merged 15 commits into
vllm-project:mainfrom
gcanlin:vae-encode-parallel
Apr 6, 2026
Conversation

@gcanlin
Collaborator

@gcanlin gcanlin commented Mar 31, 2026

Purpose

Add support for parallel VAE tiled encode, complementing the existing VAE decode parallel path.

Test Plan

Run Wan2.2 I2V end-to-end on 8 NPUs with Ulysses enabled:

python image_to_video.py \
  --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --image cherry_blossom.jpg \
  --prompt "Cherry blossoms swaying gently in the breeze, petals falling, smooth motion" \
  --negative-prompt "<optional quality filter>" \
  --height 512 \
  --width 768 \
  --num-frames 49 \
  --guidance-scale 4.0 \
  --num-inference-steps 20 \
  --flow-shift 12.0 \
  --fps 16 \
  --output i2v_output.mp4 \
  --enable-layerwise-offload \
  --ulysses-degree 8 \
  --vae-patch-parallel-size 8 \
  --vae-use-tiling

Test Results

| Config | Time |
| --- | --- |
| without vae parallel | 52s |
| only vae decode parallel | 47s |
| vae decode parallel + vae encode parallel | 43s |

Main:

wan22_main.mp4

This PR:

wan22_vae_encode_parallel.mp4

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin requested a review from hsliuustc0106 as a code owner March 31, 2026 09:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5694af1067


Comment on lines +116 to +117
for i in range(0, height, self.tile_sample_stride_height):
for j in range(0, width, self.tile_sample_stride_width):


P2: Scale tiled-encode split coordinates after patchify

When config.patch_size is set, _encode_distributed patchifies x before dispatching tile work, but encode_tile_split still iterates with self.tile_sample_stride_* in pre-patch coordinates. On patchified inputs this makes tiles much larger than intended (and far fewer), so patch-parallel encode loses most of its parallelism and uses incorrect overlap geometry for seam blending on patchified Wan checkpoints. The decode path already applies patch-size-aware scaling, so encode should mirror that coordinate system.
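The fix the review asks for can be sketched as follows: when the input has been patchified, the spatial dimensions shrink by `patch_size`, so the tile strides used by the encode split must shrink by the same factor, exactly as the decode path already does. This is a minimal illustrative sketch, not the actual vllm-omni API; `encode_tile_grid` and its parameters are hypothetical names.

```python
def encode_tile_grid(height, width, stride_h, stride_w, patch_size=None):
    """Yield tile origins in the coordinate system of the (possibly patchified) input.

    After patchify, spatial dims shrink by `patch_size`, so strides must be
    scaled down too; otherwise each tile covers patch_size x more area than
    intended and the split degenerates to far fewer tiles.
    """
    if patch_size is not None:
        stride_h //= patch_size
        stride_w //= patch_size
    return [(i, j)
            for i in range(0, height, stride_h)
            for j in range(0, width, stride_w)]

# Unpatchified 512x768 with stride 192 -> 3x4 = 12 tiles.
# Patchified by 2 the input is 256x384; scaling the stride to 96 keeps 12 tiles.
```

Without the scaling, the 256x384 patchified input iterated with the original stride of 192 would yield only 2x2 tiles, losing most of the parallelism.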


Collaborator Author


Done.

@hsliuustc0106
Collaborator

any accuracy test?

gcanlin added 3 commits April 1, 2026 09:38
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Collaborator Author

gcanlin commented Apr 1, 2026

@Bounty-hunter @wtomin Please help review. Thanks!

@wtomin
Collaborator

wtomin commented Apr 1, 2026

  1. Please update .\docs\design\feature\vae_parallel.md and .\docs\user_guide\diffusion\parallelism\vae_patch_parallel.md (if applicable);
  2. Can you verify that distributed encode also works for image-to-image models? Please give some examples.

gcanlin added 2 commits April 1, 2026 15:11
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Collaborator Author

gcanlin commented Apr 1, 2026

  1. Please update .\docs\design\feature\vae_parallel.md and .\docs\user_guide\diffusion\parallelism\vae_patch_parallel.md (if applicable);
  2. Can you verify that distributed encode also works for image-to-image models? Please give some examples.

For 1, I have added the doc for vae encode parallel.

For 2, I think it's better to implement it in a follow-up PR. I will request help from the community.

@gcanlin
Collaborator Author

gcanlin commented Apr 1, 2026

any accuracy test?

I will enable the nightly-test label to cover the GPU accuracy test. For NPU, I will run the same test locally and paste the result.

@gcanlin gcanlin added the nightly-test label to trigger buildkite nightly test CI label Apr 1, 2026
@gcanlin
Collaborator Author

gcanlin commented Apr 2, 2026

>       assert ssim_score >= SSIM_THRESHOLD, (
            f"SSIM below threshold: got {ssim_score:.6f}, expected >= {SSIM_THRESHOLD:.6f}. "
            f"online={online_path} offline={offline_path}"
        )
E       AssertionError: SSIM below threshold: got 0.922084, expected >= 0.940000. online=/workspace/build/buildkite/tests/e2e/accuracy/wan22_i2v/result/rabbit-cf925a4c/online.mp4 offline=/workspace/build/buildkite/tests/e2e/accuracy/wan22_i2v/result/rabbit-cf925a4c/offline.mp4
E       assert 0.922084 >= 0.94
tests/e2e/accuracy/wan22_i2v/test_wan22_i2v_video_similarity.py:668: AssertionError

This PR currently causes an accuracy regression. I will try to fix it.
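For context, the nightly gate compares the online and offline videos with an SSIM threshold. A minimal single-window SSIM sketch (no sliding Gaussian window, unlike scikit-image's implementation) shows what the `ssim_score` in the failing assertion measures; `global_ssim` is a hypothetical helper, not the test suite's actual function.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed over the whole frame as one window.

    ssim = ((2*mu_x*mu_y + c1) * (2*cov + c2))
         / ((mu_x^2 + mu_y^2 + c1) * (var_x + var_y + c2))
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(0)
frame = rng.random((64, 64))
print(global_ssim(frame, frame))  # identical frames -> 1.0
```

A per-video score like the 0.922 above would average such frame scores over the decoded frames of the two videos.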

@wtomin
Collaborator

wtomin commented Apr 2, 2026

This PR currently causes an accuracy regression. I will try to fix it.

I think it is expected. Previously the VAE did not use tiled_encode; now you are using tiled_encode in parallel mode, is that right?
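The numerical difference wtomin points to comes from tiling itself: tiled encode blends overlapping borders of adjacent tiles, so its output cannot bit-match a whole-frame encode. A minimal sketch of the linear seam cross-fade (hypothetical `blend_h`, mirroring the diffusers-style horizontal blend, not the repo's actual code):

```python
import numpy as np

def blend_h(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """Cross-fade `right`'s first `overlap` columns with `left`'s last ones."""
    w = np.arange(overlap) / overlap  # 0 -> 1 ramp across the seam
    blended = right.copy()
    blended[..., :overlap] = left[..., -overlap:] * (1 - w) + right[..., :overlap] * w
    return blended

left = np.ones((4, 8))
right = np.zeros((4, 8))
out = blend_h(left, right, overlap=4)
# seam columns ramp from left toward right: 1.0, 0.75, 0.5, 0.25
```

This also shows why the overlap geometry in the encode split matters: if tile coordinates are wrong, the ramp is applied over the wrong columns and seams become visible.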

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Apr 2, 2026
Collaborator

@lishunyang12 lishunyang12 left a comment


left a few comments, mostly minor

@david6666666
Collaborator

>       assert ssim_score >= SSIM_THRESHOLD, (
            f"SSIM below threshold: got {ssim_score:.6f}, expected >= {SSIM_THRESHOLD:.6f}. "
            f"online={online_path} offline={offline_path}"
        )
E       AssertionError: SSIM below threshold: got 0.922084, expected >= 0.940000. online=/workspace/build/buildkite/tests/e2e/accuracy/wan22_i2v/result/rabbit-cf925a4c/online.mp4 offline=/workspace/build/buildkite/tests/e2e/accuracy/wan22_i2v/result/rabbit-cf925a4c/offline.mp4
E       assert 0.922084 >= 0.94
tests/e2e/accuracy/wan22_i2v/test_wan22_i2v_video_similarity.py:668: AssertionError

This PR currently causes an accuracy regression. I will try to fix it.

Has this accuracy issue been resolved?

@gcanlin
Collaborator Author

gcanlin commented Apr 3, 2026

Has this accuracy issue been resolved?

I found that the current nightly accuracy test only compares offline and online inference, not main vs. this PR, so this failure is weird. It looks more like I didn't align the offline and online parallel configs in the test.

@david6666666
Collaborator

Maybe we can add a `buildkite-agent artifact upload` step so the generated videos can be inspected and judged by a human.

Collaborator

@wtomin wtomin left a comment


LGTM.

gcanlin added 5 commits April 6, 2026 13:23
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin enabled auto-merge (squash) April 6, 2026 13:39
@gcanlin
Collaborator Author

gcanlin commented Apr 6, 2026

will this help other models?

I think yes, but other models will also need some minor adaptation.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin merged commit e771842 into vllm-project:main Apr 6, 2026
7 of 8 checks passed
david6666666 pushed a commit to david6666666/vllm-omni that referenced this pull request Apr 7, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
(cherry picked from commit e771842)
Signed-off-by: David Chen <530634352@qq.com>
david6666666 added a commit that referenced this pull request Apr 7, 2026

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
- Add DistributedAutoencoderKLHunyuanVideo with overlapping tile split/exec/merge
  for both encode and decode paths
- encode: encode_tile_split/exec/merge + _encode override with broadcast_result=True
- decode: tile_split/exec/merge + decode override with broadcast_result=False
- Fix distributed_vae_executor: use self.rank < pp_size instead of pp_size <= self.world_size
- Wire DistributedAutoencoderKLHunyuanVideo into T2V and I2V pipelines

Encode parallel follows the pattern from vllm-project#2368 (Wan VAE encode parallel).

Signed-off-by: daixinning <daixinning@163.com>
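The tile split/exec/merge pattern referenced in these follow-on commits distributes tiles across the VAE parallel group; ranks outside the group idle, which is what the `self.rank < pp_size` guard enforces. A hypothetical sketch of round-robin tile dispatch (names are illustrative, not the vllm-omni executor's API):

```python
def tiles_for_rank(num_tiles: int, rank: int, pp_size: int) -> list[int]:
    """Round-robin tile assignment for one rank.

    Ranks at or above pp_size are outside the VAE parallel group and do no
    VAE work (the `self.rank < pp_size` guard from the executor fix).
    """
    if rank >= pp_size:
        return []
    return [t for t in range(num_tiles) if t % pp_size == rank]

# 12 tiles over pp_size=8: ranks 0-3 get two tiles each, ranks 4-7 get one.
```

After each rank encodes its tiles, a merge step gathers the results and applies the seam blending; with `broadcast_result=True` the merged latent is then broadcast to every rank.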
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
- Add DistributedAutoencoderKLHunyuanVideo with overlapping tile split/exec/merge
  for distributed decode
- decode: tile_split/exec/merge + decode override with broadcast_result=False
- Fix distributed_vae_executor: use self.rank < pp_size instead of pp_size <= self.world_size
- Wire DistributedAutoencoderKLHunyuanVideo into T2V and I2V pipelines

Decode parallel follows the pattern from vllm-project#2368 (Wan VAE encode parallel).

Signed-off-by: daixinning <daixinning@163.com>
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
- Add DistributedAutoencoderKLHunyuanVideo with overlapping tile split/exec/merge
  for distributed decode
- decode: tile_split/exec/merge always used (small latents included, consistent
  with base class behavior; base class patch_split is incompatible with 5D latents)
- Fix distributed_vae_executor: use self.rank < pp_size instead of pp_size <= self.world_size
- Wire DistributedAutoencoderKLHunyuanVideo into T2V and I2V pipelines
- Add unit tests for tile_split, tile_merge, and blend logic

Decode parallel follows the pattern from vllm-project#2368 (Wan VAE encode parallel).

Signed-off-by: daixinning <daixinning@163.com>
bob-021206 pushed a commit to jasonlee-1024/vllm-omni that referenced this pull request Apr 21, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: bob-021206 <binyan_github@163.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>