[diffusion] fix Z-Image SP sharding for portrait and padded resolutions #21042

Merged
BBuf merged 6 commits into sgl-project:main from Ratish1:fix/zimage-sp-resolution-sharding
Mar 24, 2026

Conversation

@Ratish1
Collaborator

@Ratish1 Ratish1 commented Mar 20, 2026

Motivation

Fix Z-Image-Turbo sequence-parallel sharding for portrait and padded resolutions when Ulysses/SP is enabled. Fixes #21021

Z-Image currently works for some resolutions such as 1024x1024 and 1280x720, but produces corrupted results for others such as 720x1280 and 720x720 when sharding is enabled. This PR fixes the Z-Image-specific SP path so it preserves native image geometry during denoising.

Z-Image patchifies latents in native F/H/W order, but the current Z-Image SP path mutates the image geometry before denoising:

  • it swaps H/W so sharding always happens along the larger spatial axis
  • it pads fake spatial rows so each rank gets the same local shard height
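
To see why an H/W swap is not harmless for a model that patchifies in native order, consider a toy patch grid. The sketch below is illustrative only: `patch_ids`, `transpose`, and `flatten` are hypothetical helpers, not SGLang functions, and real latents are tensors rather than nested lists.

```python
# Toy illustration (not SGLang code): token order for an H x W patch grid.

def patch_ids(height, width):
    """Label each patch with its row-major index in the native H x W grid."""
    return [[r * width + c for c in range(width)] for r in range(height)]

def transpose(grid):
    """Swap the H and W axes of a nested-list grid."""
    return [list(col) for col in zip(*grid)]

def flatten(grid):
    """Flatten a grid row by row, the order a row-major patchify would emit."""
    return [x for row in grid for x in row]

grid = patch_ids(2, 3)                    # 2 rows, 3 columns
native_order = flatten(grid)              # [0, 1, 2, 3, 4, 5]
swapped_order = flatten(transpose(grid))  # [0, 3, 1, 4, 2, 5]
```

After the swap, the same patches arrive in a different sequence order, so any position-dependent computation sees a different layout unless everything downstream is swapped consistently.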

Modifications

  • Remove the Z-Image H/W swap path and keep sharding in native geometry.
  • Build a per-request SP plan that chooses either native height or native width splitting based on lower token-padding cost.
  • Shard rectangular latent slices directly along the chosen native axis.
  • Gather uneven local shards explicitly with all_gather, then crop each rank’s contribution back to its true size before concatenation.
  • Pass the Z-Image batch through the SP gather path so final gather can use the per-request shard plan.
  • Pad only the local patch-token sequence in the Z-Image DiT path to the aligned target length required by Ulysses.
  • Keep the shared gather_latents_for_sp(..., batch=None) signature compatible for other pipelines.
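
The plan selection and the crop-after-gather step listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the PR's implementation: `ShardPlan`, `split_sizes`, `plan_for_axis`, `choose_plan`, and `gather_and_crop` are hypothetical names, the rule that the target length is rounded up to a multiple of the world size is an assumption, and the 45 x 80 patch grid is an illustrative input. Lists stand in for tensors, and the list of padded shards stands in for the result of a real all_gather.

```python
# Sketch under stated assumptions; all names are hypothetical, not SGLang's.
from dataclasses import dataclass

@dataclass(frozen=True)
class ShardPlan:
    axis: str             # "h" or "w": the native axis that is split
    sizes: tuple          # per-rank extent along that axis, in patches
    target_tokens: int    # aligned local sequence length every rank pads to
    padding_tokens: int   # padding tokens summed over all ranks

def split_sizes(n, world_size):
    """Split n into world_size near-equal parts; earlier ranks take the extra."""
    base, rem = divmod(n, world_size)
    return tuple(base + (1 if r < rem else 0) for r in range(world_size))

def plan_for_axis(axis, grid_h, grid_w, world_size):
    """Cost of splitting the patch grid along one native axis."""
    along, other = (grid_h, grid_w) if axis == "h" else (grid_w, grid_h)
    sizes = split_sizes(along, world_size)
    local_tokens = [s * other for s in sizes]
    # Assumption: target = largest local length, rounded up to a multiple
    # of world_size (ceiling division via negated floor division).
    target = -(-max(local_tokens) // world_size) * world_size
    return ShardPlan(axis, sizes, target, sum(target - t for t in local_tokens))

def choose_plan(grid_h, grid_w, world_size):
    """Pick the axis with the lower token-padding cost; ties keep "h"."""
    plans = [plan_for_axis(a, grid_h, grid_w, world_size) for a in ("h", "w")]
    return min(plans, key=lambda p: p.padding_tokens)

def gather_and_crop(padded_shards, sizes):
    """Crop each rank's gathered shard to its true size, then concatenate."""
    out = []
    for shard, size in zip(padded_shards, sizes):
        out.extend(shard[:size])
    return out

plan = choose_plan(45, 80, world_size=2)   # hypothetical 45 x 80 patch grid

# Round trip along a 45-row axis: shard unevenly, pad, "gather", crop.
rows = list(range(45))
sizes = split_sizes(len(rows), 2)          # (23, 22)
shards, start = [], 0
for s in sizes:
    shards.append(rows[start:start + s])
    start += s
padded = [sh + [None] * (max(sizes) - len(sh)) for sh in shards]
restored = gather_and_crop(padded, sizes)
```

For this grid the width split divides evenly (40 + 40 columns) and needs no padding, while the height split (23 + 22 rows) would pad 80 tokens, so the width plan wins. The round trip shows that cropping each shard to its recorded size before concatenation recovers the original rows without any padding entries.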

Accuracy Tests

I tested with the following two commands:
sglang generate --model-path Tongyi-MAI/Z-Image-Turbo --seed 42 --prompt "A crowded beach" --height 720 --width 720 --num-inference-steps 9 --ulysses-degree 2 --num-gpus 2 --guidance-scale 4.0

sglang generate --model-path Tongyi-MAI/Z-Image-Turbo --seed 42 --prompt "A crowded beach" --height 720 --width 1280 --num-inference-steps 9 --ulysses-degree 2 --num-gpus 2 --guidance-scale 4.0

Before PR:
[image]
[image]

After PR:
[image]
[image]

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Mar 20, 2026
@BBuf
Collaborator

BBuf commented Mar 22, 2026

This PR fixes the portrait / padded-resolution / W-shard path, but I don't see a regression test that actually exercises that path. The current 2-GPU Z-Image test still uses the default square case, so it likely never hits the new branch. Could we add one portrait or padded-resolution SP test here?

@mickqian
Collaborator

please wait for #20679

@Ratish1
Collaborator Author

Ratish1 commented Mar 23, 2026

/tag-and-rerun-ci

@Ratish1
Collaborator Author

Ratish1 commented Mar 23, 2026

/rerun-failed-ci

@BBuf BBuf merged commit 2b1d3c9 into sgl-project:main Mar 24, 2026
151 of 161 checks passed
@Ratish1 Ratish1 deleted the fix/zimage-sp-resolution-sharding branch March 24, 2026 05:44
adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Mar 24, 2026
…ns (sgl-project#21042)

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
…ns (sgl-project#21042)

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
…ns (sgl-project#21042)

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…ns (sgl-project#21042)

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Labels

diffusion SGLang Diffusion run-ci

Development

Successfully merging this pull request may close these issues.

[Bug] [Diffusion] Z-Image-Turbo only works with some resolutions when sharding

3 participants