[diffusion] fix: fix accuracy for some image models by mickqian · Pull Request #20679 · sgl-project/sglang

mickqian · 2026-03-16T08:56:31Z

Motivation

Modifications

Qwen-Image: align CFG with official diffusers by introducing true_cfg_scale, enabling CFG when true_cfg_scale > 1, and matching the official true-CFG norm rescale.
Qwen-Image-Edit: respect the negative image, so that the unconditional branch actually uses negative_prompt.
Qwen-Image-Edit: keep the Qwen2.5-VL vision rotary frequencies in fp32 (aligned with diffusers) to reduce encoder drift.
Qwen-Image-Edit with --ulysses-degree 2: shard the RoPE of the condition image together with the RoPE of the noisy image, and build the zero_cond_t modulation indices from the local SP sequence lengths.
Z-Image: fix prompt tokenization and dtypes by using rendered chat-template prompts, bf16 text encoding, and fp32 latent sampling/scheduler state.
Z-Image with SP/Ulysses: stop sharding caption tokens (the condition should not be sharded), keep the caption as a replicated suffix in joint attention, and fix the local/global RoPE offsets so that multi-GPU output stays aligned with single-GPU output.
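The true-CFG combination with norm rescale that the Qwen-Image change aligns to can be sketched in pure Python as follows. It assumes predictions arrive as per-token channel vectors and that CFG is gated on true_cfg_scale > 1 plus the presence of a negative prompt, which is how diffusers' QwenImagePipeline gates it. true_cfg_combine is a hypothetical helper for illustration, not sglang's implementation.

```python
import math
from typing import List

Vector = List[float]


def _l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def true_cfg_combine(
    cond: List[Vector],
    uncond: List[Vector],
    true_cfg_scale: float,
    has_negative_prompt: bool,
) -> List[Vector]:
    """Combine cond/uncond predictions token by token (inner list = channels)."""
    # CFG is skipped unless the scale exceeds 1 and a negative prompt was given.
    if not (true_cfg_scale > 1.0 and has_negative_prompt):
        return [list(c) for c in cond]

    out: List[Vector] = []
    for c, u in zip(cond, uncond):
        # Standard CFG extrapolation away from the unconditional prediction.
        comb = [ui + true_cfg_scale * (ci - ui) for ci, ui in zip(c, u)]
        # Rescale so each token keeps the norm of its conditional prediction.
        comb_norm = _l2_norm(comb)
        # The zero-norm guard is an addition of this sketch.
        ratio = _l2_norm(c) / comb_norm if comb_norm > 0.0 else 0.0
        out.append([x * ratio for x in comb])
    return out


guided = true_cfg_combine([[3.0, 4.0]], [[0.0, 0.0]], true_cfg_scale=4.0, has_negative_prompt=True)
print(guided)  # [[3.0, 4.0]]
```

The rescale keeps the guided prediction on the same per-token scale as the conditional one, which is why a large true_cfg_scale such as 4.0 does not blow up the latent magnitude.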

Repro

Qwen-Image

$ sglang generate \
  --model-path Qwen/Qwen-Image \
  --attention-backend torch_sdpa \
  --prompt 'A coffee shop entrance features a chalkboard sign reading "Qwen Coffee  $2 per cup," with a neon
light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath
the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition,
Ultra HD, 4K, cinematic composition.' \
  --negative-prompt " " \
  --width 1664 \
  --height 928 \
  --num-inference-steps 50 \
  --true-cfg-scale 4.0 \
  --seed 42 \
  --save-output

import os

import torch
from diffusers import DiffusionPipeline

prompt = """A coffee shop entrance features a chalkboard sign reading "Qwen Coffee  $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is
written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition"""

# Official diffusers reference run; settings mirror the sglang command above.
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
image = pipe(
    prompt=prompt,
    negative_prompt=" ",  # passing a negative prompt lets true CFG activate when true_cfg_scale > 1
    width=1664,
    height=928,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
os.makedirs("/tmp/triplet_outputs", exist_ok=True)
image.save("/tmp/triplet_outputs/qwen_image_diffusers.png")

Comparison (images not preserved): before this branch | this branch | diffusers

Comparison with --ulysses-degree 2 (images not preserved).

Qwen-Image-Edit

$ sglang generate \
  --model-path Qwen/Qwen-Image-Edit-2511 \
  --prompt "Convert 2D style to 3D style" \
  --image-path "https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" \
  --save-output \
  --width 1536 \
  --height 1024
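Keeping the Qwen2.5-VL vision rotary frequencies in fp32 matters because rotary angles grow linearly with position, so a low-precision format can lose whole radians at large positions. The pure-Python sketch below emulates bfloat16 by rounding float32 bit patterns (round-to-nearest-even on the 16 dropped bits) and compares angle = position * inv_freq computed in bf16 and in fp32 against a float64 reference. The base of 10000, rotary dim of 64, and position range are illustrative assumptions, not values read from the Qwen2.5-VL config, and the helpers are hypothetical.

```python
import struct
from typing import Callable, List


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bfloat16: keep the top 16 bits of the float32 pattern, nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def rotary_angles(
    positions: List[int], dim: int, base: float, rnd: Callable[[float], float]
) -> List[List[float]]:
    """angle[p][i] = p * inv_freq[i], with every intermediate rounded by `rnd`."""
    inv_freq = [rnd(1.0 / (base ** (2 * i / dim))) for i in range(dim // 2)]
    return [[rnd(rnd(float(p)) * f) for f in inv_freq] for p in positions]


def max_abs_error(a: List[List[float]], b: List[List[float]]) -> float:
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


positions = list(range(0, 4096, 7))
ref = rotary_angles(positions, dim=64, base=10000.0, rnd=lambda v: v)  # float64 reference
err_fp32 = max_abs_error(rotary_angles(positions, 64, 10000.0, to_f32), ref)
err_bf16 = max_abs_error(rotary_angles(positions, 64, 10000.0, to_bf16), ref)
print(f"fp32 max angle error: {err_fp32:.2e} rad, bf16 max angle error: {err_bf16:.2e} rad")
```

With 8 significant bits, bf16 cannot even represent a position such as 1001 exactly, so the angle error reaches several radians, while fp32 stays well below a milliradian over this range.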

Comparison (images not preserved): before this branch | this branch | diffusers

abs_diff, 1-GPU vs diffusers (image not preserved).

Comparison with --ulysses-degree 2 (images not preserved).
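The Ulysses fix for Qwen-Image-Edit amounts to slicing every per-token table with exactly the same range as the tokens themselves. The sketch below assumes a joint image sequence laid out as noisy-image tokens followed by condition-image tokens, split into equal contiguous chunks per rank, and a per-token zero_cond_t modulation index of 0 for noisy tokens and 1 for condition tokens. The layout, the index values, and the helper names are illustrative assumptions rather than sglang's actual data structures.

```python
from typing import List, Tuple

# (which image, position within that image): a stand-in for one RoPE row.
RopeEntry = Tuple[str, int]


def shard_bounds(seq_len: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Contiguous, equal-size sequence shard owned by `rank`."""
    if seq_len % world_size != 0:
        raise ValueError("sketch assumes seq_len divisible by world_size")
    chunk = seq_len // world_size
    return rank * chunk, (rank + 1) * chunk


def local_rope_and_mod_index(
    num_noisy: int, num_cond: int, world_size: int, rank: int
) -> Tuple[List[RopeEntry], List[int]]:
    # Global tables cover the full joint sequence: noisy image first, then condition image.
    rope: List[RopeEntry] = [("noisy", i) for i in range(num_noisy)] + [
        ("cond", j) for j in range(num_cond)
    ]
    mod_index = [0] * num_noisy + [1] * num_cond
    # Slice both tables with the same bounds used for this rank's tokens.
    start, end = shard_bounds(num_noisy + num_cond, world_size, rank)
    return rope[start:end], mod_index[start:end]


for rank in range(2):
    rope, mod = local_rope_and_mod_index(num_noisy=6, num_cond=2, world_size=2, rank=rank)
    print(rank, rope, mod)
# Rank 1 holds the tail of the noisy image and the whole condition image,
# so its local modulation index mixes 0s and 1s.
```

Sharding only the noisy-image RoPE while appending the full condition-image RoPE on every rank would leave the local RoPE table longer than the local token shard, which is the kind of misalignment the fix removes.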

Z-Image-Turbo

$ sglang generate \
  --model-path Tongyi-MAI/Z-Image-Turbo \
  --prompt 'Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights.' \
  --seed 42 \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 9 \
  --guidance-scale 0.0

import torch
from diffusers import ZImagePipeline

prompt = """Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern.
Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon
lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted
tiered pagoda (西安大雁塔), blurred colorful distant lights."""

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
).to("cuda")
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("/tmp/triplet_outputs/zimage_turbo_diffusers.png")
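The Z-Image note about keeping latent sampling and scheduler state in fp32 can be illustrated with a toy flow-matching Euler loop (x ← x + (σ_next − σ)·v) whose state is rounded after every operation. The constant velocity, the linear sigma schedule, the 9 steps, and the bf16 emulation via float32 bit rounding are assumptions for illustration; this is not Z-Image's scheduler.

```python
import struct
from typing import Callable


def to_f32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the dropped bits
    return struct.unpack("<f", struct.pack("<I", ((bits >> 16) << 16) & 0xFFFFFFFF))[0]


def euler_sample(x0: float, velocity: float, num_steps: int, rnd: Callable[[float], float]) -> float:
    """Toy flow-matching Euler loop with every intermediate rounded by `rnd`."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = rnd(x0)
    for i in range(num_steps):
        dt = rnd(sigmas[i + 1] - sigmas[i])
        x = rnd(x + rnd(dt * velocity))
    return x


ref = euler_sample(0.1234, 0.3, 9, rnd=lambda v: v)  # float64 reference
err_fp32 = abs(euler_sample(0.1234, 0.3, 9, to_f32) - ref)
err_bf16 = abs(euler_sample(0.1234, 0.3, 9, to_bf16) - ref)
print(f"fp32 error: {err_fp32:.1e}, bf16 error: {err_bf16:.1e}")
```

Even in this tiny scalar example the bf16 state ends up orders of magnitude further from the reference than the fp32 state, and real latents go through many more operations per step.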

Comparison (images not preserved): before this branch | this branch | diffusers

Comparison with --ulysses-degree 2 (images not preserved).
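For Z-Image under SP/Ulysses, the fix keeps the caption replicated as a suffix and gives each rank's image shard its global RoPE positions. The sketch below checks that property: reassembling every rank's local image shard and appending the (identical) caption suffix reproduces the single-GPU sequence and positions. The 1-D position scheme (image tokens at 0..N-1, caption tokens after them) and the function name are hypothetical simplifications, not Z-Image's actual RoPE layout.

```python
from typing import List, Tuple


def local_joint_layout(
    image_tokens: List[str], caption_tokens: List[str], world_size: int, rank: int
) -> Tuple[List[str], List[int]]:
    """Return (local joint sequence, RoPE positions) for one SP rank."""
    if len(image_tokens) % world_size != 0:
        raise ValueError("sketch assumes the image sequence splits evenly")
    chunk = len(image_tokens) // world_size
    start = rank * chunk
    # Only image tokens are sharded; positions use the global offset `start`.
    local_image = image_tokens[start:start + chunk]
    image_pos = [start + i for i in range(chunk)]
    # The caption is replicated on every rank as a suffix, with identical positions.
    caption_pos = [len(image_tokens) + j for j in range(len(caption_tokens))]
    return local_image + list(caption_tokens), image_pos + caption_pos


image = [f"img{i}" for i in range(8)]
caption = ["cap0", "cap1", "cap2"]
ref_seq, ref_pos = local_joint_layout(image, caption, world_size=1, rank=0)

world = 2
shards = [local_joint_layout(image, caption, world, r) for r in range(world)]
chunk = len(image) // world
merged_seq = [t for seq, _ in shards for t in seq[:chunk]] + shards[0][0][chunk:]
merged_pos = [p for _, pos in shards for p in pos[:chunk]] + shards[0][1][chunk:]
print(merged_seq == ref_seq, merged_pos == ref_pos)  # True True
```

Using local offsets (0..chunk-1 on every rank) instead of `start + i` would give rank 1 the same image positions as rank 0, which is the kind of multi-GPU vs single-GPU drift the fix targets.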

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-03-16T08:56:42Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a specific issue in the qwen-image model's implementation: it ensures that the text-sequence prefix is handled correctly during the forward pass. The fix is needed for accurate processing of multimodal inputs under sequence parallelism, where the text tokens (of length seq_len_txt) are replicated across ranks rather than sharded.

Highlights

Qwen-Image Model Fix: Corrected the forward method in the qwen_image.py file by adding the num_replicated_prefix argument, set to seq_len_txt, to ensure proper handling of sequence prefixes for multimodal processing.


Changelog

python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py
- Added num_replicated_prefix argument to a function call in the forward method.


gemini-code-assist

Code Review

The pull request correctly introduces the num_replicated_prefix parameter to the USPAttention call, using seq_len_txt. This ensures that replicated text tokens are handled appropriately within the attention mechanism, which is crucial for maintaining correct attention weights in a distributed setup. The change aligns with the intended functionality of the USPAttention module.
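The num_replicated_prefix argument can be pictured at the sequence-layout level: every rank carries the same text prefix (length seq_len_txt) followed by its own image shard, so a correct full-sequence view keeps the prefix once and concatenates only the sharded parts. The sketch below uses a hypothetical helper, assemble_full_sequence, and deliberately ignores the head-dimension all-to-all that Ulysses actually performs; it only shows why a replicated prefix must not be gathered like sharded tokens.

```python
from typing import List, Sequence


def assemble_full_sequence(per_rank: Sequence[List[str]], num_replicated_prefix: int) -> List[str]:
    """Merge per-rank sequences whose first `num_replicated_prefix` tokens are replicated."""
    prefix = per_rank[0][:num_replicated_prefix]
    for seq in per_rank[1:]:
        # Replicated tokens must be identical on every rank.
        if seq[:num_replicated_prefix] != prefix:
            raise ValueError("replicated prefix differs across ranks")
    sharded: List[str] = []
    for seq in per_rank:
        sharded.extend(seq[num_replicated_prefix:])
    return prefix + sharded


seq_len_txt = 2
per_rank = [["t0", "t1", "i0", "i1"], ["t0", "t1", "i2", "i3"]]
print(assemble_full_sequence(per_rank, seq_len_txt))  # ['t0', 't1', 'i0', 'i1', 'i2', 'i3']
naive = [tok for seq in per_rank for tok in seq]  # treats text as sharded, so it is duplicated
print(len(naive))  # 8 instead of 6
```

If the text were gathered like the image shards, attention would see every text token once per rank, which skews the softmax weights toward the text.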

mickqian · 2026-03-18T17:00:36Z

/tag-and-rerun-ci

gemini-code-assist · 2026-03-18T17:09:35Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

mickqian · 2026-03-18T17:09:43Z

/rerun-failed-ci

yhyang201 · 2026-03-19T00:16:30Z

/rerun-failed-ci

yhyang201 · 2026-03-19T23:46:48Z

/rerun-failed-ci

yhyang201 · 2026-03-21T08:51:56Z

/rerun-failed-ci

yhyang201 · 2026-03-21T10:38:03Z

/rerun-failed-ci

yhyang201 · 2026-03-21T12:23:21Z

/rerun-failed-ci

yhyang201 · 2026-03-21T14:15:37Z

/rerun-failed-ci

yhyang201 · 2026-03-21T16:43:40Z

/rerun-failed-ci

Rockdu · 2026-03-23T01:41:30Z

Thanks for fixing this SP precision issue. Here are some quantitative test results from our side for reference:

Worst-case `min_cosine` vs `single_gpu_ref`

Model	Metric	Before	After	Result
Z-Image-Turbo	model_output	0.5039	1.0000	✅ Fixed (exact match)
Z-Image-Turbo	prev_sample_mean	0.7694	1.0000	✅ Fixed (exact match)
Qwen-Image	model_output	0.8616	1.0000	✅ Fixed (exact match)
Qwen-Image	prev_sample_mean	0.8825	1.0000	✅ Fixed (exact match)
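A worst-case min_cosine like the one in this table can be computed as the minimum, over recorded tensors (for example one per denoising step), of the cosine similarity between the flattened SP output and the flattened single-GPU reference. This is an assumption about how the metric is defined, since the comment does not spell out the procedure; the helper names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)  # assumes neither vector is all zeros


def worst_case_min_cosine(candidate: List[Sequence[float]], reference: List[Sequence[float]]) -> float:
    """Minimum per-step cosine similarity between a run and its single-GPU reference."""
    if len(candidate) != len(reference):
        raise ValueError("runs must record the same number of steps")
    return min(cosine_similarity(c, r) for c, r in zip(candidate, reference))


single_gpu_ref = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
sp_run = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.5]]  # second step drifts slightly
print(round(worst_case_min_cosine(sp_run, single_gpu_ref), 4))
```

Taking the minimum rather than the mean surfaces a single badly diverging step, which is why values such as 0.5039 for Z-Image-Turbo before the fix stand out so clearly.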

mickqian requested review from BBuf, ping1jing2, yhyang201 and yingluosanqian as code owners March 16, 2026 08:56

github-actions bot added the diffusion SGLang Diffusion label Mar 16, 2026

gemini-code-assist bot reviewed Mar 16, 2026


mickqian force-pushed the diffusion-fix branch from 602f2fb to cc7a6f5 March 17, 2026 07:51

mickqian marked this pull request as draft March 18, 2026 07:39

github-actions bot added npu run-ci labels Mar 18, 2026

mickqian added 10 commits March 19, 2026 01:03

[diffusion] fix: fix sp for qwen-image

58e3de1

upd

2c44bf3

upd

5b12c08

upd

4f19b06

fix zimage

2cb11db

fix zimage

503c3ab

upd

5b97648

upd

8825c29

qwen-image: sdpa for text encoder

ac0bc82

qwen-image: sdpa for text encoder

430b836

mickqian force-pushed the diffusion-fix branch from 40f719d to 430b836 March 18, 2026 17:03

mickqian marked this pull request as ready for review March 18, 2026 17:09

mickqian added the DO NOT MERGE label Mar 18, 2026

qwen-image: sdpa for text encoder

8a4702d

mickqian changed the title from "[diffusion] fix: fix sp for qwen-image" to "[diffusion] fix: fix accuracy for multiple image models" Mar 19, 2026

mickqian added 4 commits March 19, 2026 16:53

fix qwen-image and qwen-image-edit

fc92c37

fix qwen-image

18bc515

fix qwen-image

f846291

clean

fe8c80d

mickqian added 7 commits March 20, 2026 10:08

upd

b9a7b1d

upd

fca1c95

upd

e323e64

clean

436cf84

clean

7096f29

fix ci

b498d7e

fix ci

ac01a0c

mickqian added 3 commits March 22, 2026 08:19

Merge branch 'main' into diffusion-fix

c3961c9

upd

34612f3

Merge remote-tracking branch 'origin/diffusion-fix' into diffusion-fix

4dce4c0

mickqian mentioned this pull request Mar 22, 2026

[diffusion] fix Z-Image SP sharding for portrait and padded resolutions #21042

Merged


mickqian merged commit f7fc2c8 into main Mar 22, 2026
70 of 73 checks passed

mickqian deleted the diffusion-fix branch March 22, 2026 07:11

OrangeRedeng pushed a commit to OrangeRedeng/sglang that referenced this pull request Mar 22, 2026

[diffusion] fix: fix accuracy for some image models (sgl-project#20679)

6902f43

mickqian mentioned this pull request Mar 23, 2026

[NPU][Diffusion] fix sp modulate for qwen-image-edit #20974

Merged


avjves mentioned this pull request Mar 23, 2026

[diffusion] model: Fix FLUX.1 output correctness #21041

Merged


0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026

[diffusion] fix: fix accuracy for some image models (sgl-project#20679)

be6545a

dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026

[diffusion] fix: fix accuracy for some image models (sgl-project#20679)

01a4c8a

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

[diffusion] fix: fix accuracy for some image models (sgl-project#20679)

f70d8bb

3 participants
