[diffusion] fix: fix accuracy for some image models #20679

Merged: mickqian merged 27 commits into main from diffusion-fix on Mar 22, 2026
Conversation

@mickqian (Collaborator) commented Mar 16, 2026

Motivation

Modifications

  1. Qwen-Image: align CFG with official diffusers by introducing `true_cfg_scale`, enabling CFG when `true_cfg_scale > 1`, and matching the official true-CFG norm rescale.
  2. Qwen-Image-Edit: respect the negative image, so the uncond branch really uses `negative_prompt`.
  3. Qwen-Image-Edit: keep Qwen2.5-VL vision rotary frequencies in fp32 (aligned with diffusers) to reduce encoder drift.
  4. Qwen-Image-Edit with `--ulysses-degree 2`: shard the RoPE of the condition image together with the RoPE of the noisy image, and build `zero_cond_t` modulation indices from local SP sequence lengths.
  5. Z-Image: fix prompt/tokenization and dtype by using rendered chat-template prompts, bf16 text encoding, and fp32 latent sampling/scheduler state.
  6. Z-Image with SP/Ulysses: stop sharding caption tokens (the condition should not be sharded), keep the caption as a replicated suffix in joint attention, and fix the local/global RoPE offsets so multi-GPU output stays aligned with single-GPU.
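For reviewers, the true-CFG combine with norm rescale from item 1 can be sketched as follows. This is an illustration of the technique, not the code in this PR; the function and variable names are assumptions:

```python
import torch

def true_cfg_combine(cond_pred: torch.Tensor,
                     uncond_pred: torch.Tensor,
                     true_cfg_scale: float) -> torch.Tensor:
    # CFG is only active when true_cfg_scale > 1; below that threshold the
    # conditional prediction is returned unchanged.
    if true_cfg_scale <= 1.0:
        return cond_pred
    combined = uncond_pred + true_cfg_scale * (cond_pred - uncond_pred)
    # Norm rescale: restore the conditional prediction's per-token norm so
    # the guidance step does not inflate the prediction magnitude.
    cond_norm = cond_pred.norm(dim=-1, keepdim=True)
    combined_norm = combined.norm(dim=-1, keepdim=True)
    return combined * (cond_norm / combined_norm)
```

With `true_cfg_scale <= 1` this degrades gracefully to the unguided conditional prediction, matching the "enable CFG only above 1" behavior described above.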

Repro

Qwen-Image

```shell
$ sglang generate \
  --model-path Qwen/Qwen-Image \
  --attention-backend torch_sdpa \
  --prompt 'A coffee shop entrance features a chalkboard sign reading "Qwen Coffee  $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition.' \
  --negative-prompt " " \
  --width 1664 \
  --height 928 \
  --num-inference-steps 50 \
  --true-cfg-scale 4.0 \
  --seed 42 \
  --save-output
```
Diffusers reference:

```python
import torch
from diffusers import DiffusionPipeline

prompt = """A coffee shop entrance features a chalkboard sign reading "Qwen Coffee  $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is
written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition"""

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")
image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    width=1664,
    height=928,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("/tmp/triplet_outputs/qwen_image_diffusers.png")
```
| before this branch | this branch | diffusers |
| --- | --- | --- |
| qwen_image_before | qwen_image_after | qwen_image_diffusers |

With `--ulysses-degree 2`: (image)

Qwen-Image-Edit

```shell
$ sglang generate \
  --model-path=Qwen/Qwen-Image-Edit-2511 \
  --prompt="Convert 2D style to 3D style" \
  --image-path="https://github.com/lm-sys/lm-sys.github.io/releases/download/test/TI2I_Qwen_Image_Edit_Input.jpg" \
  --save-output \
  --width=1536 \
  --height=1024
```
| before this branch | this branch | diffusers |
| --- | --- | --- |
| qwen_image_dit_before | qwen_image_edit_after | qwen_image_edit_diffusers |

abs_diff (1-GPU vs diffusers): (image: abs_diff)

With `--ulysses-degree 2`: (image: sglang_2gpu_ulysses2)

Z-Image-Turbo

```shell
$ sglang generate \
  --model-path Tongyi-MAI/Z-Image-Turbo \
  --prompt 'Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights.' \
  --seed 42 \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 9 \
  --guidance-scale 0.0
```
Diffusers reference:

```python
import torch
from diffusers import ZImagePipeline

prompt = """Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern.
Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon
lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted
tiered pagoda (西安大雁塔), blurred colorful distant lights."""

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
).to("cuda")
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("/tmp/triplet_outputs/zimage_turbo_diffusers.png")
```
| before this branch | this branch | diffusers |
| --- | --- | --- |
| zimage_turbo_before | zimage_turbo_after | zimage_turbo_diffusers |

With `--ulysses-degree 2`: (image)

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a specific issue within the qwen-image model's implementation, ensuring that sequence prefixes are correctly managed during the forward pass. This fix is essential for the model's stability and accurate processing of multimodal inputs, particularly in scenarios involving text sequence lengths.

Highlights

  • Qwen-Image Model Fix: Corrected the forward method in the qwen_image.py file by adding the num_replicated_prefix argument, set to seq_len_txt, to ensure proper handling of sequence prefixes for multimodal processing.


Changelog
  • python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py
    • Added num_replicated_prefix argument to a function call in the forward method.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Mar 16, 2026
@gemini-code-assist bot (Contributor) left a comment:

Code Review

The pull request correctly introduces the num_replicated_prefix parameter to the USPAttention call, using seq_len_txt. This ensures that replicated text tokens are handled appropriately within the attention mechanism, which is crucial for maintaining correct attention weights in a distributed setup. The change aligns with the intended functionality of the USPAttention module.
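The replicated-prefix idea can be sketched with a toy helper. This is an illustration of the sharding scheme, not the actual USPAttention implementation; the function name and shape conventions are assumptions:

```python
import torch

def shard_with_replicated_prefix(seq: torch.Tensor,
                                 num_replicated_prefix: int,
                                 rank: int,
                                 world_size: int) -> torch.Tensor:
    """Keep the first num_replicated_prefix tokens (e.g. text) on every
    rank, and split only the remaining tokens (e.g. image latents)
    evenly across ranks."""
    prefix = seq[:num_replicated_prefix]
    rest = seq[num_replicated_prefix:]
    local = rest.chunk(world_size, dim=0)[rank]
    return torch.cat([prefix, local], dim=0)
```

Every rank then attends over the full text prefix plus its local image shard, which is why the attention call needs to know how many leading tokens are replicated rather than sharded.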

@mickqian mickqian marked this pull request as draft March 18, 2026 07:39
@mickqian (Collaborator, Author) commented:

/tag-and-rerun-ci

@mickqian mickqian marked this pull request as ready for review March 18, 2026 17:09
@gemini-code-assist (Contributor) commented:
Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@mickqian (Collaborator, Author) commented:

/rerun-failed-ci

@yhyang201 (Collaborator) commented:

/rerun-failed-ci
@mickqian changed the title from "[diffusion] fix: fix sp for qwen-image" to "[diffusion] fix: fix accuracy for multiple image models" on Mar 19, 2026
@yhyang201 (Collaborator) commented:

/rerun-failed-ci

(6 similar comments)
@mickqian mickqian merged commit f7fc2c8 into main Mar 22, 2026
70 of 73 checks passed
@mickqian mickqian deleted the diffusion-fix branch March 22, 2026 07:11
OrangeRedeng pushed a commit to OrangeRedeng/sglang that referenced this pull request Mar 22, 2026
@Rockdu commented Mar 23, 2026

Thanks for fixing this SP precision issue. Here are some quantitative test results on our side for reference.

Worst-case min cosine similarity vs. the single-GPU reference:

| Model | Metric | Before | After | Result |
| --- | --- | --- | --- | --- |
| Z-Image-Turbo | model_output | 0.5039 | 1.0000 | ✅ Fixed (exact match) |
| Z-Image-Turbo | prev_sample_mean | 0.7694 | 1.0000 | ✅ Fixed (exact match) |
| Qwen-Image | model_output | 0.8616 | 1.0000 | ✅ Fixed (exact match) |
| Qwen-Image | prev_sample_mean | 0.8825 | 1.0000 | ✅ Fixed (exact match) |
