[Diffusion] Improve qwen image edit performance to align with LightX2V by BBuf · Pull Request #15812 · sgl-project/sglang

BBuf · 2025-12-25T07:35:17Z

main:

sglang generate --model-path Qwen/Qwen-Image-Edit-2511 --prompt "Change the person to a standing position, bending over to hold the dog's front paws."  --image-path "/home/lmsys/bbuf/LightX2V/examples/qwen_image/1.png"

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:30<00:00,  1.29it/s]
[12-25 07:11:20] [DenoisingStage] average time per step: 0.7722 seconds
[12-25 07:11:20] [DenoisingStage] finished in 30.8956 seconds
[12-25 07:11:20] [DecodingStage] started...
[12-25 07:11:20] [DecodingStage] finished in 0.5390 seconds
[12-25 07:11:20] Output saved to outputs/Change_the_person_to_a_standing_position_bending_over_to_hold_the_dog_s_front_paws._20251225-071047_08c591d4.jpg
[12-25 07:11:20] Pixel data generated successfully in 33.13 seconds

pr:

sglang generate --model-path Qwen/Qwen-Image-Edit-2511 --prompt "Change the person to a standing position, bending over to hold the dog's front paws."  --image-path "/home/lmsys/bbuf/LightX2V/examples/qwen_image/1.png"

[12-25 07:00:34] [DenoisingStage] started...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:25<00:00,  1.58it/s]
[12-25 07:00:59] [DenoisingStage] average time per step: 0.6327 seconds
[12-25 07:00:59] [DenoisingStage] finished in 25.3114 seconds
[12-25 07:00:59] [DecodingStage] started...
[12-25 07:01:00] [DecodingStage] finished in 0.5667 seconds
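A quick check of the two DenoisingStage logs above. This is a minimal sketch: it only uses the numbers printed in the logs and confirms that the totals agree with 40 steps times the average step time.

```python
# Numbers copied from the two DenoisingStage logs above (40 denoising steps each).
STEPS = 40
main_avg_step, main_total = 0.7722, 30.8956  # seconds, main branch
pr_avg_step, pr_total = 0.6327, 25.3114      # seconds, this PR


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


def reduction(before: float, after: float) -> float:
    """Fraction of wall time removed."""
    return 1.0 - after / before


step_speedup = speedup(main_avg_step, pr_avg_step)
total_speedup = speedup(main_total, pr_total)
total_reduction = reduction(main_total, pr_total)

# Sanity check: total denoising time should be roughly steps * average step time.
main_consistent = abs(STEPS * main_avg_step - main_total) < 0.05
pr_consistent = abs(STEPS * pr_avg_step - pr_total) < 0.05

print(f"per-step speedup: {step_speedup:.3f}x")
print(f"denoising speedup: {total_speedup:.3f}x ({total_reduction:.1%} less time)")
```

Both ratios come out at about 1.22x, which means roughly 18% less denoising time. The decoding stage barely changes (0.539 s vs 0.567 s), so the gain comes from the DiT denoising loop.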

Motivation

Use upstream FA3 instead of the sgl-kernel FA3: 1.7ms -> 1.2ms
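The PR text only says that the upstream FA3 build replaced the sgl-kernel FA3 build for this path. Below is a minimal sketch of an import-time preference order. It assumes the upstream FlashAttention Hopper build is importable as `flash_attn_interface` and sgl-kernel as `sgl_kernel`. The backend labels, the SDPA fallback, and `pick_attention_backend` are hypothetical, and this is not SGLang's actual selection code.

```python
import importlib.util
from typing import Callable, Sequence, Tuple

# Hypothetical preference order: upstream FA3 first, then sgl-kernel FA3.
# Module names are assumptions about how each package is installed.
DEFAULT_PREFERENCE: Sequence[Tuple[str, str]] = (
    ("fa3_upstream", "flash_attn_interface"),
    ("fa3_sgl_kernel", "sgl_kernel"),
)


def module_available(name: str) -> bool:
    """True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_attention_backend(
    preference: Sequence[Tuple[str, str]] = DEFAULT_PREFERENCE,
    has_module: Callable[[str], bool] = module_available,
) -> str:
    """Return the first backend whose module is importable, else a generic fallback."""
    for backend, module in preference:
        if has_module(module):
            return backend
    # Hypothetical fallback label for torch.nn.functional.scaled_dot_product_attention.
    return "torch_sdpa"


print(pick_attention_backend())
```

Injecting `has_module` lets you test the choice on machines without the packages installed. A real dispatcher would also check the GPU architecture, because the FA3 kernels are aimed at Hopper-class GPUs.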


FlashInfer RoPE: 241us -> 82us
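To show what a RoPE kernel computes, here is a pure-Python reference for 1D rotary embedding with interleaved (GPT-J-style) pairing. This is an illustrative sketch under that assumption only. Qwen-Image's multi-axis positional scheme and FlashInfer's kernel layout may differ, and `apply_rope_interleaved` is a hypothetical name.

```python
import math
from typing import List


def apply_rope_interleaved(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # 2D rotation of the pair: preserves the norm and encodes position as an angle.
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


q = [1.0, 2.0, 3.0, 4.0]
print(apply_rope_interleaved(q, 3))
```

Because each pair is rotated, the dot product of a rotated query and key depends only on the difference of their positions. A fused GPU kernel computes the same rotation for every head and token in one launch, instead of running a chain of slice, multiply, and add ops.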


Revert packed QKV to avoid an unaligned cat kernel: 154.5us -> 70us
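Packed and separate QKV projections produce the same Q, K and V. The difference lies only in cost. If the packed form has to be assembled with a concatenation at runtime, that extra copy can outweigh the saving from one larger GEMM, particularly when the shapes are unaligned for the copy kernel. In the minimal pure-Python sketch below, `hcat` stands in for that concatenation. The source does not say exactly where SGLang performed the cat, and all names here are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n, k) @ (k, m) -> (n, m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hcat(*mats: Matrix) -> Matrix:
    """Column-wise concatenation; this is the extra copy a packed layout may need."""
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]


def separate_qkv(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    # Three independent projections, no concatenation required.
    return matmul(x, wq), matmul(x, wk), matmul(x, wv)


def packed_qkv(x: Matrix, w_qkv: Matrix, d: int) -> Tuple[Matrix, Matrix, Matrix]:
    # One projection into a (n, 3d) buffer, then split by column ranges.
    y = matmul(x, w_qkv)
    q = [row[:d] for row in y]
    k = [row[d:2 * d] for row in y]
    v = [row[2 * d:3 * d] for row in y]
    return q, k, v
```

In PyTorch the split after a packed GEMM can be a free view. The potential cost therefore sits in producing the packed buffer, which is what the PR avoided by reverting to separate projections.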


fuse_scale_shift_gate_select01_kernel: 148us -> 66us.
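The source gives only the kernel name and its timing. Going by the name, one plausible reading (an assumption, not confirmed by the PR) is adaLN-style modulation with two parameter sets. Each token selects set 0 or set 1 and computes `x * (1 + scale) + shift`, and the matching gate is also returned. Below is an unfused pure-Python reference for that assumed semantics, with the hypothetical name `scale_shift_gate_select01`.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def scale_shift_gate_select01(
    x: Sequence[Row],
    index: Sequence[int],
    shift: Sequence[Row],
    scale: Sequence[Row],
    gate: Sequence[Row],
) -> Tuple[List[Row], List[Row]]:
    """For each token t, apply modulation set index[t] (0 or 1) and return its gate."""
    out: List[Row] = []
    gates: List[Row] = []
    for t, row in enumerate(x):
        k = index[t]
        if k not in (0, 1):
            raise ValueError("index must be 0 or 1")
        out.append([v * (1.0 + sc) + sh for v, sc, sh in zip(row, scale[k], shift[k])])
        gates.append(list(gate[k]))
    return out, gates


out, gates = scale_shift_gate_select01(
    x=[[1.0, 2.0], [3.0, 4.0]],
    index=[0, 1],
    shift=[[0.0, 0.0], [1.0, 1.0]],
    scale=[[0.0, 0.0], [1.0, 1.0]],
    gate=[[0.5, 0.5], [2.0, 2.0]],
)
print(out, gates)
```

An unfused PyTorch version would launch separate kernels for the select, the multiply-add and the gate gather, each reading and writing the full activation. A fused kernel reads `x` once, which fits the 148us -> 66us drop.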


One-step, one-layer result:

LightX2V: 5.1 ms

sglang main: 6.1 ms

pr: 5.19 ms

Per-layer time at every step is now very close to LightX2V.
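How close is "very close"? A small calculation on the one-step, one-layer timings listed above. The dictionary keys are just labels, and the numbers are the ones reported in the PR.

```python
# One-step, one-layer timings reported above (milliseconds).
timings_ms = {"LightX2V": 5.1, "sglang main": 6.1, "pr": 5.19}


def gap_vs(reference: str, name: str, table: dict) -> float:
    """Relative slowdown of `name` compared with `reference` (0.0 means equal)."""
    return table[name] / table[reference] - 1.0


# Fraction of the main-vs-LightX2V gap that this PR removes.
closed = (timings_ms["sglang main"] - timings_ms["pr"]) / (
    timings_ms["sglang main"] - timings_ms["LightX2V"]
)

for name in ("sglang main", "pr"):
    print(f"{name}: {gap_vs('LightX2V', name, timings_ms):+.1%} vs LightX2V")
print(f"gap closed: {closed:.0%}")
```

On these numbers main was about 19.6% slower per layer than LightX2V, and the PR brings that to about 1.8%. That closes roughly 91% of the gap.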

gemini-code-assist · 2025-12-25T07:35:21Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

python/sglang/multimodal_gen/runtime/layers/attention/backends/flash_attn.py

python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py

BBuf · 2025-12-25T11:32:08Z

/tag-and-rerun-ci

sunxxuns · 2026-01-02T06:52:34Z

This PR failed AMD CI (cc @merrymercy @Ying1123) and should not be merged.
Since FlashInfer currently supports only NVIDIA hardware, it should not become a chokepoint for other hardware. Thanks.
PS: fix in #16287
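One way to keep a vendor-specific kernel from becoming a chokepoint is to gate it on the platform and fall back to a portable implementation. Here is a minimal sketch under that assumption. `choose_rope_backend` and its labels are hypothetical and are not the code from #16287. One subtlety: on ROCm builds of PyTorch, `torch.cuda.is_available()` also returns True, so checking CUDA availability alone would still pick FlashInfer on AMD GPUs.

```python
from typing import Optional


def choose_rope_backend(
    hip_version: Optional[str],
    cuda_available: bool,
    flashinfer_importable: bool,
) -> str:
    """Pick the FlashInfer RoPE path only on NVIDIA CUDA builds; otherwise use a native path.

    In real code the arguments could come from torch.version.hip (None on CUDA builds,
    a version string on ROCm builds), torch.cuda.is_available(), and
    importlib.util.find_spec("flashinfer") is not None.
    """
    if cuda_available and hip_version is None and flashinfer_importable:
        return "flashinfer"
    return "native"


print(choose_rope_backend(hip_version=None, cuda_available=True, flashinfer_importable=True))
print(choose_rope_backend(hip_version="6.2", cuda_available=True, flashinfer_importable=True))
```

Passing the platform facts in as arguments keeps the decision testable on any machine. The "native" label stands for whatever portable RoPE path the model already has.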


BBuf added 21 commits December 24, 2025 19:05

ud (a62db86)
ud (cfd3837)
ud (f6dd384)
ud (1edc85a)
ud (d2a6b18)
ud (9f1c9a1)
ud (780cc3f)
ud (0a79fe6)
ud (3f2864d)
ud (4915682)
ud (314a9fc)
ud (e3d4e44)
ud (32dc874)
ud (caa441e)
ud (774f9e6)
ud (bfb9bbc)
ud (00e228c)
ud (4d2a8e4)
ud (62b04ad)
ud (3e4ed81)
ud (c0bfef6)

BBuf requested review from mickqian and yhyang201 as code owners December 25, 2025 07:35

github-actions bot added the diffusion SGLang Diffusion label Dec 25, 2025

mickqian approved these changes Dec 25, 2025


python/sglang/multimodal_gen/runtime/layers/attention/backends/flash_attn.py (outdated, resolved)

python/sglang/multimodal_gen/runtime/models/dits/qwen_image.py (resolved)

BBuf and others added 3 commits December 25, 2025 16:06

ud (d1705d8)
lint (41ab5ab)
Merge branch 'main' into qwen_image_edit_opt (a8785f5)

BBuf added the run-ci label Dec 25, 2025

ud (784c63e)

mickqian approved these changes Dec 26, 2025


Merge branch 'main' into qwen_image_edit_opt (4e83559)

mickqian merged commit 51dbdb2 into main Dec 26, 2025
97 of 104 checks passed

mickqian deleted the qwen_image_edit_opt branch December 26, 2025 14:25

This was referenced Dec 28, 2025

[Diffusion] Refactor diffusion Flash Attention backend #16000 (Closed)

[Diffusion] Disable packed QKV for FLUX & Z-Image #16038 (Merged)

[Bug] Slower FlashAttention V3 in Diffusion than Diffusers/Cache-DIT #16196 (Closed)

sunxxuns mentioned this pull request Jan 2, 2026

fixed amd multimodal CI failures caused by refactor in #15812 #15813 #16287 (Merged)

Kangyan-Zhou pushed a commit that referenced this pull request Jan 2, 2026

fixed amd multimodal CI failures caused by refactor in #15812 #15813 (#16287) (30cfb68)

RubiaCx mentioned this pull request Jan 3, 2026

diffusion: rotary embedding kernel #14302 (Open, 6 tasks)

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Jan 4, 2026

fixed amd multimodal CI failures caused by refactor in sgl-project#15812 sgl-project#15813 (sgl-project#16287) (d5e07a1)

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[diffusion] improve: improve qwen-image-edit performance to align with LightX2V (sgl-project#15812) (6be33e6)
Co-authored-by: Mick <mickjagger19@icloud.com>

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

fixed amd multimodal CI failures caused by refactor in sgl-project#15812 sgl-project#15813 (sgl-project#16287) (fde8543)

mickqian merged 26 commits into main from qwen_image_edit_opt

3 participants

Conversation

BBuf commented Dec 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

main

pr

Motivation

use upstream fa3 , not sgl-kernel fa3 : 1.7ms->1.2ms

flashinfer rope: 241us->82us

revert pack qkv to avoid unaligned cat kernel: 154.5us->70us

fuse_scale_shift_gate_select01_kernel: 148us->66us。

one-step one-lyaer result

Uh oh!

gemini-code-assist bot commented Dec 25, 2025

Uh oh!

Uh oh!

Uh oh!

BBuf commented Dec 25, 2025

Uh oh!

Uh oh!

sunxxuns commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

BBuf commented Dec 25, 2025 •

edited

Loading

sunxxuns commented Jan 2, 2026 •

edited

Loading