
[Diffusion] Improve qwen image edit performance to align with LightX2V #15812

Merged
mickqian merged 26 commits into main from qwen_image_edit_opt on Dec 26, 2025

@BBuf
Collaborator

@BBuf BBuf commented Dec 25, 2025

main

sglang generate --model-path Qwen/Qwen-Image-Edit-2511 --prompt "Change the person to a standing position, bending over to hold the dog's front paws."  --image-path "/home/lmsys/bbuf/LightX2V/examples/qwen_image/1.png"
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:30<00:00,  1.29it/s]
[12-25 07:11:20] [DenoisingStage] average time per step: 0.7722 seconds
[12-25 07:11:20] [DenoisingStage] finished in 30.8956 seconds
[12-25 07:11:20] [DecodingStage] started...
[12-25 07:11:20] [DecodingStage] finished in 0.5390 seconds
[12-25 07:11:20] Output saved to outputs/Change_the_person_to_a_standing_position_bending_over_to_hold_the_dog_s_front_paws._20251225-071047_08c591d4.jpg
[12-25 07:11:20] Pixel data generated successfully in 33.13 seconds
[image]

pr

sglang generate --model-path Qwen/Qwen-Image-Edit-2511 --prompt "Change the person to a standing position, bending over to hold the dog's front paws."  --image-path "/home/lmsys/bbuf/LightX2V/examples/qwen_image/1.png"
[12-25 07:00:34] [DenoisingStage] started...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:25<00:00,  1.58it/s]
[12-25 07:00:59] [DenoisingStage] average time per step: 0.6327 seconds
[12-25 07:00:59] [DenoisingStage] finished in 25.3114 seconds
[12-25 07:00:59] [DecodingStage] started...
[12-25 07:01:00] [DecodingStage] finished in 0.5667 seconds
[image]
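
For a quick read of the two runs above, the end-to-end gain can be worked out from the logged DenoisingStage numbers. This is only arithmetic on the figures in the logs; the helper names are made up for illustration.

```python
# Figures copied from the two DenoisingStage logs above (40 steps each).
MAIN_STEP_S, PR_STEP_S = 0.7722, 0.6327      # average time per step, seconds
MAIN_TOTAL_S, PR_TOTAL_S = 30.8956, 25.3114  # whole denoising stage, seconds


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is compared with `before`."""
    return before / after


def reduction_pct(before: float, after: float) -> float:
    """Percentage of the original time that was removed."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(f"per-step speedup: {speedup(MAIN_STEP_S, PR_STEP_S):.2f}x")
    print(f"per-step time saved: {reduction_pct(MAIN_STEP_S, PR_STEP_S):.1f}%")
    print(f"denoising-stage speedup: {speedup(MAIN_TOTAL_S, PR_TOTAL_S):.2f}x")
```

Both the per-step and the whole-stage figures come out at roughly 1.22x, i.e. about 18% less denoising time on this single run.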

Motivation

Use upstream FA3 instead of the sgl-kernel FA3: 1.7 ms -> 1.2 ms

main:

[image]

pr:

[image]

Use FlashInfer RoPE: 241 us -> 82 us

main:

[image]

pr:

[image]
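
For readers unfamiliar with the op being swapped here, below is a minimal textbook sketch of rotary position embedding (RoPE) on a single head vector. It is not the FlashInfer kernel and not the sglang code path; the interleaved-pair layout, the `base` value, and the function name are assumptions chosen for illustration.

```python
import math


def rope_interleaved(x, position, base=10000.0):
    """Rotate adjacent pairs (x[0], x[1]), (x[2], x[3]), ... of one head vector.

    Pair j (starting at index i = 2j) is rotated by the angle
    position * base ** (-i / d), where d is the head dimension.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        # Standard 2D rotation of the pair (a, b) by theta.
        out.extend([a * c - b * s, a * s + b * c])
    return out
```

Each pair only needs one multiply-add pattern, so the op is memory-bound; that is the kind of work a dedicated kernel can do in a single pass over q and k.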

Revert QKV packing to avoid an unaligned cat kernel: 154.5 us -> 70 us

main:

[image]

pr:

[image]

fuse_scale_shift_gate_select01_kernel: 148 us -> 66 us

main:

[image]

pr:

[image]
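
The PR does not show the kernel itself, so the following is only a guess, based on the kernel name, at the unfused math it replaces: pick one of two modulation sets per token (index 0 or 1), apply scale/shift to the activations, and hand back the matching gate. All names and the data layout here are hypothetical.

```python
def scale_shift_gate_select01(x, index, mod0, mod1):
    """Unfused reference for the assumed semantics of the fused kernel.

    x      : list of token rows, each a list of floats (one per channel)
    index  : per-token selector, 0 or 1
    mod0/1 : (shift, scale, gate) tuples of per-channel lists
    Returns (modulated rows, per-token gate rows).
    """
    out, gates = [], []
    for row, idx in zip(x, index):
        # Select the modulation set for this token.
        shift, scale, gate = mod1 if idx else mod0
        # Apply x * (1 + scale) + shift channel by channel.
        out.append([v * (1.0 + sc) + sh for v, sc, sh in zip(row, scale, shift)])
        gates.append(list(gate))
    return out, gates
```

Done as separate tensor ops, the select, the multiply-add, and the gate lookup each read and write the activations; a fused kernel can do all three in one pass, which would be consistent with the drop reported above.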

One-step, one-layer result:

  • LightX2V (5.1 ms)
[image]
  • sglang main (6.1 ms)
[image]
  • pr (5.19 ms)
[image]

Per-layer time at every step is now very close to LightX2V.
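
As a sanity check, the per-kernel savings listed under Motivation can be added up and compared with the one-layer numbers. The sketch assumes each kernel is counted once per layer, which the PR does not state.

```python
# (before_us, after_us) per kernel, copied from the Motivation list.
# Assumption: each kernel contributes once per layer.
KERNELS_US = {
    "upstream fa3": (1700.0, 1200.0),
    "flashinfer rope": (241.0, 82.0),
    "qkv without unaligned cat": (154.5, 70.0),
    "fuse_scale_shift_gate_select01_kernel": (148.0, 66.0),
}

# One-step, one-layer timings in milliseconds, from the result list.
LIGHTX2V_MS, MAIN_MS, PR_MS = 5.1, 6.1, 5.19

saved_ms = sum(before - after for before, after in KERNELS_US.values()) / 1000.0
layer_gain_ms = MAIN_MS - PR_MS
gap_to_lightx2v_ms = PR_MS - LIGHTX2V_MS

if __name__ == "__main__":
    print(f"sum of kernel savings: {saved_ms:.4f} ms")
    print(f"measured layer gain:   {layer_gain_ms:.2f} ms")
    print(f"remaining gap:         {gap_to_lightx2v_ms:.2f} ms")
```

Under that assumption the four changes account for about 0.83 ms of the 0.91 ms measured per-layer gain, leaving roughly 0.09 ms between this PR and LightX2V.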


@github-actions github-actions bot added the diffusion SGLang Diffusion label Dec 25, 2025
@BBuf
Collaborator Author

BBuf commented Dec 25, 2025

/tag-and-rerun-ci

@BBuf BBuf added the run-ci label Dec 25, 2025
@mickqian mickqian merged commit 51dbdb2 into main Dec 26, 2025
97 of 104 checks passed
@mickqian mickqian deleted the qwen_image_edit_opt branch December 26, 2025 14:25
@sunxxuns
Collaborator

sunxxuns commented Jan 2, 2026

This PR failed AMD CI and should not have been merged. cc @merrymercy @Ying1123
Since FlashInfer currently supports only NVIDIA hardware, it should not become a chokepoint for other hardware. Thanks.
PS: the fix is in #16287
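
One common way to keep an optional NVIDIA-only dependency from blocking other platforms is to select the backend at runtime and fall back to a portable path. The sketch below illustrates that pattern only; it is not the change made in #16287, and the function and backend names are hypothetical.

```python
import importlib.util


def pick_rope_backend(prefer: str = "flashinfer") -> str:
    """Return the RoPE backend name to use on this machine.

    Falls back to a portable "native" implementation when the optional
    `flashinfer` package is not installed. This checks installation only;
    a real guard would likely also check the hardware platform.
    """
    if prefer == "flashinfer" and importlib.util.find_spec("flashinfer") is not None:
        return "flashinfer"
    return "native"


if __name__ == "__main__":
    print(pick_rope_backend())
```

With a guard like this, the fast path stays available where the package is installed, while other hardware still runs on the fallback.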

