[BugFix]Qwen-Image performance regression by using torch RMSNorm(RMSNorm backend) #4074

Merged: Gaohan123 merged 1 commit into vllm-project:main from NumberWan:qwen_perf on Jun 2, 2026

NumberWan (Contributor) commented Jun 2, 2026


Purpose

Fix/mitigate the accuracy regression reported in #4029 (test_qwen_image_matches_diffusers, SSIM threshold ≥ 0.97): switch Qwen-Image RMSNorm layers to torch.nn.RMSNorm.

Background

Change

  • File: vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py
  • Change: use torch.nn.RMSNorm for Qwen-Image RMSNorm layers
  • Scope: Qwen-Image diffusion transformer only
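
For reference, the operation these layers perform can be sketched in plain Python. This is only an illustration of the RMSNorm formula as documented for torch.nn.RMSNorm, y = x / sqrt(mean(x²) + eps) * weight. The function name `rms_norm` and the `eps` default are assumptions made for the example and are not taken from this PR's diff.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a single vector (illustrative only).

    Hypothetical helper: it mirrors the documented formula, not the
    code in qwen_image_transformer.py.
    """
    # Mean of squares over the normalized dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Scale by the reciprocal root-mean-square, then by the learned weight.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


x = [3.0, 4.0]
y = rms_norm(x, weight=[1.0, 1.0], eps=0.0)
# With unit weights and eps=0, the output has a root-mean-square of 1.
out_rms = math.sqrt(sum(v * v for v in y) / len(y))
```

Because every backend compared in this PR is expected to compute this same quantity, the differences reported below would come from the kernel implementation (dispatch overhead, numerics) rather than from the formula itself.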

Test Plan

Test Result

Accuracy: test_qwen_image_matches_diffusers in local test

Note: the rows below are local results (L20X + venv). They do not match the nightly CI absolute values, but they are useful for comparing relative changes across RMSNorm backends.

Will be updated once the CI test results are available.

| Variant (local) | Commit | SSIM | PSNR (dB) |
|---|---|---|---|
| baseline (before #3933 merged, “Last PASS commit”) | 11c4fced | 0.950571 | 26.75 |
| omni RMSNorm plateau (post-#3933) | 6f4bd3e | 0.950902 | 26.75 |
| torch RMSNorm (this PR) | b1e12d1a | 0.959643 | 28.017508 |

Accuracy: test_qwen_image_matches_diffusers in CI

| Variant (CI) | Commit | SSIM | PSNR (dB) |
|---|---|---|---|
| baseline (before #3933 merged, “Last PASS commit”) | 11c4fced | 0.972616 | 30.022175 |
| omni RMSNorm plateau (post-#3933) | 6f4bd3e | 0.941089 | 25.499989 |
| torch RMSNorm (after this PR) | b1e12d1a | 0.975093 | 30.727339 |
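
As a quick reading aid, the CI SSIM values above can be checked against the 0.97 threshold quoted in the Purpose section. This is a minimal sketch: the dictionary keys and the `passes` helper are names made up for the example, not part of the test suite.

```python
SSIM_THRESHOLD = 0.97  # threshold quoted in the Purpose section

# CI SSIM values copied from the table above; the keys are illustrative labels.
ci_ssim = {
    "baseline (before #3933)": 0.972616,
    "omni RMSNorm (post-#3933)": 0.941089,
    "torch RMSNorm (this PR)": 0.975093,
}


def passes(ssim, threshold=SSIM_THRESHOLD):
    """Return True when the SSIM meets the threshold (hypothetical helper)."""
    return ssim >= threshold


status = {name: passes(value) for name, value in ci_ssim.items()}
for name, ok in status.items():
    print(f"{name}: {'PASS' if ok else 'FAIL'}")
```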

Workload: 512×512, 10 steps, 500 requests (+1 warmup), max-concurrency=1.
Baseline table (v0.18.0 as baseline) + torch RMSNorm (this PR):

| Metric | v0.18.0 | v0.20.0 (vLLM RMSNorm) | v0.20.0 (omni RMSNorm) | v0.20.0 (torch RMSNorm, this PR) |
|---|---|---|---|---|
| throughput_qps | 0.843 | 0.742 (−11.9%) | 0.811 (−3.8%) | 0.829 (−1.6%) |
| latency_mean | 1.187 | 1.347 (+13.5%) | 1.233 (+3.9%) | 1.206 (+1.6%) |
| latency_median | 1.181 | 1.317 (+11.5%) | 1.223 (+3.5%) | 1.187 (+0.5%) |
| latency_p99 | 1.269 | 1.647 (+29.8%) | 1.363 (+7.4%) | 1.372 (+8.2%) |
| latency_p95 | 1.234 | 1.459 (+18.2%) | 1.298 (+5.2%) | 1.294 (+4.9%) |
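
The percentages in parentheses read as relative changes against the v0.18.0 column. A minimal sketch of that calculation follows, assuming the usual (value − baseline) / baseline definition; the helper name `rel_change_pct` is hypothetical. Recomputing from the rounded table entries can differ from the printed percentages in the last digit, presumably because the originals were derived from unrounded measurements.

```python
def rel_change_pct(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Two rows copied from the table above (v0.18.0 vs torch RMSNorm, this PR).
baseline = {"throughput_qps": 0.843, "latency_mean": 1.187}
torch_rmsnorm = {"throughput_qps": 0.829, "latency_mean": 1.206}

deltas = {
    metric: rel_change_pct(baseline[metric], torch_rmsnorm[metric])
    for metric in baseline
}
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.1f}%")
```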

Compare (vLLM RMS_Norm / Torch RMS_Norm) in profiling

vLLM RMS_Norm CPU time vs Torch RMS_Norm CPU time

vLLM: [image: vllm_rms_norm]

Torch: [image: torch_rms_norm]

  • Torch RMS_Norm: ~14us
  • vLLM RMS_Norm: ~57us

vLLM RMS_Norm CUDA time vs Torch RMS_Norm CUDA time

vLLM: [image: vllm_rms_norm_GPU]

Torch: [image: torch_rms_norm_GPU]

  • Torch RMS_Norm: ~31us
  • vLLM RMS_Norm: ~62us

NumberWan changed the title from "Qwen perf" to "[BugFix]Qwen-Image performance regression by using torch RMSNorm(RMSNorm backend)" Jun 2, 2026
Gaohan123 added this to the v0.22.0 milestone Jun 2, 2026
Gaohan123 added the labels ready (label to trigger buildkite CI) and diffusion-x2iat-test (label to trigger buildkite x2image + x2audio + x2text series of diffusion models test in nightly CI) Jun 2, 2026
Gaohan123 linked an issue Jun 2, 2026 that may be closed by this pull request
Signed-off-by: NumberWan <wantszkin2003@gmail.com>

hsliuustc0106 (Collaborator) left a comment

lgtm

hsliuustc0106 (Collaborator) commented

can you provide a profiling for different versions and this PR?

@Gaohan123 Gaohan123 merged commit 35ee3c7 into vllm-project:main Jun 2, 2026
8 checks passed
NumberWan (Contributor, Author) commented

> can you provide a profiling for different versions and this PR?

The profiling comparison has been added to the description.

86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request Jun 4, 2026
…orm backend) (vllm-project#4074)

Signed-off-by: NumberWan <wantszkin2003@gmail.com>
akshatvishu pushed a commit to akshatvishu/vllm-omni that referenced this pull request Jun 13, 2026
…orm backend) (vllm-project#4074)

Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Performance]: Qwen-Image on vLLM-Omni 0.18 -> latest performance regression

3 participants