[BugFix] Qwen-Image performance regression by using torch RMSNorm (RMSNorm backend) #4074
Merged
Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Collaborator: Can you provide profiling for the different versions and for this PR?
Author (Contributor): The profiling comparison has been added to the description.
86MaxCao pushed a commit to 86MaxCao/vllm-omni that referenced this pull request on Jun 4, 2026:
…orm backend) (vllm-project#4074) Signed-off-by: NumberWan <wantszkin2003@gmail.com>
akshatvishu pushed a commit to akshatvishu/vllm-omni that referenced this pull request on Jun 13, 2026:
…orm backend) (vllm-project#4074) Signed-off-by: NumberWan <wantszkin2003@gmail.com> Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Purpose
Fix/mitigate the accuracy regression reported in #4029 (test_qwen_image_matches_diffusers, SSIM threshold ≥ 0.97): switch Qwen-Image RMSNorm layers to torch.nn.RMSNorm.

Background
- … diffusion_benchmark_serving.py → latency_mean).
- torch.nn.RMSNorm as a pragmatic fallback: it avoids the vLLM RMSNorm extension path and reduces the risk of fused-kernel numerical drift impacting SSIM.

Change
- vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py
- torch.nn.RMSNorm for Qwen-Image RMSNorm layers

Test Plan
- tests/e2e/accuracy/test_qwen_image.py::test_qwen_image_matches_diffusers (primary goal: confirm that #4029, "[Bug]: Nightly / CI failed - tests/e2e/accuracy/test_qwen_image.py::test_qwen_image_matches_diffusers", is addressed).
- diffusion_benchmark_serving.py (512×512, 10 steps) to show that latency_mean improves compared with applying vLLM RMSNorm.

Test Result
Accuracy: test_qwen_image_matches_diffusers in local test
Note: the rows below are local results (L20X + venv). They do not match the absolute values from nightly CI, but are useful for comparing relative changes across RMSNorm backends.
Will be updated once the CI test result is available.
[Results table; identifiers as extracted: 11c4fced6f4bd3eb1e12d1a]
Accuracy: test_qwen_image_matches_diffusers in CI
[Results table; identifiers as extracted: 11c4fced6f4bd3eb1e12d1a]
Workload: 512×512, 10 steps, 500 requests (+1 warmup), max-concurrency=1.
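As a rough illustration of how a latency_mean figure for a workload like the one above could be derived, here is a minimal stdlib-only sketch. It assumes the warmup request is excluded from the mean; whether diffusion_benchmark_serving.py does exactly this is not confirmed here. The function names and the sample timings are hypothetical and are not results from this PR.

```python
from statistics import mean
from typing import Sequence


def latency_mean(latencies_s: Sequence[float], warmup: int = 1) -> float:
    """Mean latency in seconds, skipping the first `warmup` requests (assumed policy)."""
    measured = latencies_s[warmup:]
    if not measured:
        raise ValueError("no measured requests left after dropping warmup")
    return mean(measured)


def relative_change(baseline: float, candidate: float) -> float:
    """Signed fractional change of candidate vs. baseline (negative means lower latency)."""
    return (candidate - baseline) / baseline


# Made-up per-request timings, for illustration only; the first entry is the warmup.
sample_a = [9.0, 2.0, 2.2, 2.1]
sample_b = [8.5, 1.8, 2.0, 1.9]

mean_a = latency_mean(sample_a)
mean_b = latency_mean(sample_b)
print(f"latency_mean: {mean_a:.3f}s vs {mean_b:.3f}s ({relative_change(mean_a, mean_b):+.1%})")
```

Comparing relative changes in this way is what makes local numbers useful even when their absolute values differ from CI.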
Baseline table (v0.18.0 as baseline) + torch RMSNorm (this PR):
Profiling comparison (vLLM RMS_Norm vs. Torch RMS_Norm)
vLLM RMS_Norm CPU time vs Torch RMS_Norm CPU time
vLLM
vLLM RMS_Norm CPU time vs Torch RMS_Norm CUDA time
vLLM
Torch
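For reference, both backends compared above are meant to compute the same normalization. The sketch below is a plain-Python reference of the RMSNorm formula as documented for torch.nn.RMSNorm, y = x / sqrt(mean(x²) + eps) * weight, applied to a single vector. It is not the code from this PR; the helper names are hypothetical, and the eps default of 1e-6 is an assumption rather than the value Qwen-Image uses.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equally long vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


# mean(x^2) = (9 + 16) / 4 = 6.25, so the RMS is 2.5 when eps is 0.
hidden = [3.0, -4.0, 0.0, 0.0]
gamma = [1.0, 1.0, 1.0, 1.0]  # hypothetical learned scale, set to ones here
out = rms_norm(hidden, gamma, eps=0.0)
print(out, max_abs_diff(out, [1.2, -1.6, 0.0, 0.0]))
```

A helper like max_abs_diff is one simple way to quantify drift between two implementations of the same formula before it shows up in an image-level metric such as SSIM.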