[WIP] wan2.2: replace the small-operator WanRMS_norm with the fused operator on NPU #2952
Closed
lyj-jjj wants to merge 326 commits into vllm-project:release/v0.18.0.post1 from
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
…uxKontextPipeline et.al (vllm-project#2489) Signed-off-by: Lancer <maruixiang6688@gmail.com>
Signed-off-by: Zhengyuan Su (苏政渊) <su.zhengyuan@u.nus.edu>
…roject#1439) Signed-off-by: Hyoseop Song <crad_on25@naver.com> Signed-off-by: Hyoseop Song <crad_on25@naver.com>
…project#2480) Signed-off-by: Sy03 <1370724210@qq.com>
…AM (vllm-project#2429) Signed-off-by: Sy03 <1370724210@qq.com> Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
…lm-project#2430) Signed-off-by: Sy03 <1370724210@qq.com>
…rs (vllm-project#2470) Signed-off-by: willamhou <willamhou@ceresman.com> Co-authored-by: willamhou <willamhou@ceresman.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Happy <yesreply@happy.engineering>
…rameters/buffers staying on CPU (vllm-project#1486) Signed-off-by: Lancer <maruixiang6688@gmail.com> Signed-off-by: Lancer <402430575@qq.com> Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu> Signed-off-by: Claude <noreply@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
…t#2503) Signed-off-by: princepride <wangzhipeng628@gmail.com>
vllm-project#2502) Signed-off-by: princepride <wangzhipeng628@gmail.com>
…t#1284) Signed-off-by: Alicia <115451386+congw729@users.noreply.github.com> Signed-off-by: wangyu <410167048@qq.com> Co-authored-by: wangyu <410167048@qq.com>
…les only contain documentation. (vllm-project#2534) Signed-off-by: wangyu <410167048@qq.com>
…#2488) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
…ect#2530) Signed-off-by: Alex Brooks <albrooks@redhat.com>
…ject#2359) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> Co-authored-by: Canlin Guo <canlinguosdu@gmail.com>
Signed-off-by: Chen Yang <2082464740@qq.com> Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
… loop (vllm-project#2511) Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
…ion (vllm-project#2270) Signed-off-by: skf1999 <13234016272@163.com>
…Human and fix media utils bug (vllm-project#2542) Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
…m-project#2424) Signed-off-by: marksverdhei <marksverdhei@hotmail.com> Signed-off-by: marksverdhai <249650165+marksverdhai@users.noreply.github.com> Co-authored-by: marksverdhai <249650165+marksverdhai@users.noreply.github.com>
…, close/update race, and heartbeat stall (vllm-project#1899) Signed-off-by: pikaxinge <2392811793@qq.com> Co-authored-by: Alicia <115451386+congw729@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
…cfg_alpha for Voxtral TTS (vllm-project#2338) Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai> Signed-off-by: Yueqian Lin <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com> Signed-off-by: Jared Wen <w13431838023@gmail.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
…4 nightly tests (vllm-project#2641) Signed-off-by: wangyu <410167048@qq.com>
Signed-off-by: XIN GAO <1037396230@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
…2670) Signed-off-by: lvliang-intel <liang1.lv@intel.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
Signed-off-by: Ricardo Noriega <rnoriega@redhat.com>
gcanlin reviewed Apr 22, 2026
        return

    import torch_npu
    class WanRMS_norm(nn.Module):
Collaborator
Why not directly use RMSNorm introduced in #2583?
gcanlin reviewed Apr 22, 2026
Comment on lines +4 to +39
from __future__ import annotations

import sys

import torch
from torch import nn

from vllm_omni.platforms import current_omni_platform


def patch_wan_rms_norm():
    '''Replace the small-operator WanRMS_norm with one backed by the fused NPU kernel.'''
    if not current_omni_platform.is_npu():
        return

    import torch_npu

    class WanRMS_norm(nn.Module):
        def __init__(self, dim: int, channel_first: bool = True, images: bool = True, bias: bool = False) -> None:
            super().__init__()
            broadcastable_dims = (1, 1, 1) if not images else (1, 1)
            shape = (dim, *broadcastable_dims) if channel_first else (dim,)
            self.channel_first = channel_first
            self.scale = dim ** 0.5  # unused by the fused path
            self.gamma = nn.Parameter(torch.ones(shape))
            self.gamma_new = None  # cached 1-D copy of gamma for the fused kernel
            self.bias = nn.Parameter(torch.zeros(shape)) if bias else 0.0

        def forward(self, x):
            # npu_rms_norm normalizes over the last dim, so move channels there.
            if self.channel_first:
                x = x.transpose(1, -1)
            if self.gamma_new is None:
                # Flatten gamma from (dim, 1, 1[, 1]) to (dim,) once.
                # Note: the cache is not refreshed if gamma is reassigned later.
                self.gamma_new = self.gamma.transpose(0, -1).reshape(-1)
            # The fused kernel returns a tuple; element 0 is the normalized tensor.
            x_out = torch_npu.npu_rms_norm(x, self.gamma_new, epsilon=1e-6)[0]
            if self.channel_first:
                x_out = x_out.transpose(1, -1)
            # The fused kernel takes no bias, so add it afterwards when enabled.
            if isinstance(self.bias, torch.Tensor):
                x_out = x_out + self.bias
            return x_out

    # Rebind WanRMS_norm in every already-imported module that defines it.
    # Layers built before this call keep the original class, so patch before
    # the model is constructed. Iterate over a snapshot of sys.modules.
    for module_name, module in list(sys.modules.items()):
        if hasattr(module, 'WanRMS_norm'):
            setattr(module, 'WanRMS_norm', WanRMS_norm)
Collaborator
We only need these lines, and there is no need to create a new file.
Suggested change (replaces lines +4 to +39 with):
import sys
for module_name, module in sys.modules.items():
    if hasattr(module, 'WanRMS_norm'):
        setattr(module, 'WanRMS_norm', RMSNorm)
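Both the PR and the suggested change rely on rebinding WanRMS_norm in every module already present in sys.modules. The self-contained sketch below illustrates that pattern with two hypothetical stand-in modules and placeholder classes (OriginalNorm, FusedNorm, and patch_attribute_everywhere are illustrative names, not part of vllm-omni). It shows two properties worth keeping in mind: only modules imported before the call are patched, and layers constructed before the patch keep the original class. It iterates over a list() snapshot as a defensive choice, since sys.modules can change while it is being scanned.

```python
import sys
import types
from typing import List


def patch_attribute_everywhere(name: str, replacement: object) -> List[str]:
    """Rebind `name` in every already-imported module that defines it.

    Returns the names of the modules that were patched.
    """
    patched = []
    # Snapshot sys.modules: attribute lookups can trigger imports that
    # would otherwise mutate the dict while we iterate over it.
    for module_name, module in list(sys.modules.items()):
        if module is not None and hasattr(module, name):
            setattr(module, name, replacement)
            patched.append(module_name)
    return patched


class OriginalNorm:  # stands in for the small-operator WanRMS_norm
    pass


class FusedNorm:  # stands in for the fused replacement (or RMSNorm)
    pass


# Hypothetical modules standing in for the files that define WanRMS_norm.
for fake_name in ("fake_wan_vae_a", "fake_wan_vae_b"):
    fake = types.ModuleType(fake_name)
    fake.WanRMS_norm = OriginalNorm
    sys.modules[fake_name] = fake

# A layer built before patching keeps its original class.
early_layer = sys.modules["fake_wan_vae_a"].WanRMS_norm()

patched = patch_attribute_everywhere("WanRMS_norm", FusedNorm)

# Code that looks the name up on the module after patching gets the replacement.
late_layer = sys.modules["fake_wan_vae_a"].WanRMS_norm()

print(sorted(patched))
print(type(early_layer).__name__, type(late_layer).__name__)
```

In practice this means patch_wan_rms_norm (or the suggested loop) has to run after the modules defining WanRMS_norm are imported but before the VAE is instantiated.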
…lm-project#2724) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
…l img2img mm kwargs (vllm-project#2932) Signed-off-by: NumberWan <wantszkin2003@gmail.com>
…upportsModuleOffload (vllm-project#2427) Signed-off-by: Nick Cao <ncao@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
…ject#2766) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Signed-off-by: david6666666 <530634352@qq.com>
…ject#3049) Signed-off-by: david6666666 <530634352@qq.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com> Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com> Co-authored-by: LHXuuu <xulianhao.xlh@antgroup.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
…t#3054) Signed-off-by: NumberWan <wantszkin2003@gmail.com>
Signed-off-by: Ricardo Noriega De Soto <rnoriega@redhat.com>
…wen-image model and modified conftest.py in test/dfx/ (vllm-project#2817) Signed-off-by: zhumingjue <zhumingjue@huawei.com> Signed-off-by: zhumingjue138 <zhumingjue@huawei.com>
Signed-off-by: Hui <1779066624@qq.com> Signed-off-by: Hui. <62495465+Hu1Lcode@users.noreply.github.com> Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com>
698df24 to 95a07f7
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Purpose
Replace the WanRMS_norm built from small operators with the fused RMSNorm operator (torch_npu.npu_rms_norm) on NPU.
Test Plan
Test Result
Performance
before: Encoder 2.1 s, Decoder 3.4 s (5.5 s total)
after: Encoder 1.6 s, Decoder 2.5 s (4.1 s total)
Total time drops by 1.4 s (5.5 s to 4.1 s): about 25% less time, or roughly 1.34x (34%) faster.
Accuracy
before
https://github.com/user-attachments/assets/683596fe-7e83-4272-ba16-0475c464cf73
after
https://github.com/user-attachments/assets/4597a909-27a8-4930-95eb-2f9c1913f0ae
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)