[WIP] wan2.2: replace the small operator's WanRMS_norm with the fused operator on npu #2952

Closed
lyj-jjj wants to merge 326 commits into vllm-project:release/v0.18.0.post1 from lyj-jjj:0.18.0.post1-wan2.2-vae-rmsnorm-new

lyj-jjj (Contributor) commented Apr 20, 2026

Purpose

Replace the WanRMS_norm implementation, which is composed of several small operators, with the fused RMS norm operator on NPU.
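
As a rough illustration of why a single RMS norm call can stand in for the small-operator version: assuming the original forward pass L2-normalizes along the channel axis and then multiplies by `scale = dim ** 0.5` and `gamma` (that forward is not shown in this PR, so this is an assumption), the two forms are algebraically the same, since x / ||x|| * sqrt(d) = x / sqrt(mean(x^2)). The sketch below checks this in plain Python on a single vector. The function names are hypothetical and nothing here calls `torch_npu`.

```python
import math


def l2_normalize_scaled(x, gamma):
    """Small-operator form: L2-normalize, then multiply by sqrt(dim) and gamma."""
    dim = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    scale = dim ** 0.5
    return [v / norm * scale * g for v, g in zip(x, gamma)]


def rms_norm(x, gamma, eps=0.0):
    """RMS norm form: divide by sqrt(mean(x^2) + eps), then multiply by gamma."""
    dim = len(x)
    rms = math.sqrt(sum(v * v for v in x) / dim + eps)
    return [v / rms * g for v, g in zip(x, gamma)]


# Arbitrary example values along one channel axis.
x = [0.5, -1.25, 2.0, 3.0]
gamma = [1.0, 0.5, 2.0, 1.5]

small_ops = l2_normalize_scaled(x, gamma)
single_call = rms_norm(x, gamma)
max_diff = max(abs(a - b) for a, b in zip(small_ops, single_call))

# With a small epsilon (1e-6, as passed in this PR) the result shifts only slightly.
with_eps = rms_norm(x, gamma, eps=1e-6)
max_diff_eps = max(abs(a - b) for a, b in zip(small_ops, with_eps))
```

The two forms handle epsilon differently (in this sketch `eps` is added to the mean square before the square root), so outputs may differ slightly for near-zero inputs. The fused forward shown in this PR also does not apply `bias`, which matters only when `bias=True`.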

Test Plan

  • script
export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"

python image_to_video.py \
--model /weights/Wan2.2-I2V-A14B-LightX2V-Diffusers \
--image i2v_input.jpg \
--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside" \
--height 1280 \
--width 720 \
--num-frames 61 \
--guidance-scale 1.0 \
--guidance-scale-high 1.0 \
--num-inference-steps 4 \
--cfg-parallel-size 1 \
--ulysses-degree 4 \
--boundary-ratio 0.875 \
--flow-shift 12.0 \
--fps 16 \
--output i2v_output_origin_weight_1.mp4 \
--vae-patch-parallel-size 4 \
--vae-use-tiling

Test Result


yuanheng-zhao and others added 30 commits April 5, 2026 06:47
y123456y78 and others added 9 commits April 22, 2026 03:49
        return

    import torch_npu
    class WanRMS_norm(nn.Module):
Collaborator

Why not directly use RMSNorm introduced in #2583?

Comment on lines +4 to +39
from __future__ import annotations
import torch

from torch import nn
from vllm_omni.platforms import current_omni_platform

def patch_wan_rms_norm():
    '''Replace small operators with fused operators'''

    if not current_omni_platform.is_npu():
        return

    import torch_npu
    class WanRMS_norm(nn.Module):
        def __init__(self, dim: int, channel_first: bool = True, images: bool = True, bias: bool = False) -> None:
            super().__init__()
            broadcastable_dims = (1, 1, 1) if not images else (1, 1)
            shape = (dim, *broadcastable_dims) if channel_first else (dim,)
            self.channel_first = channel_first
            self.scale = dim ** 0.5
            self.gamma = nn.Parameter(torch.ones(shape))
            self.gamma_new = None
            self.bias = nn.Parameter(torch.zeros(shape)) if bias else 0.0

        def forward(self, x):
            x = x.transpose(1, -1)
            if self.gamma_new is None:
                self.gamma_new = self.gamma.transpose(0, -1).reshape(-1)
            x_out = torch_npu.npu_rms_norm(x, self.gamma_new, epsilon=1e-6)
            x_out = x_out[0].transpose(1, -1)
            return x_out

    import sys
    for module_name, module in sys.modules.items():
        if hasattr(module, 'WanRMS_norm'):
            setattr(module, 'WanRMS_norm', WanRMS_norm)
No newline at end of file
Collaborator

We will only need these lines. And it's unnecessary to create a new file.

Suggested change
- from __future__ import annotations
- import torch
- from torch import nn
- from vllm_omni.platforms import current_omni_platform
- def patch_wan_rms_norm():
-     '''Replace small operators with fused operators'''
-     if not current_omni_platform.is_npu():
-         return
-     import torch_npu
-     class WanRMS_norm(nn.Module):
-         def __init__(self, dim: int, channel_first: bool = True, images: bool = True, bias: bool = False) -> None:
-             super().__init__()
-             broadcastable_dims = (1, 1, 1) if not images else (1, 1)
-             shape = (dim, *broadcastable_dims) if channel_first else (dim,)
-             self.channel_first = channel_first
-             self.scale = dim ** 0.5
-             self.gamma = nn.Parameter(torch.ones(shape))
-             self.gamma_new = None
-             self.bias = nn.Parameter(torch.zeros(shape)) if bias else 0.0
-         def forward(self, x):
-             x = x.transpose(1, -1)
-             if self.gamma_new is None:
-                 self.gamma_new = self.gamma.transpose(0, -1).reshape(-1)
-             x_out = torch_npu.npu_rms_norm(x, self.gamma_new, epsilon=1e-6)
-             x_out = x_out[0].transpose(1, -1)
-             return x_out
-     import sys
-     for module_name, module in sys.modules.items():
-         if hasattr(module, 'WanRMS_norm'):
-             setattr(module, 'WanRMS_norm', WanRMS_norm)
+     import sys
+     for module_name, module in sys.modules.items():
+         if hasattr(module, 'WanRMS_norm'):
+             setattr(module, 'WanRMS_norm', RMSNorm)
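
The rebinding loop in the suggestion can be tried without NPU hardware. The sketch below uses two throwaway modules and stand-in classes to show what such a loop does and does not change. `SlowNorm`, `FusedNorm`, `demo_vae`, `demo_pipeline` and `patch_class` are all hypothetical names, not part of vllm-omni. It iterates over a snapshot of `sys.modules` and checks each module's `__dict__` instead of calling `hasattr`, which is a small variation on the loop in the PR.

```python
import sys
import types


class SlowNorm:
    """Stand-in for the original class (hypothetical)."""
    kind = "small-ops"


class FusedNorm:
    """Stand-in for the fused replacement (hypothetical)."""
    kind = "fused"


# One fake module defines the class; the other holds a reference to it,
# as `from demo_vae import WanRMS_norm` would.
demo_vae = types.ModuleType("demo_vae")
demo_vae.WanRMS_norm = SlowNorm
demo_pipeline = types.ModuleType("demo_pipeline")
demo_pipeline.WanRMS_norm = SlowNorm
sys.modules["demo_vae"] = demo_vae
sys.modules["demo_pipeline"] = demo_pipeline

# Instance created before patching.
existing = demo_vae.WanRMS_norm()


def patch_class(attr_name, replacement):
    """Rebind attr_name in every loaded module that defines it."""
    patched = []
    # list(...) takes a snapshot, so an import triggered during the loop
    # cannot resize the dict while it is being iterated.
    for module_name, module in list(sys.modules.items()):
        namespace = getattr(module, "__dict__", None)
        if isinstance(namespace, dict) and attr_name in namespace:
            setattr(module, attr_name, replacement)
            patched.append(module_name)
    return sorted(patched)


patched_modules = patch_class("WanRMS_norm", FusedNorm)

# Instance created after patching, through the re-exporting module.
created_after = demo_pipeline.WanRMS_norm()
```

Objects constructed before the patch keep their original class (`existing` above is still a `SlowNorm`), so a patch of this kind likely needs to run before the model is instantiated.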

fhfuih and others added 14 commits April 23, 2026 00:07
lyj-jjj force-pushed the 0.18.0.post1-wan2.2-vae-rmsnorm-new branch from 698df24 to 95a07f7 on April 23, 2026 10:09
lyj-jjj added 2 commits April 23, 2026 19:02
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
lyj-jjj changed the title from "wan2.2: replace the small operator's WanRMS_norm with the fused operator on npu" to "[WIP] wan2.2: replace the small operator's WanRMS_norm with the fused operator on npu" on Apr 23, 2026
lyj-jjj closed this on Apr 23, 2026