[Perf][NPU] Fix rotary embedding dimension mismatch & add auto-adaptation for full-dim cos/sin in mindiesd RoPE (#2369)
Conversation
…rectly Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com> Co-authored-by: vasede <1399968934@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a25f1311b5
```python
query = self.rope(query, freqs_cos, freqs_sin)
key = self.rope(key, freqs_cos, freqs_sin)
```
Cast rotary tensors before invoking NPU RoPE kernel
In the new NPU branch, self.rope is called with freqs_cos/freqs_sin directly, but WAN rotary tables are produced as float32 while query/key are typically bf16/fp16, creating a mixed-dtype call path that exists only on mindiesd+NPU. Unlike the previous apply_rotary_emb_wan path (which wrote into empty_like) and unlike apply_rope_to_qk in layers/rope.py (which explicitly casts cos/sin), this can trigger dtype-mismatch errors or silently force higher-precision execution in the fused kernels, hurting correctness and performance for NPU inference.
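A minimal sketch of the suggested fix, assuming a wrapper around the call sites in the diff (`rope_with_cast` is a hypothetical helper; only the cast itself mirrors what apply_rope_to_qk already does):

```python
import torch

def rope_with_cast(rope_fn, query, key, freqs_cos, freqs_sin):
    # WAN rotary tables arrive in float32 while query/key are bf16/fp16;
    # cast cos/sin to the activation dtype before handing them to the
    # fused NPU kernel, mirroring the explicit cast in apply_rope_to_qk
    # (layers/rope.py).
    freqs_cos = freqs_cos.to(query.dtype)
    freqs_sin = freqs_sin.to(query.dtype)
    return rope_fn(query, freqs_cos, freqs_sin), rope_fn(key, freqs_cos, freqs_sin)
```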
Signed-off-by: Li-8916 <lishilin314@163.com>
Moved the import statement for find_spec to a new location. Signed-off-by: Li-8916 <lishilin314@163.com>
```python
freqs_cos, freqs_sin = rotary_emb
query = apply_rotary_emb_wan(query, freqs_cos, freqs_sin)
key = apply_rotary_emb_wan(key, freqs_cos, freqs_sin)
if find_spec("mindiesd") is not None and current_omni_platform.is_npu():
```
Any better way to add support for mindiesd? cc @gcanlin
I prefer not to introduce it here. Will investigate how mindie-sd implements it.
PR #2393 probably has the same effect and provides a general optimization for all platforms.
Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com>
Purpose
This PR optimizes the MindIE-SD Rotary Position Embedding (RoPE) operator on Ascend NPUs to resolve dimension mismatch and redundant expansion issues in models such as Wan 2.2, where cos/sin tensors are pre-expanded to full dimension:
Fix dimension errors: Add automatic dimension detection. When cos.shape[-1] == x.shape[-1], automatically disable half_head_dim to avoid repeated expansion of already full-dimension cos/sin.
Support dual input formats: Compatible with both half-dimension cos/sin (D/2) and full-dimension cos/sin (D), adapting to RoPE preprocessing logic in different models.
Improve robustness: Eliminate shape mismatch errors caused by hardcoded parameters and make the mindiesd RoPE operator adaptive to input layouts.
Maintain performance: No changes to the existing NPU fused acceleration logic; the lightweight dimension check introduces no performance overhead.
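The auto-adaptation described above amounts to a lightweight shape check before expansion; a rough sketch (`expand_cos_sin` and the `repeat_interleave` layout are assumptions for illustration, the real operator's expansion scheme may differ):

```python
import torch

def expand_cos_sin(x, cos, sin):
    # Auto-detect the input layout: if cos/sin already cover the full
    # head dim (cos.shape[-1] == x.shape[-1]), use them as-is and
    # disable the half-dim path; otherwise expand the half-dim (D/2)
    # tables to full dimension exactly once.
    half_head_dim = cos.shape[-1] != x.shape[-1]
    if half_head_dim:
        cos = cos.repeat_interleave(2, dim=-1)
        sin = sin.repeat_interleave(2, dim=-1)
    return cos, sin, half_head_dim
```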
Test Plan
Test Environment
Hardware: Ascend NPU 910B / 910B4
Framework: vLLM-Omni + PyTorch for Ascend + MindIE-SD
Model: Wan 2.2 (main model using this RoPE operator)
Test Cases
Basic functionality test
Run with half-dimension cos/sin (D/2), verify correct output shape and values.
Run with full-dimension cos/sin (D), verify new logic is triggered and half_head_dim is set to False.
Dimension matching test
Verify that no redundant expand/repeat is performed when cos.shape[-1] == x.shape[-1].
Verify original expansion logic remains unchanged when dimensions differ.
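The functionality cases above could be scripted along these lines (`check_rope_shapes` and the stand-in rope function are hypothetical; the actual tests run against the mindiesd operator on NPU hardware):

```python
import torch

def check_rope_shapes(rope_fn, b=2, s=16, d=64):
    # Exercise both input formats: half-dim (D/2) and full-dim (D)
    # cos/sin tables must each produce output matching the input's
    # shape and dtype.
    x = torch.randn(b, s, d, dtype=torch.float16)
    for cd in (d // 2, d):
        cos = torch.randn(b, s, cd, dtype=torch.float32)
        sin = torch.randn(b, s, cd, dtype=torch.float32)
        out = rope_fn(x, cos, sin)
        assert out.shape == x.shape and out.dtype == x.dtype
```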
Test Result
All test cases passed on Ascend NPU.
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.