[Perf][NPU] Fix rotary embedding dimension mismatch & add auto-adaptation for full-dim cos/sin in mindiesd RoPE (#2369)
Conversation
…rectly Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com> Co-authored-by: vasede <1399968934@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a25f1311b5
```python
query = self.rope(query, freqs_cos, freqs_sin)
key = self.rope(key, freqs_cos, freqs_sin)
```
Cast rotary tensors before invoking NPU RoPE kernel
In the new NPU branch, self.rope is called with freqs_cos/freqs_sin directly, but WAN rotary tables are produced as float32 while query/key are typically bf16/fp16, creating a mixed-dtype call path that exists only on mindiesd+NPU. Unlike the previous apply_rotary_emb_wan path (which wrote into empty_like) and unlike apply_rope_to_qk in layers/rope.py (which explicitly casts cos/sin), this can trigger dtype-mismatch errors or silently force higher-precision execution in the fused kernels, hurting correctness and performance for NPU inference.
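A minimal sketch of the suggested fix, assuming a wrapper around the call sites in the diff (`rope_with_cast` is a hypothetical helper; only the cast itself mirrors what apply_rope_to_qk already does):

```python
import torch

def rope_with_cast(rope_fn, query, key, freqs_cos, freqs_sin):
    # WAN rotary tables arrive in float32 while query/key are bf16/fp16;
    # cast cos/sin to the activation dtype before handing them to the
    # fused NPU kernel, mirroring the explicit cast in apply_rope_to_qk
    # (layers/rope.py).
    freqs_cos = freqs_cos.to(query.dtype)
    freqs_sin = freqs_sin.to(query.dtype)
    return rope_fn(query, freqs_cos, freqs_sin), rope_fn(key, freqs_cos, freqs_sin)
```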
Signed-off-by: Li-8916 <lishilin314@163.com>
Moved the import statement for find_spec to a new location. Signed-off-by: Li-8916 <lishilin314@163.com>
```python
freqs_cos, freqs_sin = rotary_emb
query = apply_rotary_emb_wan(query, freqs_cos, freqs_sin)
key = apply_rotary_emb_wan(key, freqs_cos, freqs_sin)
if find_spec("mindiesd") is not None and current_omni_platform.is_npu():
```
Any better way to add support for mindiesd? cc @gcanlin
I prefer not to introduce it here. Will investigate how mindie-sd implements it.
PR #2393 probably has the same effect and provides a general optimization for all platforms.
Signed-off-by: Li-8916 <lishilin314@163.com>
Signed-off-by: Li-8916 <lishilin314@163.com>
Purpose
This PR optimizes the MindIE-SD Rotary Position Embedding (RoPE) operator on Ascend NPUs to resolve dimension mismatch and redundant expansion issues in models such as Wan 2.2, where cos/sin tensors are pre-expanded to full dimension:
Fix dimension errors: Add automatic dimension detection. When cos.shape[-1] == x.shape[-1], automatically disable half_head_dim to avoid repeated expansion of already full-dimension cos/sin.
Support dual input formats: Compatible with both half-dimension cos/sin (D/2) and full-dimension cos/sin (D), adapting to RoPE preprocessing logic in different models.
Improve robustness: Eliminate shape mismatch errors caused by hardcoded parameters and make the mindiesd RoPE operator adaptive to input layouts.
Maintain performance: No changes to the existing NPU fused acceleration logic; the lightweight dimension check introduces no performance overhead.
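The auto-adaptation described above amounts to a lightweight shape check before expansion; a rough sketch (`expand_cos_sin` and the `repeat_interleave` layout are assumptions for illustration, the real operator's expansion scheme may differ):

```python
import torch

def expand_cos_sin(x, cos, sin):
    # Auto-detect the input layout: if cos/sin already cover the full
    # head dim (cos.shape[-1] == x.shape[-1]), use them as-is and
    # disable the half-dim path; otherwise expand the half-dim (D/2)
    # tables to full dimension exactly once.
    half_head_dim = cos.shape[-1] != x.shape[-1]
    if half_head_dim:
        cos = cos.repeat_interleave(2, dim=-1)
        sin = sin.repeat_interleave(2, dim=-1)
    return cos, sin, half_head_dim
```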
Test Plan
Test Environment
Hardware: Ascend NPU 910B / 910B4
Framework: vLLM-Omni + PyTorch for Ascend + MindIE-SD
Model: Wan 2.2 (main model using this RoPE operator)
Test Cases
Basic functionality test
Run with half-dimension cos/sin (D/2), verify correct output shape and values.
Run with full-dimension cos/sin (D), verify new logic is triggered and half_head_dim is set to False.
Dimension matching test
Verify that no redundant expand/repeat is performed when cos.shape[-1] == x.shape[-1].
Verify original expansion logic remains unchanged when dimensions differ.
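The functionality cases above could be scripted along these lines (`check_rope_shapes` and the stand-in rope function are hypothetical; the actual tests run against the mindiesd operator on NPU hardware):

```python
import torch

def check_rope_shapes(rope_fn, b=2, s=16, d=64):
    # Exercise both input formats: half-dim (D/2) and full-dim (D)
    # cos/sin tables must each produce output matching the input's
    # shape and dtype.
    x = torch.randn(b, s, d, dtype=torch.float16)
    for cd in (d // 2, d):
        cos = torch.randn(b, s, cd, dtype=torch.float32)
        sin = torch.randn(b, s, cd, dtype=torch.float32)
        out = rope_fn(x, cos, sin)
        assert out.shape == x.shape and out.dtype == x.dtype
```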
Test Result
All test cases passed on Ascend NPU.
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.