Skip to content

[v0.13.0][Fusion]add checks to skip fusion where split_rmsnorm_rope is not supported#6749

Merged
whx-sjtu merged 2 commits intovllm-project:releases/v0.13.0from
Angazenn:cos_sin_dev
Feb 14, 2026
Merged

[v0.13.0][Fusion]add checks to skip fusion where split_rmsnorm_rope is not supported#6749
whx-sjtu merged 2 commits intovllm-project:releases/v0.13.0from
Angazenn:cos_sin_dev

Conversation

@Angazenn
Copy link
Copy Markdown
Collaborator

@Angazenn Angazenn commented Feb 13, 2026

What this PR does / why we need it?

With #6602 , npu_rotary_embedding unifies all rope implementation in AscendRotaryEmbedding, but allows a wider range of application of fusion op split_qkv_rmsnorm_rope. This PR restricts the fusion of split_qkv_rmsnorm_rope to only cases where head_size == 128 && rotary_dim == head_size. Further enhancement and generalization of this op will be accomplished by @whx-sjtu .

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @Angazenn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the QKNorm and Rope fusion pass by introducing more precise conditions for its application. It centralizes the logic for determining the rotary embedding dimension into a new utility function, which is then used to validate that the head_size and rope_dim are compatible with the fusion, thereby preventing potential issues with unsupported configurations and ensuring stable performance.

Highlights

  • New Utility Function for RoPE Dimension: Introduced a new utility function, get_rope_dim, to consistently determine the rotary embedding dimension based on the VLLM configuration, accounting for MLA usage, partial rotary factors, and explicit rotary dimensions.
  • Enhanced Fusion Skip Logic: Modified the QKNorm and Rope fusion pass to include additional checks, ensuring that fusion is only applied when the head_size is 128 and the rope_dim matches the head_size. This prevents fusion in unsupported configurations.
  • Refactored RoPE Dimension Calculation: Updated the set_cos_and_sin function to utilize the newly introduced get_rope_dim utility, centralizing the logic for determining the rotary embedding dimension and improving code consistency.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
    • Imported get_rope_dim from vllm_ascend.ops.rotary_embedding.
    • Modified the QKNormRopeFusionPass initialization to include a check for rope_dim == head_size in addition to head_size == 128, skipping fusion if these conditions are not met.
  • vllm_ascend/ops/rotary_embedding.py
    • Added a new function get_rope_dim to calculate the rotary embedding dimension based on vllm_config properties like use_mla, partial_rotary_factor, and rotary_dim.
    • Updated set_cos_and_sin to use the new get_rope_dim function for determining rope_dim in both use_mla and non-use_mla scenarios.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Angazenn Angazenn added ready read for review ready-for-test start test by label for PR labels Feb 13, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds checks to skip the split_rmsnorm_rope fusion when it is not supported. This is achieved by refactoring the rope_dim calculation into a new get_rope_dim helper function and using it to validate the conditions for the fusion pass. While the refactoring is a good improvement, it introduces a critical bug in the new get_rope_dim function that could cause incorrect behavior by setting rope_dim to zero inappropriately. I have provided a specific comment and code suggestion to address this issue.

@Angazenn Angazenn force-pushed the cos_sin_dev branch 2 times, most recently from 78d2276 to 5ff1c23 Compare February 13, 2026 10:23
Signed-off-by: Angazenn <supperccell@163.com>
Signed-off-by: Angazenn <supperccell@163.com>
@whx-sjtu whx-sjtu merged commit 918599f into vllm-project:releases/v0.13.0 Feb 14, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants