[XPU] Add deepseek_scaling_rope fused kernel #36612

Merged
jikunshang merged 8 commits into vllm-project:main from yitingw1:dev/xpu_customop_deepseek_rope on Mar 16, 2026
@yitingw1
Contributor

@yitingw1 yitingw1 commented Mar 10, 2026

Purpose

[XPU] Use the fused deepseek_scaling_rope kernel, provided by a separate PR, in DeepseekScalingRotaryEmbedding. Previously, this layer ran with forward_native() on XPU.

Test Plan

Test Result

Verified DeepSeek-V2-Lite-Chat functionality locally with lm_eval on 4x BMG.



Signed-off-by: yitingw1 <yiting.wang@intel.com>
@mergify mergify bot added the deepseek Related to DeepSeek models label Mar 10, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@jikunshang jikunshang requested a review from Copilot March 10, 2026 09:26
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fused kernel for deepseek_scaling_rope on XPU to improve performance. The changes involve registering a new custom PyTorch operation and integrating it into the DeepseekScalingRotaryEmbedding layer. My review has identified a few critical issues related to thread safety, correctness of the operation registration, and potential runtime errors due to an uninitialized cache. I've also pointed out some type hint inconsistencies that could affect static analysis and torch.compile.

Collaborator

@jikunshang jikunshang left a comment

LGTM.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a fused XPU kernel for DeepseekScalingRotaryEmbedding, replacing the previous forward_native() fallback with a dedicated forward_xpu() method that calls a custom registered op backed by torch.ops._xpu_C.deepseek_scaling_rope.

Changes:

  • Adds forward_xpu() method to DeepseekScalingRotaryEmbedding that delegates to the new fused XPU op.
  • Registers xpu_ops_deepseek_scaling_rope as a custom torch op with a fake implementation for tracing.
  • Introduces a module-level _OPS_REGISTERED guard and register_ops_once() to prevent duplicate registration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py | Adds forward_xpu() method wiring the class to the new fused kernel op
vllm/_xpu_ops.py | Implements and registers the xpu_ops_deepseek_scaling_rope custom op with impl and fake functions
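
To illustrate how adding a `forward_xpu()` method can replace a `forward_native()` fallback, here is a simplified sketch of per-platform method dispatch. The class `RotaryEmbeddingSketch`, its `platform` argument and the tuple return values are all hypothetical. The dispatch mechanism vLLM actually uses is not shown on this page, so treat this only as a model of the idea.

```python
class RotaryEmbeddingSketch:
    """Hypothetical layer with one forward method per platform."""

    def __init__(self, platform: str):
        # Choose the implementation once: use forward_<platform> if the class
        # defines it, otherwise fall back to the reference implementation.
        method = getattr(self, f"forward_{platform}", None)
        self._forward = method if method is not None else self.forward_native

    def forward_native(self, positions, query, key):
        # Reference path (placeholder result tagged with the path taken).
        return ("native", positions, query, key)

    def forward_xpu(self, positions, query, key):
        # Platform-specific path; in the PR this delegates to the fused op.
        return ("fused_xpu", positions, query, key)

    def __call__(self, positions, query, key):
        return self._forward(positions, query, key)


xpu_layer = RotaryEmbeddingSketch("xpu")
other_layer = RotaryEmbeddingSketch("cuda")  # no forward_cuda in this sketch
```

Here `xpu_layer` routes calls to `forward_xpu()`, while `other_layer` falls back to `forward_native()` because the sketch defines no `forward_cuda()`. Before this PR, the XPU case would have behaved like the fallback.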

Signed-off-by: yitingw1 <yiting.wang@intel.com>
@jikunshang
Collaborator

Please resolve conflicts, thanks.

@mergify

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @yitingw1.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 11, 2026
…seek_rope

Signed-off-by: yitingw1 <yiting.wang@intel.com>
@mergify mergify bot removed the needs-rebase label Mar 12, 2026
@yitingw1
Contributor Author

> Please resolve conflicts, thanks.

Done.

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 12, 2026
@jikunshang jikunshang merged commit 68e1b71 into vllm-project:main Mar 16, 2026
55 checks passed