[XPU] Add deepseek_scaling_rope fused kernel#36612
jikunshang merged 8 commits into vllm-project:main
Conversation
Signed-off-by: yitingw1 <yiting.wang@intel.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. You can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request introduces a fused kernel for deepseek_scaling_rope on XPU to improve performance. The changes involve registering a new custom PyTorch operation and integrating it into the DeepseekScalingRotaryEmbedding layer. My review has identified a few critical issues related to thread safety, correctness of the operation registration, and potential runtime errors due to an uninitialized cache. I've also pointed out some type hint inconsistencies that could affect static analysis and torch.compile.
Pull request overview
This PR adds a fused XPU kernel for DeepseekScalingRotaryEmbedding, replacing the previous forward_native() fallback with a dedicated forward_xpu() method that calls a custom registered op backed by torch.ops._xpu_C.deepseek_scaling_rope.
Changes:
- Adds `forward_xpu()` method to `DeepseekScalingRotaryEmbedding` that delegates to the new fused XPU op.
- Registers `xpu_ops_deepseek_scaling_rope` as a custom torch op with a fake implementation for tracing.
- Introduces a module-level `_OPS_REGISTERED` guard and `register_ops_once()` to prevent duplicate registration.
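The custom-op registration described above can be sketched with the `torch.library` API. This is a hedged, self-contained illustration, not the PR's implementation: the namespace `_xpu_C_demo` is deliberately different from the real `_xpu_C`, and the eager impl is a shape-preserving stub where the real kernel would be a compiled XPU kernel.

```python
import torch
from torch.library import Library

# Demo namespace so this sketch does not clash with vLLM's real _xpu_C.
_lib = Library("_xpu_C_demo", "DEF")
_lib.define(
    "deepseek_scaling_rope(Tensor positions, Tensor query, Tensor key) "
    "-> (Tensor, Tensor)"
)


def _impl(positions, query, key):
    # Stand-in for the fused XPU kernel: a real implementation would apply
    # the DeepSeek scaling RoPE rotation to query/key.
    return query.clone(), key.clone()


def _fake(positions, query, key):
    # Fake (meta) implementation for tracing/torch.compile: only output
    # shapes and dtypes matter, no computation runs.
    return torch.empty_like(query), torch.empty_like(key)


_lib.impl("deepseek_scaling_rope", _impl, "CompositeExplicitAutograd")
_lib.impl("deepseek_scaling_rope", _fake, "Meta")
```

After registration the op is callable as `torch.ops._xpu_C_demo.deepseek_scaling_rope(...)`, and the `Meta` kernel lets the compiler trace through it without the XPU backend present.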
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py | Adds forward_xpu() method wiring the class to the new fused kernel op |
| vllm/_xpu_ops.py | Implements and registers the xpu_ops_deepseek_scaling_rope custom op with impl and fake functions |
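As a rough illustration of the wiring in the table above, `forward_xpu()` can simply delegate to the registered fused op. The class below is a hypothetical skeleton: the real `DeepseekScalingRotaryEmbedding` carries rotary-embedding state and a full native fallback, and `fused_op` stands in for `torch.ops._xpu_C.deepseek_scaling_rope`.

```python
class DeepseekScalingRotaryEmbeddingSketch:
    """Illustrative skeleton only; not the real vLLM class."""

    def __init__(self, fused_op):
        # fused_op plays the role of torch.ops._xpu_C.deepseek_scaling_rope
        self._fused_op = fused_op

    def forward_xpu(self, positions, query, key):
        # Delegate the q/k rotation to the fused XPU kernel.
        return self._fused_op(positions, query, key)

    def forward_native(self, positions, query, key):
        # Previous pure-PyTorch fallback path (elided in this sketch).
        raise NotImplementedError
```

The point of the design is that the layer's forward path becomes a single op call, which both avoids the multi-kernel native path and gives `torch.compile` one traceable node.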
Please resolve conflicts, thanks.
This pull request has merge conflicts that must be resolved before it can be merged.
…seek_rope Signed-off-by: yitingw1 <yiting.wang@intel.com>
Done.
Purpose
[XPU] Add usage of the fused deepseek_scaling_rope kernel (introduced in an earlier PR) for DeepseekScalingRotaryEmbedding. Previously, it ran with forward_native().
Test Plan
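The plan is a local lm_eval functionality check. A hypothetical invocation might look like the following; the task choice, flags, and `tensor_parallel_size=4` (for the 4x BMG setup) are assumptions, not taken from the PR:

```shell
# Hypothetical command sketch; adjust model path, task, and parallelism.
lm_eval --model vllm \
  --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,tensor_parallel_size=4 \
  --tasks gsm8k \
  --batch_size auto
```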
Test Result
Verified lm_eval with 4xBMG for DeepSeek-V2-Lite-Chat functionality locally.
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.