Patch Grouped Topk#708

Merged
xuechendi merged 2 commits into vllm-project:main from xinyu-intel:dev/xinyu/patch-topk
Dec 12, 2025

Conversation

@xinyu-intel
Contributor

No description provided.


Copilot AI left a comment


Pull request overview

This PR patches the grouped top-k implementation for MoE (Mixture of Experts) operations in vLLM on HPU, addressing dtype conversion behavior based on whether grouped top-k is enabled.

  • Adds conditional dtype conversion logic based on use_grouped_topk flag
  • Implements a patched grouped_topk function with batch invariance support and e_score_correction_bias handling
  • Applies the patch to the vLLM library's fused_moe layer module
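
To make the routing change concrete, here is a simplified, hypothetical sketch of DeepSeek-style grouped top-k expert selection with an optional `e_score_correction_bias`, in the spirit of the `patched_grouped_topk` this PR adds. It is not the vLLM implementation: the function name, the use of a per-group `max` (rather than a top-2 sum) for group scoring, and the renormalization flag are all simplifying assumptions.

```python
import torch

def grouped_topk_sketch(gating_output, topk, num_expert_group, topk_group,
                        renormalize=True, e_score_correction_bias=None):
    # Hypothetical sketch of grouped top-k MoE routing, not vLLM's exact code.
    scores = torch.softmax(gating_output, dim=-1)  # [num_tokens, num_experts]
    if e_score_correction_bias is not None:
        # The bias influences which experts/groups get chosen; the original
        # scores are still used for the final routing weights.
        scores_for_choice = scores + e_score_correction_bias
    else:
        scores_for_choice = scores
    num_tokens, num_experts = scores.shape
    # Score each group by its best expert (simplification: max per group).
    group_scores = scores_for_choice.view(
        num_tokens, num_expert_group, -1).max(dim=-1).values
    # Keep only the topk_group best groups per token.
    group_idx = torch.topk(group_scores, k=topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1)
    # Expand the group mask back to per-expert granularity.
    score_mask = group_mask.unsqueeze(-1).expand(
        num_tokens, num_expert_group,
        num_experts // num_expert_group).reshape(num_tokens, -1)
    masked = scores_for_choice.masked_fill(score_mask == 0, float("-inf"))
    # Final top-k over experts restricted to the selected groups.
    topk_ids = torch.topk(masked, k=topk, dim=-1).indices
    topk_weights = scores.gather(1, topk_ids)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids.to(torch.int64)
```

Note that the grouped path returns `int64` ids itself, which is why the conditional conversion discussed below only runs when `use_grouped_topk` is off.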

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

vllm_gaudi/ops/hpu_fused_moe.py: adds conditional dtype conversion, implements the patched_grouped_topk function, and applies the grouped_topk patch to the vllm module.
vllm_gaudi/ops/hpu_fp8.py: adds conditional dtype conversion for FP8 operations based on the use_grouped_topk flag.


Comment on lines +67 to +69
if not layer.use_grouped_topk:
    topk_ids = topk_ids.to(torch.int64)
    topk_weights = topk_weights.to(x.dtype)

Copilot AI Dec 11, 2025


The dtype conversions for topk_ids and topk_weights are now duplicated - they appear both before line 67 (lines 63-64) and within this conditional block (lines 68-69). When use_grouped_topk is False, these conversions happen twice unnecessarily. Consider moving the earlier conversions (lines 63-64) into an else block, or removing the duplicate logic.
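
The reviewer's suggestion can be sketched as follows. This is a hypothetical illustration, not the file's actual code: `FakeLayer` stands in for the real MoE layer, and only the flag-gated conversion is shown. The point is that the conversions run exactly once, rather than unconditionally and then again inside the conditional.

```python
import torch

class FakeLayer:
    """Minimal stand-in for the real MoE layer; only the flag matters here."""
    def __init__(self, use_grouped_topk):
        self.use_grouped_topk = use_grouped_topk

def convert_once(layer, x, topk_ids, topk_weights):
    # Only the non-grouped path needs the conversions; the grouped path
    # already produces the right dtypes, so nothing is done twice.
    if not layer.use_grouped_topk:
        topk_ids = topk_ids.to(torch.int64)
        topk_weights = topk_weights.to(x.dtype)
    return topk_ids, topk_weights
```

With `use_grouped_topk=False` the ids come back as `int64` and the weights in `x`'s dtype; with `use_grouped_topk=True` both tensors pass through untouched.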

Copilot uses AI. Check for mistakes.
Comment thread: vllm_gaudi/ops/hpu_fp8.py

topk_weights = topk_weights.view(*x.shape[:-1], -1)
if not layer.use_grouped_topk:
    topk_ids = topk_ids.to(torch.int64)
    topk_weights = topk_weights.to(x.dtype)

Copilot AI Dec 11, 2025


The dtype conversions for topk_ids and topk_weights are duplicated - they appear both before line 163 (lines 159-160) and within this conditional block (lines 164-165). When use_grouped_topk is False, these conversions happen twice unnecessarily. Consider moving the earlier conversions (lines 159-160) into an else block, or removing the duplicate logic.

Suggested change (drop the duplicated conversion line):
    topk_weights = topk_weights.to(x.dtype)

Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
7618dc973dd1e56a46162bc7bd6e7625143bead0

@xuechendi xuechendi self-assigned this Dec 12, 2025
@xuechendi xuechendi merged commit c03ca8d into vllm-project:main Dec 12, 2025
46 checks passed
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
