Fix Topk Calculation in GPTOSS#970

Merged
iboiko-habana merged 2 commits into vllm-project:main from SKRohit:fix_gptoss_moe
Feb 17, 2026

Conversation

@SKRohit
Contributor

@SKRohit SKRohit commented Feb 12, 2026

Fixes the accuracy issue in GPTOSS (#887). Updates `apply_monolithic`, introduced in #876, to handle gpt_oss.

Contributor

Copilot AI left a comment


Pull request overview

Adjusts MoE router top-k selection to handle GPTOSS (gpt_oss) routing correctly by selecting experts from logits before applying softmax.

Changes:

  • Adds a gpt_oss-specific routing path that applies topk on logits and then softmaxes the selected logits.
  • Keeps existing behavior for other model types (softmax over all experts, then topk, then renormalize).
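
The gpt_oss-specific routing order described above can be sketched in plain PyTorch. This is illustrative only, not the actual vLLM code: `router_logits` and `top_k` are stand-in names for the real layer attributes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
router_logits = torch.randn(4, 8)  # [num_tokens, num_experts]
top_k = 2

# gpt_oss path: select the top-k experts from the raw logits first...
selected_logits, topk_ids = torch.topk(router_logits, top_k, dim=-1)
# ...then softmax only over the selected logits.
topk_weights = F.softmax(selected_logits, dim=-1, dtype=torch.float32)

# The per-token weights already sum to 1, so no extra renormalization
# step is needed on this path.
```

Because the softmax runs over just the `top_k` selected logits, the resulting weights are normalized by construction, unlike the default path, which softmaxes over all experts and then renormalizes after selection.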


Comment thread vllm_gaudi/ops/hpu_fused_moe.py Outdated
Comment on lines +70 to +76
```python
if self.model_type is not None and self.model_type in ["gpt_oss"]:
    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=-1)
    topk_weights = F.softmax(topk_weights, dim=-1, dtype=torch.float32)
else:
    topk_weights = F.softmax(router_logits, dim=1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=-1)
    topk_weights /= topk_weights.sum(dim=-1, keepdim=True)
```

Copilot AI Feb 12, 2026


The new gpt_oss branch uses softmax(..., dim=-1) while the existing path uses softmax(..., dim=1). This introduces inconsistent normalization across model types and can produce incorrect routing if router_logits is not strictly 2D (or if the expert dimension isn’t 1). Consider using the same expert-dimension for both branches (typically dim=-1), or deriving the expert dimension once and reusing it for softmax/topk.

Suggested change

Before:

```python
if self.model_type is not None and self.model_type in ["gpt_oss"]:
    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=-1)
    topk_weights = F.softmax(topk_weights, dim=-1, dtype=torch.float32)
else:
    topk_weights = F.softmax(router_logits, dim=1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=-1)
    topk_weights /= topk_weights.sum(dim=-1, keepdim=True)
```

After:

```python
expert_dim = router_logits.dim() - 1
if self.model_type is not None and self.model_type in ["gpt_oss"]:
    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=expert_dim)
    topk_weights = F.softmax(topk_weights, dim=expert_dim, dtype=torch.float32)
else:
    topk_weights = F.softmax(router_logits, dim=expert_dim, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=expert_dim)
    topk_weights /= topk_weights.sum(dim=expert_dim, keepdim=True)
```

Contributor Author


I know; simply changing the softmax to dim=-1 would resolve the inconsistency. But for some models it might be necessary to apply softmax over the second dimension, so I did not change it.
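
For context on the dim question, a quick standalone check (plain PyTorch, illustrative) of when `dim=1` and `dim=-1` coincide:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# For a 2-D [tokens, experts] tensor, dim=1 and dim=-1 name the same
# axis, so the two softmax calls agree.
x2d = torch.randn(3, 5)
same_for_2d = torch.allclose(F.softmax(x2d, dim=1), F.softmax(x2d, dim=-1))

# For a 3-D tensor they name different axes (the middle axis vs the
# last one), so the results diverge in general.
x3d = torch.randn(2, 3, 5)
diverge_for_3d = not torch.allclose(F.softmax(x3d, dim=1), F.softmax(x3d, dim=-1))
```

This is why the review note only matters if `router_logits` can ever be more than 2-D; for a strictly 2-D router output the two spellings are equivalent.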

Comment thread vllm_gaudi/ops/hpu_fused_moe.py Outdated
@SKRohit SKRohit marked this pull request as draft February 12, 2026 18:14
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@iboiko-habana
Collaborator

@SKRohit do we need to merge this PR?

@SKRohit
Contributor Author

SKRohit commented Feb 13, 2026

@iboiko-habana I am running a few tests; I will let you know once the changes are ready for merge.

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@SKRohit SKRohit marked this pull request as ready for review February 16, 2026 11:34
@SKRohit
Contributor Author

SKRohit commented Feb 16, 2026

@iboiko-habana I have verified the changes. PR is ready to merge from my side.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

SKRohit and others added 2 commits February 16, 2026 22:25
Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@SKRohit
Contributor Author

SKRohit commented Feb 17, 2026

@iboiko-habana Can we merge?

@iboiko-habana iboiko-habana merged commit f0a883d into vllm-project:main Feb 17, 2026
61 checks passed
SKRohit added a commit to SKRohit/vllm-gaudi that referenced this pull request Feb 17, 2026
Fixes Accuracy Issue in GPTOSS:
vllm-project#887. Updates
`apply_monolithic` introduced in
vllm-project#876 to handle gptoss

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
gyou2021 pushed a commit to gyou2021/vllm-gaudi that referenced this pull request Feb 21, 2026
Fixes Accuracy Issue in GPTOSS:
vllm-project#887. Updates
`apply_monolithic` introduced in
vllm-project#876 to handle gptoss

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jaideepsai-narayan

jaideepsai-narayan commented Mar 11, 2026

Hi @iboiko-habana @SKRohit , we encountered this issue (#891 (comment)) earlier. However, with the new branch, we are again seeing a drop in accuracy when using the Unsloth version of GPTOSS.

[screenshot: accuracy results]

adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Fixes Accuracy Issue in GPTOSS:
#887. Updates
`apply_monolithic` introduced in
#876 to handle gptoss

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>