Fix Topk Calculation in GPTOSS #970
Conversation
Pull request overview
Adjusts MoE router top-k selection to handle GPTOSS (gpt_oss) routing correctly by selecting experts from logits before applying softmax.
Changes:
- Adds a gpt_oss-specific routing path that applies `topk` on logits and then softmaxes the selected logits.
- Keeps existing behavior for other model types (softmax over all experts, then `topk`, then renormalize).
```python
if self.model_type is not None and self.model_type in ["gpt_oss"]:
    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=-1)
    topk_weights = F.softmax(topk_weights, dim=-1, dtype=torch.float32)
else:
    topk_weights = F.softmax(router_logits, dim=1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=-1)
    topk_weights /= topk_weights.sum(dim=-1, keepdim=True)
```
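As a standalone illustration of the two routing paths above (the function names here are hypothetical, not from the PR): in exact arithmetic, selecting top-k on the logits and softmaxing the selection yields the same weights as softmaxing all experts, taking top-k, and renormalizing, so the branches differ mainly in tie-breaking, dtype, and the dimension the ops run on.

```python
import torch
import torch.nn.functional as F

def route_gpt_oss(router_logits: torch.Tensor, top_k: int):
    # gpt_oss path: pick experts on the raw logits, then softmax
    # only the selected logits
    topk_weights, topk_ids = torch.topk(router_logits, top_k, dim=-1)
    topk_weights = F.softmax(topk_weights, dim=-1, dtype=torch.float32)
    return topk_weights, topk_ids

def route_default(router_logits: torch.Tensor, top_k: int):
    # existing path: softmax over all experts, then top-k,
    # then renormalize the selected weights to sum to 1
    weights = F.softmax(router_logits, dim=-1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(weights, top_k, dim=-1)
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```

With float32 logits and no ties the two functions agree to numerical precision; the practical differences surface with lower-precision logits and with the `dim` handling discussed in the review comment.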
The new gpt_oss branch uses softmax(..., dim=-1) while the existing path uses softmax(..., dim=1). This introduces inconsistent normalization across model types and can produce incorrect routing if router_logits is not strictly 2D (or if the expert dimension isn't 1). Consider using the same expert dimension for both branches (typically dim=-1), or deriving the expert dimension once and reusing it for softmax/topk.
Suggested change:

```diff
-if self.model_type is not None and self.model_type in ["gpt_oss"]:
-    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=-1)
-    topk_weights = F.softmax(topk_weights, dim=-1, dtype=torch.float32)
-else:
-    topk_weights = F.softmax(router_logits, dim=1, dtype=torch.float32)
-    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=-1)
-    topk_weights /= topk_weights.sum(dim=-1, keepdim=True)
+expert_dim = router_logits.dim() - 1
+if self.model_type is not None and self.model_type in ["gpt_oss"]:
+    topk_weights, topk_ids = torch.topk(router_logits, layer.top_k, dim=expert_dim)
+    topk_weights = F.softmax(topk_weights, dim=expert_dim, dtype=torch.float32)
+else:
+    topk_weights = F.softmax(router_logits, dim=expert_dim, dtype=torch.float32)
+    topk_weights, topk_ids = torch.topk(topk_weights, layer.top_k, dim=expert_dim)
+    topk_weights /= topk_weights.sum(dim=expert_dim, keepdim=True)
```
I know. Just changing to dim=-1 in softmax would resolve the issue. But for some models it might be required to apply softmax on the second dimension, so I did not change it.
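The reviewer's concern can be checked directly: for 2D `router_logits`, `dim=1` and `dim=-1` address the same (expert) axis, so the two spellings agree, while for any higher-rank tensor they diverge. A minimal sketch:

```python
import torch
import torch.nn.functional as F

# 2D logits: dim=1 and dim=-1 refer to the same axis, so the
# results are identical
x2d = torch.randn(4, 8)
assert torch.equal(F.softmax(x2d, dim=1), F.softmax(x2d, dim=-1))

# 3D logits (e.g. an extra batch or sequence axis): dim=1 now
# normalizes over the wrong axis, silently producing incorrect
# routing weights
x3d = torch.randn(2, 4, 8)
assert not torch.equal(F.softmax(x3d, dim=1), F.softmax(x3d, dim=-1))
```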
✅ CI Passed — All checks passed successfully against the following vllm commit:
@SKRohit do we need to merge this PR?

@iboiko-habana I am running a few tests; I will let you know once the changes are ready for merge.
🚧 CI Blocked — The main CI workflow was not started for the following reason:
@iboiko-habana I have verified the changes. The PR is ready to merge from my side.
✅ CI Passed — All checks passed successfully against the following vllm commit:
Force-pushed a9c538f to 65ccd95
🚧 CI Blocked — The main CI workflow was not started for the following reason:
Force-pushed 65ccd95 to 2d2838c
Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Force-pushed 2d2838c to 40c3d7f
✅ CI Passed — All checks passed successfully against the following vllm commit:
@iboiko-habana Can we merge?
Fixes Accuracy Issue in GPTOSS: vllm-project#887. Updates `apply_monolithic` introduced in vllm-project#876 to handle gptoss.

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @iboiko-habana @SKRohit, we encountered this issue (#891 (comment)) earlier. However, with the new branch, we are again seeing a drop in accuracy when using the Unsloth version of GPTOSS.