GPT OSS Integration Code #887
Conversation
Pull request overview
This PR integrates support for the GPT OSS model type, adding specialized handling for routing logic in MoE layers, bias support throughout the MoE pipeline, and attention sink mechanisms to improve inference performance.
Changes:
- Adds GPT OSS-specific expert routing in the MoE forward pass, with top-k selection applied before softmax (the reverse of the usual softmax-then-top-k ordering)
- Implements bias support across MoE operations (w13_bias and w2_bias) with conditional bias application
- Introduces attention sink functionality across multiple attention backends (pipelined, naive, and FSDPA) to enhance attention computation
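The routing difference in the first bullet can be sketched as follows. This is a minimal illustration of the ordering change, not the PR's actual `hpu_fused_moe.py` code; the function names are hypothetical and plain Python stands in for the tensor ops:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_standard(logits, k):
    # Common MoE routing: softmax over ALL expert logits first,
    # then keep the top-k probabilities (which sum to less than 1).
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return top, [probs[i] for i in top]

def route_gpt_oss(logits, k):
    # Reversed ordering: top-k on the raw logits first, then softmax
    # over ONLY the selected experts, so the k weights sum to 1.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return top, softmax([logits[i] for i in top])
```

Both orderings select the same experts, but the resulting weights differ: the reversed ordering renormalizes over the selected experts only.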
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Adjusts sliding window block size calculation with +1 offset |
| vllm_gaudi/ops/hpu_fused_moe.py | Adds GPT OSS routing logic and bias support to MoE operations |
| vllm_gaudi/extension/utils.py | Extends forward signature to accept sinks parameter |
| vllm_gaudi/extension/ops.py | Implements attention sink mechanisms in pipelined and prompt attention functions |
| vllm_gaudi/attention/backends/hpu_attn.py | Adds sink support to attention implementations with dtype conversions |
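The attention-sink changes listed above can be illustrated with a minimal sketch. This assumes the sink acts as an extra per-head logit that joins the softmax denominator but receives no value contribution, so real-token weights sum to less than 1; the function name is hypothetical and does not appear in the PR:

```python
import math

def attention_weights_with_sink(scores, sink_logit):
    # Softmax over [scores..., sink_logit]; the sink's probability mass
    # is discarded, leaving the real-token weights summing to < 1.
    all_logits = scores + [sink_logit]
    m = max(all_logits)
    exps = [math.exp(x - m) for x in all_logits]
    denom = sum(exps)
    return [e / denom for e in exps[:-1]]
```

Driving `sink_logit` toward negative infinity recovers ordinary softmax attention, which is a quick sanity check on the formulation.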
Comments suppressed due to low confidence (1)
vllm_gaudi/ops/hpu_fused_moe.py:1
- Variable `i` is undefined in this context. The variable `i` is used from the loop that starts at line 660, but this code at line 634 executes before that loop. Use `experts_range[0]` or iterate through `experts_range` to access bias attributes.
from functools import partial
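As an illustration of the conditional bias application the review refers to, here is a minimal sketch. Only the `w13_bias`/`w2_bias` names come from the PR; the helper functions and shapes are hypothetical, with plain Python lists standing in for tensors:

```python
def matvec(w, x):
    """Row-major matrix-vector product for plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def apply_expert(x, w13, w2, act, w13_bias=None, w2_bias=None):
    # Conditional bias application: biases are added only when present,
    # so bias-free expert configurations keep the original code path.
    h = matvec(w13, x)
    if w13_bias is not None:
        h = [hi + bi for hi, bi in zip(h, w13_bias)]
    h = act(h)
    y = matvec(w2, h)
    if w2_bias is not None:
        y = [yi + bi for yi, bi in zip(y, w2_bias)]
    return y
```

Per the review comment, any per-expert bias lookup must happen inside the expert loop (or via `experts_range[0]`), not before the loop where the index is still undefined.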
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
✅ CI Passed: All checks passed successfully against the following vllm commit:
Fixes Accuracy Issue in GPTOSS: #887. Updates `apply_monolithic` introduced in #876 to handle gptoss.

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>