
GPT OSS Integration Code#887

Merged
michalkuligowski merged 20 commits into vllm-project:main from hlahkar:latest_gpt_oss
Feb 10, 2026

Conversation

@hlahkar (Contributor) commented Jan 27, 2026

This PR integrates support for the GPT OSS model type, including additions for handling model-specific routing logic, bias support in MoE layers, and attention sink mechanisms for improved inference.

  • Adds GPT OSS-specific expert routing and softmax handling in the MoE forward pass
  • Implements bias support throughout the MoE pipeline
  • Introduces attention sink functionality across attention backends and operations
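The routing change is an ordering difference: GPT OSS takes the top-k router logits first and then applies softmax over only the selected logits, rather than softmaxing over all experts before selecting. A minimal sketch of the two orderings (illustrative only, not the vllm_gaudi code):

```python
import torch

def standard_routing(router_logits: torch.Tensor, top_k: int):
    """Common MoE ordering: softmax over all experts, then take top-k.
    The k selected weights sum to less than 1."""
    probs = torch.softmax(router_logits, dim=-1)
    weights, expert_ids = torch.topk(probs, top_k, dim=-1)
    return weights, expert_ids

def gpt_oss_style_routing(router_logits: torch.Tensor, top_k: int):
    """Reversed ordering: take top-k logits first, then softmax over
    only the selected logits, so the k routing weights sum to 1."""
    top_logits, expert_ids = torch.topk(router_logits, top_k, dim=-1)
    weights = torch.softmax(top_logits, dim=-1)
    return weights, expert_ids
```

Because softmax is monotone per row, both orderings select the same experts; only the normalization of the routing weights differs.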

Copilot AI left a comment

Pull request overview

This PR integrates support for the GPT OSS model type, adding specialized handling for routing logic in MoE layers, bias support throughout the MoE pipeline, and attention sink mechanisms to improve inference performance.

Changes:

  • Adds GPT OSS-specific expert routing with reversed softmax/topk ordering in the MoE forward pass
  • Implements bias support across MoE operations (w13_bias and w2_bias) with conditional bias application
  • Introduces attention sink functionality across multiple attention backends (pipelined, naive, and FSDPA) to enhance attention computation
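In the common formulation, an attention sink is a per-head learnable logit that joins the softmax as an extra column but contributes no value vector, absorbing probability mass that would otherwise be forced onto real tokens. A hedged sketch of the mechanism (hypothetical shapes; not the backend code):

```python
import torch

def sdpa_with_sink(q, k, v, sinks):
    """q: (heads, q_len, d); k, v: (heads, kv_len, d); sinks: (heads,).
    The sink logit participates in the softmax as an extra column and
    is then dropped, so each row of attention weights sums to < 1."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale              # (h, q, kv)
    sink_col = sinks.view(-1, 1, 1).expand(-1, scores.shape[1], 1)
    probs = torch.softmax(torch.cat([scores, sink_col], dim=-1), dim=-1)
    return probs[..., :-1] @ v                              # (h, q, d)
```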

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • vllm_gaudi/v1/worker/hpu_model_runner.py: adjusts the sliding window block size calculation with a +1 offset
  • vllm_gaudi/ops/hpu_fused_moe.py: adds GPT OSS routing logic and bias support to MoE operations
  • vllm_gaudi/extension/utils.py: extends the forward signature to accept a sinks parameter
  • vllm_gaudi/extension/ops.py: implements attention sink mechanisms in the pipelined and prompt attention functions
  • vllm_gaudi/attention/backends/hpu_attn.py: adds sink support to attention implementations, with dtype conversions
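The bias plumbing amounts to optionally adding `w13_bias` after the fused gate/up projection and `w2_bias` after the down projection. A generic gated-FFN sketch of that conditional application (activation choice and names are illustrative, not the HPU kernel):

```python
import torch
import torch.nn.functional as F

def expert_ffn(x, w13, w2, w13_bias=None, w2_bias=None):
    """One expert's gated FFN with optional biases.
    x: (tokens, d); w13: (d, 2*ff); w2: (ff, d)."""
    up = x @ w13
    if w13_bias is not None:        # conditional bias application
        up = up + w13_bias
    gate, val = up.chunk(2, dim=-1)
    h = F.silu(gate) * val          # gated activation (SwiGLU-style)
    out = h @ w2
    if w2_bias is not None:
        out = out + w2_bias
    return out
```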
Comments suppressed due to low confidence (1)

vllm_gaudi/ops/hpu_fused_moe.py:1

  • Variable i is undefined in this context. The variable i is used from the loop that starts at line 660, but this code at line 634 executes before that loop. Use experts_range[0] or iterate through experts_range to access bias attributes.
from functools import partial
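The flagged bug and the suggested fix can be illustrated with a toy (the class and ranges are invented; only the `hasattr`/`experts_range[0]` pattern reflects the review comment):

```python
class FakeExpert:
    """Stand-in for an MoE expert that may carry bias tensors."""
    def __init__(self, with_bias: bool):
        if with_bias:
            self.w13_bias = 0.0

experts = [FakeExpert(True) for _ in range(4)]
experts_range = range(0, 4)

# Buggy pattern: `hasattr(experts[i], "w13_bias")` at this point would
# raise NameError, because `i` is only bound by a loop that starts later.

# Fix: probe the first expert in this worker's range instead.
has_bias = hasattr(experts[experts_range[0]], "w13_bias")
```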


@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

1 similar comment


Himangshu Lahkar and others added 10 commits February 4, 2026 07:58
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
@github-actions

github-actions Bot commented Feb 4, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

hlahkar and others added 5 commits February 4, 2026 13:42
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
@github-actions

github-actions Bot commented Feb 4, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@hlahkar hlahkar mentioned this pull request Feb 5, 2026
@github-actions

github-actions Bot commented Feb 6, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@michalkuligowski michalkuligowski enabled auto-merge (squash) February 10, 2026 08:46
@michalkuligowski michalkuligowski merged commit c1dccf3 into vllm-project:main Feb 10, 2026
12 of 13 checks passed
iboiko-habana pushed a commit that referenced this pull request Feb 17, 2026
Fixes Accuracy Issue in GPTOSS: #887. Updates `apply_monolithic` introduced in #876 to handle gptoss.

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
SKRohit added a commit to SKRohit/vllm-gaudi that referenced this pull request Feb 17, 2026
Fixes Accuracy Issue in GPTOSS: vllm-project#887. Updates `apply_monolithic` introduced in vllm-project#876 to handle gptoss.

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
gyou2021 pushed a commit to gyou2021/vllm-gaudi that referenced this pull request Feb 21, 2026
This PR integrates support for the GPT OSS model type, including
additions for handling model-specific routing logic, bias support in MoE
layers, and attention sink mechanisms for improved inference.

Adds GPT OSS-specific expert routing and softmax handling in the MoE
forward pass
Implements bias support throughout the MoE pipeline
Introduces attention sink functionality across attention backends and
operations

---------

Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
gyou2021 pushed a commit to gyou2021/vllm-gaudi that referenced this pull request Feb 21, 2026
Fixes Accuracy Issue in GPTOSS: vllm-project#887. Updates `apply_monolithic` introduced in vllm-project#876 to handle gptoss.

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
This PR integrates support for the GPT OSS model type, including
additions for handling model-specific routing logic, bias support in MoE
layers, and attention sink mechanisms for improved inference.

Adds GPT OSS-specific expert routing and softmax handling in the MoE
forward pass
Implements bias support throughout the MoE pipeline
Introduces attention sink functionality across attention backends and
operations

---------

Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Fixes Accuracy Issue in GPTOSS: #887. Updates `apply_monolithic` introduced in #876 to handle gptoss.

---------

Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
Signed-off-by: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


6 participants