
GPT OSS Integration Code #771

Closed
hlahkar wants to merge 25 commits into vllm-project:main from hlahkar:gpt_oss_latest

Conversation

@hlahkar
Contributor

@hlahkar hlahkar commented Jan 2, 2026

No description provided.

Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Contributor

Copilot AI left a comment


Pull request overview

This PR integrates support for the GPT OSS model type, including additions for handling model-specific routing logic, bias support in MoE layers, and attention sink mechanisms for improved inference.

  • Adds GPT OSS-specific expert routing and softmax handling in the MoE forward pass
  • Implements bias support throughout the MoE pipeline
  • Introduces attention sink functionality across attention backends and operations
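The attention-sink mechanism mentioned above can be sketched in a few lines (a minimal pure-Python illustration of the general idea, not the PR's HPU implementation; the function name `softmax_with_sink` and the example values are hypothetical): the sink contributes an extra logit to the softmax denominator, so the attention weights over the real keys sum to less than one.

```python
import math

def softmax_with_sink(scores, sink_logit):
    # Append the sink logit so it competes in the softmax with real keys.
    logits = scores + [sink_logit]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Discard the sink's probability mass; the remaining weights sum to < 1,
    # which lets the model attend "nowhere" instead of forcing a full
    # distribution over keys.
    return probs[:-1]

weights = softmax_with_sink([1.0, 2.0, 3.0], sink_logit=0.0)
```

Because the sink absorbs part of the probability mass, `sum(weights)` here is slightly below 1 while the relative ordering of the key scores is preserved.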

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:
  • vllm_gaudi/v1/worker/hpu_model_runner.py — increases the sliding-window block-size calculation by 1
  • vllm_gaudi/ops/hpu_fused_moe.py — adds GPT OSS model-type detection, bias handling in MoE operations, and model-specific expert routing
  • vllm_gaudi/extension/utils.py — adds a sinks parameter to the forward pass
  • vllm_gaudi/extension/ops.py — implements the sink attention mechanism in the pipelined and naive attention functions; adds bias support to MoE operations
  • vllm_gaudi/attention/backends/hpu_attn.py — adds a sinks parameter and dtype-consistency checks in the attention implementation
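The biased, model-specific expert routing touched by hpu_fused_moe.py can be sketched roughly as follows (a hedged pure-Python illustration, not the PR's code; `route_tokens` and all values are hypothetical — it assumes a GPT OSS-style router that adds a per-expert bias to the logits, selects top-k experts, and normalizes with softmax only over the selected logits):

```python
import math

def route_tokens(logits, bias, k):
    # Add the per-expert router bias to the raw router logits.
    scored = [l + b for l, b in zip(logits, bias)]
    # Pick the top-k experts by biased score (stable for ties).
    topk = sorted(range(len(scored)), key=lambda i: scored[i], reverse=True)[:k]
    # Softmax over only the selected logits, so the k routing
    # weights sum to 1 (normalize-after-selection style routing).
    m = max(scored[i] for i in topk)
    exps = {i: math.exp(scored[i] - m) for i in topk}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Expert 2's bias lifts it into the top-2 despite a low raw logit.
weights = route_tokens([0.1, 2.0, -1.0, 1.5], bias=[0.0, 0.0, 3.0, 0.0], k=2)
```

The design point this illustrates: applying softmax after top-k selection keeps the chosen experts' weights summing to 1, whereas softmax-before-selection would leave unnormalized mass on the dropped experts.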


Comment thread vllm_gaudi/extension/ops.py
@hlahkar hlahkar mentioned this pull request Jan 2, 2026
Himangshu Lahkar added 3 commits January 2, 2026 04:36
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Himangshu Lahkar and others added 5 commits January 2, 2026 06:59
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@wpyszka
Collaborator

wpyszka commented Jan 19, 2026

/run-gaudi-tests

wpyszka and others added 3 commits January 19, 2026 13:29
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
@hlahkar
Contributor Author

hlahkar commented Jan 22, 2026

/run-gaudi-tests

@sys-hab-pt-service
Collaborator

Only codeowners and testowners can request to run Gaudi tests. Contact list: kzawora-intel, xuechendi, adobrzyn, mgawarkiewicz-intel, afierka-intel, michalkuligowski, iboiko-habana, kamil-kaczor, ksmusz, PatrykWo, kfojcik-intel, wuxun-zhang, attafosu, ulivne, Kacper-Pietkun, jkaniecki, jbyczkow, wpyszka

hlahkar and others added 3 commits January 23, 2026 06:07
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@wpyszka
Collaborator

wpyszka commented Jan 23, 2026

/run-gaudi-tests

Himangshu Lahkar and others added 5 commits January 27, 2026 08:07
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
…llm-project#855)

For `max_model_len > 32k`, Llama4 enables temperature adjustment:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L719.
The enabled adjustment changes the shape of tensor `q` from 2D to 3D:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L307.
This tensor is passed to `UnquantizedFusedMoEMethod -> forward`:
https://github.com/vllm-project/vllm-gaudi/blob/main/vllm_gaudi/ops/hpu_fused_moe.py#L163
causing invalid reshaping: we try to return a 3D `output.view` based
on a 2D output tensor.
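The shape mismatch can be illustrated with a minimal NumPy sketch (the shapes and variable names are assumptions for illustration; `out` stands in for the fused-MoE output, not the actual vLLM code):

```python
import numpy as np

hidden = 8
q = np.ones((2, 4, hidden))      # q after temperature adjustment: now 3D
flat = q.reshape(-1, hidden)     # MoE kernels operate on 2D (tokens, hidden)
out = flat * 2.0                 # stand-in for the fused-MoE computation (2D)
# The safe pattern: restore the caller's original shape when returning,
# rather than assuming the input tensor was already 2D.
restored = out.reshape(q.shape)
```

Viewing `out` with a shape recorded before the flatten succeeds because the element count is unchanged; the bug arises when the code assumes the 2D layout and builds a 3D view from stale shape information.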

Found that the following PRs introduced the bug: vllm-project#680 and vllm-project#684

Cherry-picked from `releases/v0.13.0`

---------

Signed-off-by: Artur Fierka <artur.fierka@intel.com>
…vllm-project#852)

Signed-off-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Kamil Kaczor <kamil.kaczor@intel.com>
Reverts vllm-project#780

---------

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@wpyszka
Collaborator

wpyszka commented Jan 27, 2026

/run-gaudi-tests

@hlahkar
Contributor Author

hlahkar commented Feb 5, 2026

This PR is taken care of through #887

@hlahkar hlahkar closed this Feb 5, 2026
