Initial Commit GPT-OSS #485
Conversation
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
```python
if self.bias is not None:
    w1_bias_list = [self.w13_list[i].bias.squeeze() for i in experts_range]
    w2_bias_list = [self.w2_list[i].bias.squeeze() for i in experts_range]
    return torch.ops.hpu.mixture_of_experts.bias_fused_weights(hidden_states=hidden_states,
```
Test fails with:

> The underlying op of 'hpu.mixture_of_experts' has no overload name 'bias_fused_weights'. Did you mean: 'fp8_fused_weights'

Please fix.
The CI is on 1.22.0, but this change needs the 1.23.0 software stack; that is why it is failing. We can merge this only after CI moves to the 1.23.0 release.
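Until CI catches up, a capability probe could gate the new call path. A minimal sketch, assuming the HPU op packet's overload list can be introspected via `torch.ops` (the plugin's actual gating may key off the release version instead):

```python
import torch

def has_bias_fused_moe() -> bool:
    # Probe the registered overloads of the HPU MoE op packet; the
    # 'bias_fused_weights' overload only exists on 1.23.0+ builds.
    try:
        return "bias_fused_weights" in torch.ops.hpu.mixture_of_experts.overloads()
    except (AttributeError, RuntimeError):
        # Op packet not registered at all on this build.
        return False
```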
Pull Request Overview
This PR enables GPT-OSS model support with two main features: attention sinks for improved context handling and bias support in Mixture of Experts (MoE) layers.
Key Changes:
- Added sink attention mechanism to handle long-context scenarios across naive, FSDPA, and flat attention implementations (a minimal sketch of the mechanism follows this list)
- Implemented bias support in MoE operations for models requiring biased expert computations
- Added model-specific routing logic for GPT-OSS in the MoE forward pass
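For reference, the sink mechanism amounts to adding one learned per-head logit to each softmax row, so the sink absorbs probability mass without contributing a value vector. A minimal dense sketch, illustrative only; the PR implements this inside the naive, FSDPA, and flat HPU attention paths:

```python
import torch

def sink_attention(q, k, v, sinks, scale):
    # q: [heads, q_len, d]; k, v: [heads, kv_len, d]; sinks: [heads]
    scores = torch.matmul(q, k.transpose(-1, -2)) * scale   # [h, q, kv]
    sink = sinks.view(-1, 1, 1).expand(-1, q.shape[-2], 1)  # [h, q, 1]
    # The sink logit joins the softmax denominator, then its column is dropped.
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    return torch.matmul(probs[..., :-1], v)
```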
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vllm_gaudi/ops/hpu_fused_moe.py | Added bias handling in MoE layers and GPT-OSS specific router weight processing |
| vllm_gaudi/extension/utils.py | Extended FSDPA forward method to accept sinks parameter |
| vllm_gaudi/extension/ops.py | Implemented sink attention logic across multiple attention implementations and added bias support to MoE operations |
| vllm_gaudi/attention/backends/hpu_attn.py | Added sinks parameter to attention implementations with validation and dtype conversion |
| tests/unit_tests/sinks/test_gpt_oss.py | Added integration test for GPT-OSS model with expected outputs |
Comments suppressed due to low confidence (2)

vllm_gaudi/attention/backends/hpu_attn.py:
- Missing space after '#' in comment. Should be '# causal' for proper comment formatting.
- Inconsistent TODO format: should be 'TODO:' with a colon instead of a dash.
```python
w12=w1_list,
w3=w2_list,
w12_bias=w1_bias_list_slice,
w3_bias=w2_bias_list_slice,
permuted_weights=permuted_weights,
experts_min=self.experts_min,
experts_max=self.experts_max)
```
Incorrect weight lists passed to MoE operation. Should use the sliced lists `w1_list_slice` and `w2_list_slice` instead of the full lists `w1_list` and `w2_list` to match the expert range being processed.
Suggested change:

```python
w12=w1_list_slice,
w3=w2_list_slice,
w12_bias=w1_bias_list_slice,
w3_bias=w2_bias_list_slice,
permuted_weights=permuted_weights,
experts_min=min_expert,
experts_max=max_expert)
```
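For context, a hedged sketch of the slicing pattern the suggestion assumes: each chunk computes its own expert range, and every per-expert list must be sliced with that same range (the function shape and `num_slices` are illustrative, not the plugin's exact code):

```python
def slice_experts(w1_list, w2_list, num_slices):
    """Yield per-slice weight lists plus the expert range each covers."""
    num_experts = len(w1_list)
    n_expert_slice = (num_experts + num_slices - 1) // num_slices
    for s in range(num_slices):
        min_expert = s * n_expert_slice
        max_expert = min((s + 1) * n_expert_slice, num_experts) - 1
        # The same range that slices the lists must be passed to the op
        # as experts_min / experts_max.
        yield (w1_list[min_expert:max_expert + 1],
               w2_list[min_expert:max_expert + 1],
               min_expert, max_expert)
```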
```python
experts_min=self.experts_min,
experts_max=self.experts_max)
```
Incorrect expert range parameters. Should use `min_expert` and `max_expert` (computed for the current slice) instead of `self.experts_min` and `self.experts_max` to correctly process the expert slice.
Suggested change:

```python
experts_min=min_expert,
experts_max=max_expert)
```
```python
# TODO - change 128 to proper window size
window_size = (
    128,
```
Magic number 128 used for window size. Consider defining this as a named constant or deriving it from `self.sliding_window` as indicated by the TODO comment.
Suggested change:

```python
# Use self.sliding_window for window size instead of hardcoded 128
window_size = (
    self.sliding_window,
```
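A hedged sketch of deriving the window from the layer's configured sliding window rather than a constant, using the common (left, right) convention where -1 means unbounded (the helper and its fallback are assumptions, not the plugin's exact code):

```python
def resolve_window_size(sliding_window):
    # Attend `sliding_window` tokens back and none forward (causal) when a
    # window is configured; (-1, -1) conventionally means unbounded.
    if sliding_window is not None:
        return (sliding_window, 0)
    return (-1, -1)
```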
```python
    tensor_parallel_size=4,
)
generated_texts = do_sample(llm, original_output=original_output_120, rtol=1e-01, atol=1e-01, max_num_seqs=1)
assert generated_texts == expected_output
```
The assertion compares the full generated list with the expected output, but the function returns a list and only the first element is validated earlier. This will fail unless `generated_texts` contains exactly one element matching `expected_output[0]`. Consider `assert generated_texts[0] == expected_output[0]`, or `assert generated_texts == expected_output` after validating the list length.
Suggested change:

```python
assert len(generated_texts) == len(expected_output)
assert generated_texts[0] == expected_output[0]
```
```python
attn_sink = attn_sink.exp()
if attn_sink.dtype == torch.float32:
    attn_sink = attn_sink.to(value.dtype)
#TODO: Removing this .sum and using attn_sink directly
```
Missing space in TODO comment: should be '# TODO:' with a space after '#' for consistency.
Suggested change:

```python
# TODO: Removing this .sum and using attn_sink directly
```
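For what the TODO implies mathematically: the `.sum` feeds the softmax denominator, and folding a sink into an already-computed softmax is a pure per-row rescale. A minimal sketch, assuming the row log-sum-exp is kept around (not the HPU kernel's actual code):

```python
import torch

def fold_in_sink(probs, logsumexp, sinks):
    # probs:     [heads, q_len, kv_len], rows of softmax(scores)
    # logsumexp: [heads, q_len], log of each row's softmax denominator
    # sinks:     [heads], learned sink logits
    # New row mass is denom / (denom + exp(sink)) = sigmoid(lse - sink).
    scale = torch.sigmoid(logsumexp - sinks.unsqueeze(-1))
    return probs * scale.unsqueeze(-1)
```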
```python
attn_bias = None
window_size = (self.sliding_window, 0)
common_args['window_size'] = window_size
# TODO - change 128 to proper window size
```
Inconsistent TODO format: should be 'TODO:' with a colon instead of a dash for consistency with project conventions.
Suggested change:

```python
# TODO: change 128 to proper window size
```
Signed-off-by: Himangshu Lahkar <hlahkar@habana.ai>
Signed-off-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Tracking this with #771, as there are a lot of changes due to the latest vLLM plugin.
This enables GPT-OSS with naive attention. Features enabled: attention sinks and bias support in MoE layers.