Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri… by rsmyrek · Pull Request #1482 · vllm-project/vllm-gaudi

rsmyrek · 2026-05-22T09:46:46Z

…d models (#1413)"

This reverts commit 808dbfa.

…d models (vllm-project#1413)" This reverts commit 808dbfa. Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Removes the “non-GDN hybrid” topology detection and the FSDPA-native causal short-circuit paths that skipped materializing attn_bias for certain hybrid models.

Changes:

Deleted is_non_gdn_hybrid topology detection in both runner/metadata init paths.
Removed early-return short-circuit in set_attn_bias / _set_attn_bias that avoided building a large causal attn_bias tensor.
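To give a sense of scale for the "large causal attn_bias tensor" mentioned above, the following sketch computes the footprint of a dense bias with the [bs, 1, q_len, total_kv_len] shape quoted in the removed comments. The helper name and the example sizes are illustrative assumptions and are not taken from the PR. The 2 bytes per element assumes bf16, as the add_bf16 op named in the diff suggests.

```python
def causal_bias_bytes(bs: int, q_len: int, total_kv_len: int,
                      bytes_per_elem: int = 2) -> int:
    """Bytes needed for a dense [bs, 1, q_len, total_kv_len] bias tensor.

    bytes_per_elem defaults to 2 (bf16). Hypothetical helper for illustration.
    """
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


# Illustrative shapes only: a short prefill and a long-context chunked prefill.
for q_len, kv_len in [(2048, 2048), (8192, 32768)]:
    size = causal_bias_bytes(1, q_len, kv_len)
    print(f"q_len={q_len} total_kv_len={kv_len} -> {size / 2**20:.0f} MiB")
```

The footprint scales with q_len * total_kv_len, which is why the reverted change targeted long-context prefill in particular.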

                or not attn_metadata.is_prompt):
            return attn_metadata

-        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
-        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
-        # causal mask natively via is_causal=True + valid_seq_lengths, including
-        # chunked prefill where block_list is non-None. Skipping the
-        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
-        # add_bf16 on the attention critical path (significant at long
-        # context). Conservative scope: only non-GDN hybrid models; GDN /
-        # pure-transformer / other topologies keep the materialised bias path
-        # until validated.
-        if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
-                and not getattr(self, 'sliding_window', None)
-                and not getattr(self, 'model_has_chunked_attention', False)
-                and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
-            return attn_metadata
-
        if attn_metadata.attn_bias is not None:
            return attn_metadata
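The guard removed in the hunk above can be read as a pure predicate over runner attributes. The sketch below restates it outside the class so each condition can be exercised in isolation. `RunnerFlags` and `could_skip_materialised_bias` are hypothetical names introduced here; only the attribute names come from the diff, and the defaults mirror the `getattr` fallbacks in the removed code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunnerFlags:
    """Hypothetical stand-in for the attributes the removed check read from self."""
    prefill_use_fusedsdpa: bool = True
    is_causal: bool = True
    is_pooling_model: bool = False
    sliding_window: Optional[int] = None
    model_has_chunked_attention: bool = False
    alibi_slopes: Optional[Sequence[float]] = None
    is_non_gdn_hybrid: bool = False


def could_skip_materialised_bias(f: RunnerFlags) -> bool:
    """Mirror of the removed condition: True means the short-circuit would fire."""
    return bool(f.prefill_use_fusedsdpa and f.is_causal
                and not f.is_pooling_model
                and not f.sliding_window
                and not f.model_has_chunked_attention
                and f.alibi_slopes is None
                and f.is_non_gdn_hybrid)


# Only a non-GDN hybrid with plain causal attention qualified.
print(could_skip_materialised_bias(RunnerFlags(is_non_gdn_hybrid=True)))
print(could_skip_materialised_bias(RunnerFlags(is_non_gdn_hybrid=True,
                                               sliding_window=4096)))
```

After this revert the guard no longer exists, so every topology falls through to the remaining checks and the materialised bias path.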



                or not attn_metadata.is_prompt):
            return attn_metadata

-        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
-        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA handles a purely
-        # causal mask natively (is_causal=True + valid_seq_lengths). Skip
-        # materialising a [bs, 1, q_len, total_kv_len] attn_bias even during
-        # chunked prefill (block_list is non-None) for these topologies; this
-        # removes a sizable add_bf16 from the attention critical path during
-        # long-context chunked prefill. interleaved_sliding_window and
-        # chunked-attention bias paths (window_attn_bias / chunked_attn_bias)
-        # are populated later in process_metadata and used by hpu_attn
-        # instead. Conservative scope: only non-GDN hybrid models; all other
-        # topologies retain the original behaviour.
-        if (self.prefill_use_fusedsdpa and not self.interleaved_sliding_window and self.is_non_gdn_hybrid):
-            return attn_metadata
-
        if attn_metadata.attn_bias is not None:
            return attn_metadata
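The removed comments rest on the claim that a purely causal mask can be applied implicitly instead of being added as a dense bias. The plain-Python reference below illustrates that equivalence on a tiny example. It is not FusedSDPA and does not use any Gaudi API. It assumes the queries occupy the last q_len positions of the key sequence (as in chunked prefill) and ignores padding, which the real kernel handles through valid_seq_lengths. All function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def softmax(row):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def materialised_causal_bias(q_len, kv_len):
    """Dense [q_len, kv_len] bias: 0 where visible, -inf where masked."""
    offset = kv_len - q_len  # queries are the last q_len positions
    return [[0.0 if j <= i + offset else NEG_INF for j in range(kv_len)]
            for i in range(q_len)]


def weights_with_bias(scores, bias):
    """Attention weights computed by adding the dense bias to the scores."""
    return [softmax([s + b for s, b in zip(srow, brow)])
            for srow, brow in zip(scores, bias)]


def weights_implicit_causal(scores, kv_len):
    """Attention weights computed from positions alone, with no bias tensor."""
    offset = kv_len - len(scores)
    out = []
    for i, srow in enumerate(scores):
        visible = i + offset + 1  # number of keys this query may attend to
        out.append(softmax(srow[:visible]) + [0.0] * (kv_len - visible))
    return out


# 3 queries over 5 keys: a chunk at the end of a longer sequence.
scores = [[0.1 * (i + 1) * (j - 2) for j in range(5)] for i in range(3)]
with_bias = weights_with_bias(scores, materialised_causal_bias(3, 5))
implicit = weights_implicit_causal(scores, 5)
match = all(math.isclose(x, y, abs_tol=1e-12)
            for ra, rb in zip(with_bias, implicit) for x, y in zip(ra, rb))
print("weights match:", match)
```

The equivalence holds only for a pure causal mask. Sliding-window, chunked-attention, or ALiBi biases carry information that positions alone do not encode, which is consistent with the removed guards excluding those cases.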



github-actions · 2026-05-26T13:53:36Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.


github-actions · 2026-05-27T12:17:35Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0a54df28471be07b3d668ea21c5e411569d3baea

Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri…

3a8370d

…d models (vllm-project#1413)" This reverts commit 808dbfa. Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

Copilot AI review requested due to automatic review settings May 22, 2026 09:46

rsmyrek had a problem deploying to pre-merge-approval May 22, 2026 09:46 — with GitHub Actions Error

Copilot AI reviewed May 22, 2026


rsmyrek marked this pull request as ready for review May 26, 2026 13:37

rsmyrek requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 26, 2026 13:37

rsmyrek temporarily deployed to pre-merge-approval May 26, 2026 13:37 — with GitHub Actions Inactive

Merge branch 'main' into dev/rsmyrekx/skip_materialised_causal_attn_b…

1aa6e71

…ias_revert

rsmyrek temporarily deployed to pre-merge-approval May 26, 2026 14:01 — with GitHub Actions Inactive

github-actions Bot mentioned this pull request May 26, 2026

🚦 Team Review Dashboard #701

Open

Merge branch 'main' into dev/rsmyrekx/skip_materialised_causal_attn_b…

81f3f1d

…ias_revert

iboiko-habana temporarily deployed to pre-merge-approval May 27, 2026 07:56 — with GitHub Actions Inactive

iboiko-habana approved these changes May 27, 2026


iboiko-habana merged commit d8af506 into vllm-project:main May 27, 2026
3 checks passed

iboiko-habana merged 3 commits into vllm-project:main from rsmyrek:dev/rsmyrekx/skip_materialised_causal_attn_bias_revert
