Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri… by rsmyrek · Pull Request #1481 · vllm-project/vllm-gaudi

rsmyrek · 2026-05-22T09:39:30Z

…d models (#1413)"

This reverts commit 808dbfa.

…d models (vllm-project#1413)" This reverts commit 808dbfa. Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR removes the “non-GDN hybrid” topology detection and the associated early-return optimization that skipped materializing attention bias for certain hybrid models when using FusedSDPA.

Changes:

Removed is_non_gdn_hybrid computation in runner and related component init paths.
Removed “FSDPA-native causal short-circuit” branches in set_attn_bias and _set_attn_bias that previously returned early for non-GDN hybrid models.
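
For a sense of what "materializing attention bias" costs, here is a minimal sketch in plain Python. It builds an additive causal bias for one sequence and estimates the size of a [bs, 1, q_len, total_kv_len] tensor in bf16 (2 bytes per element), the shape named in the removed comments below. The function names, the example sizes, and the assumption that the queries are the last q_len positions of the KV range are illustrative only; none of this is code from vllm-gaudi.

```python
import math


def causal_bias(q_len: int, total_kv_len: int) -> list[list[float]]:
    """Additive causal bias for a single sequence (hypothetical helper).

    Assumes the q_len query tokens are the last q_len positions of the
    total_kv_len key/value positions, as in a chunked-prefill step where
    earlier tokens already sit in the KV cache.
    """
    ctx_len = total_kv_len - q_len  # tokens processed in earlier chunks
    return [
        # Query q may attend to every cached token and to itself and
        # earlier tokens of the current chunk; later positions get -inf.
        [0.0 if kv <= ctx_len + q else -math.inf for kv in range(total_kv_len)]
        for q in range(q_len)
    ]


def bias_bytes(bs: int, q_len: int, total_kv_len: int, bytes_per_elem: int = 2) -> int:
    """Size in bytes of a [bs, 1, q_len, total_kv_len] bias tensor (bf16 by default)."""
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


if __name__ == "__main__":
    for row in causal_bias(q_len=2, total_kv_len=4):
        print(row)
    # Example sizes are made up: one sequence, an 8192-token chunk, 32768 KV positions.
    print(bias_bytes(1, 8192, 32768) / 2**20, "MiB")
```

With this PR merged, the materialised-bias path is used again for the non-GDN hybrid models that the reverted change had exempted.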

                or not attn_metadata.is_prompt):
            return attn_metadata

-        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
-        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
-        # causal mask natively via is_causal=True + valid_seq_lengths, including
-        # chunked prefill where block_list is non-None. Skipping the
-        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
-        # add_bf16 on the attention critical path (significant at long
-        # context). Conservative scope: only non-GDN hybrid models; GDN /
-        # pure-transformer / other topologies keep the materialised bias path
-        # until validated.
-        if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
-                and not getattr(self, 'sliding_window', None)
-                and not getattr(self, 'model_has_chunked_attention', False)
-                and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
-            return attn_metadata
-
        if attn_metadata.attn_bias is not None:
            return attn_metadata


                or not attn_metadata.is_prompt):
            return attn_metadata

-        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
-        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA handles a purely
-        # causal mask natively (is_causal=True + valid_seq_lengths). Skip
-        # materialising a [bs, 1, q_len, total_kv_len] attn_bias even during
-        # chunked prefill (block_list is non-None) for these topologies; this
-        # removes a sizable add_bf16 from the attention critical path during
-        # long-context chunked prefill. interleaved_sliding_window and
-        # chunked-attention bias paths (window_attn_bias / chunked_attn_bias)
-        # are populated later in process_metadata and used by hpu_attn
-        # instead. Conservative scope: only non-GDN hybrid models; all other
-        # topologies retain the original behaviour.
-        if (self.prefill_use_fusedsdpa and not self.interleaved_sliding_window and self.is_non_gdn_hybrid):
-            return attn_metadata
-
        if attn_metadata.attn_bias is not None:
            return attn_metadata
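
The condition removed in the first hunk above can be restated as a standalone predicate, which makes the before/after behaviour easier to compare. This is only a sketch: `BiasFlags` and both function names are hypothetical, the fields simply mirror the attribute names visible in the deleted lines, and the defaults are assumptions rather than values taken from the repository.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BiasFlags:
    """Hypothetical stand-in for the attributes read by the removed branch."""
    prefill_use_fusedsdpa: bool
    is_causal: bool = True
    is_pooling_model: bool = False
    sliding_window: Optional[int] = None
    model_has_chunked_attention: bool = False
    alibi_slopes: Optional[Sequence[float]] = None
    is_non_gdn_hybrid: bool = False


def short_circuit_before_revert(f: BiasFlags) -> bool:
    """True when the removed branch would have returned early (no attn_bias built)."""
    return bool(
        f.prefill_use_fusedsdpa and f.is_causal and not f.is_pooling_model
        and not f.sliding_window
        and not f.model_has_chunked_attention
        and f.alibi_slopes is None and f.is_non_gdn_hybrid
    )


def short_circuit_after_revert(f: BiasFlags) -> bool:
    """After this revert the branch no longer exists, so it never fires."""
    return False


if __name__ == "__main__":
    hybrid = BiasFlags(prefill_use_fusedsdpa=True, is_non_gdn_hybrid=True)
    print(short_circuit_before_revert(hybrid), short_circuit_after_revert(hybrid))
```

Any single disqualifying flag (a sliding window, ALiBi slopes, a pooling model, and so on) already kept a model on the materialised-bias path before the revert; afterwards every topology takes that path.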


github-actions · 2026-05-25T14:14:58Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2026-05-25T20:21:50Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
ad7125a431e176d4161099480a66f0169609a690

Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri…

e4a8aa7

Copilot AI review requested due to automatic review settings May 22, 2026 09:39

rsmyrek had a problem deploying to pre-merge-approval May 22, 2026 09:39 — with GitHub Actions Error

Copilot AI reviewed May 22, 2026

rsmyrek marked this pull request as ready for review May 22, 2026 12:56

rsmyrek requested review from PatrykWo, mgawarkiewicz-intel and wpyszka as code owners May 22, 2026 12:56

rsmyrek temporarily deployed to pre-merge-approval May 22, 2026 12:56 — with GitHub Actions Inactive

github-actions Bot mentioned this pull request May 22, 2026

🚦 Team Review Dashboard #701

Open

jbyczkow approved these changes May 25, 2026

Merge branch 'releases/v0.21.0' into dev/rsmyrekx/skip_materialised_c…

4753344

…ausal_attn_bias_revert_0.21.0

jbyczkow temporarily deployed to pre-merge-approval May 25, 2026 09:01 — with GitHub Actions Inactive

jbyczkow approved these changes May 25, 2026

Merge branch 'releases/v0.21.0' into dev/rsmyrekx/skip_materialised_c…

a4a2933

…ausal_attn_bias_revert_0.21.0

jbyczkow temporarily deployed to pre-merge-approval May 25, 2026 15:19 — with GitHub Actions Inactive

jbyczkow approved these changes May 25, 2026

mgawarkiewicz-intel merged commit 5121be2 into vllm-project:releases/v0.21.0 May 26, 2026
3 checks passed

mgawarkiewicz-intel merged 3 commits into vllm-project:releases/v0.21.0 from rsmyrek:dev/rsmyrekx/skip_materialised_causal_attn_bias_revert_0.21.0

4 participants
