Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri…#1481

Merged
mgawarkiewicz-intel merged 3 commits into vllm-project:releases/v0.21.0 from rsmyrek:dev/rsmyrekx/skip_materialised_causal_attn_bias_revert_0.21.0
May 26, 2026

@rsmyrek
Contributor

@rsmyrek rsmyrek commented May 22, 2026

…d models (#1413)"

This reverts commit 808dbfa.

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Copilot AI review requested due to automatic review settings May 22, 2026 09:39
@rsmyrek rsmyrek had a problem deploying to pre-merge-approval May 22, 2026 09:39 — with GitHub Actions Error
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR removes the “non-GDN hybrid” topology detection and the associated early-return optimization that skipped materializing attention bias for certain hybrid models when using FusedSDPA.

Changes:

  • Removed is_non_gdn_hybrid computation in runner and related component init paths.
  • Removed “FSDPA-native causal short-circuit” branches in set_attn_bias and _set_attn_bias that previously returned early for non-GDN hybrid models.
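For readers who want the removed condition in isolation, here is a minimal sketch that restates the longer of the two quoted guards as a standalone predicate. The attribute names are taken from the quoted diff context; the `BiasGuardState` class and the `skipped_materialised_bias` function are hypothetical and exist only for illustration. They are not part of the plugin.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BiasGuardState:
    """Hypothetical stand-in for the runner attributes read by the removed check."""
    prefill_use_fusedsdpa: bool
    is_causal: bool
    is_pooling_model: bool
    is_non_gdn_hybrid: bool  # the flag whose computation this PR removes
    sliding_window: Optional[int] = None
    model_has_chunked_attention: bool = False
    alibi_slopes: Optional[Sequence[float]] = None


def skipped_materialised_bias(state: BiasGuardState) -> bool:
    """Return True when the pre-revert short-circuit would have returned early."""
    return bool(
        state.prefill_use_fusedsdpa
        and state.is_causal
        and not state.is_pooling_model
        and not state.sliding_window
        and not state.model_has_chunked_attention
        and state.alibi_slopes is None
        and state.is_non_gdn_hybrid
    )
```

With the revert applied, this branch no longer exists, so models that previously matched it fall through to the remaining checks in the quoted context.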

Comment on lines 3890 to 3894
                or not attn_metadata.is_prompt):
            return attn_metadata

        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
        # causal mask natively via is_causal=True + valid_seq_lengths, including
        # chunked prefill where block_list is non-None. Skipping the
        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
        # add_bf16 on the attention critical path (significant at long
        # context). Conservative scope: only non-GDN hybrid models; GDN /
        # pure-transformer / other topologies keep the materialised bias path
        # until validated.
        if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
                and not getattr(self, 'sliding_window', None)
                and not getattr(self, 'model_has_chunked_attention', False)
                and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
            return attn_metadata

        if attn_metadata.attn_bias is not None:
            return attn_metadata
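The comment above mentions a materialised `[bs, 1, q_len, total_kv_len]` bias in bf16. As a rough back-of-the-envelope sketch, its footprint can be estimated as below. The helper name `attn_bias_bytes` and the example shape are assumptions chosen for illustration, not values from this PR.

```python
def attn_bias_bytes(bs: int, q_len: int, total_kv_len: int,
                    bytes_per_elem: int = 2) -> int:
    """Size in bytes of a dense [bs, 1, q_len, total_kv_len] bias tensor.

    bytes_per_elem defaults to 2 because bf16 uses 16 bits per element.
    """
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


# Hypothetical chunked-prefill shape: one sequence, an 8192-token chunk
# attending to 32768 total KV positions.
size = attn_bias_bytes(bs=1, q_len=8192, total_kv_len=32768)
size_mib = size / (1024 ** 2)  # 512.0 MiB for this assumed shape
```

The size grows with the product of `q_len` and `total_kv_len`, which is consistent with the comment's remark that the cost is most noticeable at long context.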
Comment on lines 6764 to 6768
                or not attn_metadata.is_prompt):
            return attn_metadata

        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA handles a purely
        # causal mask natively (is_causal=True + valid_seq_lengths). Skip
        # materialising a [bs, 1, q_len, total_kv_len] attn_bias even during
        # chunked prefill (block_list is non-None) for these topologies; this
        # removes a sizable add_bf16 from the attention critical path during
        # long-context chunked prefill. interleaved_sliding_window and
        # chunked-attention bias paths (window_attn_bias / chunked_attn_bias)
        # are populated later in process_metadata and used by hpu_attn
        # instead. Conservative scope: only non-GDN hybrid models; all other
        # topologies retain the original behaviour.
        if (self.prefill_use_fusedsdpa and not self.interleaved_sliding_window and self.is_non_gdn_hybrid):
            return attn_metadata

        if attn_metadata.attn_bias is not None:
            return attn_metadata
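The removed comments rest on the idea that a purely causal mask can be applied implicitly instead of through a materialised additive bias. The dependency-free sketch below illustrates that general equivalence for a chunked-prefill shape (`q_len < kv_len`, mask aligned to the end of the KV range). It is plain Python written for illustration: it does not call FusedSDPA or any HPU API, the function names are hypothetical, and the alignment convention is an assumption.

```python
import math

NEG_INF = float("-inf")


def causal_bias(q_len, kv_len):
    """Materialise an additive [q_len, kv_len] causal bias.

    Query i sits at absolute position offset + i, where offset is the
    number of KV positions that precede the current chunk.
    """
    offset = kv_len - q_len
    return [[0.0 if j <= i + offset else NEG_INF for j in range(kv_len)]
            for i in range(q_len)]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_bias(scores, bias):
    """Attention weights computed by adding a dense bias to the scores."""
    return [softmax([s + b for s, b in zip(s_row, b_row)])
            for s_row, b_row in zip(scores, bias)]


def attend_causal(scores):
    """Attention weights with the causal rule applied implicitly (no bias)."""
    q_len, kv_len = len(scores), len(scores[0])
    offset = kv_len - q_len
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + offset + 1]
        probs = softmax(visible)
        weights.append(probs + [0.0] * (kv_len - len(visible)))
    return weights


# Two query tokens attending to five KV positions (three already cached).
scores = [[0.1 * (i + j) for j in range(5)] for i in range(2)]
dense = attend_with_bias(scores, causal_bias(2, 5))
implicit = attend_causal(scores)
```

Both paths produce the same attention weights here; the implicit path simply never allocates the `[q_len, kv_len]` bias. Whether a fused kernel reproduces this in every configuration is a separate question that this sketch does not address.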
@rsmyrek rsmyrek marked this pull request as ready for review May 22, 2026 12:56
@rsmyrek rsmyrek temporarily deployed to pre-merge-approval May 22, 2026 12:56 — with GitHub Actions Inactive
@jbyczkow jbyczkow temporarily deployed to pre-merge-approval May 25, 2026 09:01 — with GitHub Actions Inactive
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@jbyczkow jbyczkow temporarily deployed to pre-merge-approval May 25, 2026 15:19 — with GitHub Actions Inactive
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
ad7125a431e176d4161099480a66f0169609a690

@mgawarkiewicz-intel mgawarkiewicz-intel merged commit 5121be2 into vllm-project:releases/v0.21.0 May 26, 2026
3 checks passed
4 participants