Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybrid models (#1413)" #1480

Closed
rsmyrek wants to merge 1 commit into vllm-project:releases/v0.21.0 from rsmyrek:dev/rsmyrekx/skip_materialised_causal_attn_bias_revert
rsmyrek (Contributor) commented May 22, 2026

Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybrid models (#1413)"

This reverts commit 808dbfa.

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Copilot AI review requested due to automatic review settings May 22, 2026 09:26
rsmyrek requested a deployment to pre-merge-approval May 22, 2026 09:26 with GitHub Actions (Waiting)

Copilot AI left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

This PR removes the “non-GDN hybrid” topology detection and the associated FusedSDPA-native causal short-circuit that skipped materializing attn_bias for certain hybrid models.

Changes:

  • Removed is_non_gdn_hybrid topology detection in runner/config initialization.
  • Removed early-return paths in set_attn_bias / _set_attn_bias that previously avoided building a large causal attn_bias for select hybrid models.
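For reference, the gating condition that this revert deletes (quoted in the review excerpt) can be restated as a standalone predicate. This is a minimal sketch: `AttnConfig` and `skips_materialised_bias` are hypothetical names, and the fields only mirror the attributes the removed `if` read from `self` (directly or via `getattr`); it is not the runner's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AttnConfig:
    # Hypothetical stand-in for the runner attributes read by the removed check.
    prefill_use_fusedsdpa: bool = True
    is_causal: bool = True
    is_pooling_model: bool = False
    sliding_window: Optional[int] = None
    model_has_chunked_attention: bool = False
    alibi_slopes: Optional[Sequence[float]] = None
    is_non_gdn_hybrid: bool = False


def skips_materialised_bias(cfg: AttnConfig) -> bool:
    """True when the removed short-circuit would have returned early,
    i.e. when no [bs, 1, q_len, total_kv_len] attn_bias would be built."""
    return bool(
        cfg.prefill_use_fusedsdpa
        and cfg.is_causal
        and not cfg.is_pooling_model
        and not cfg.sliding_window              # any sliding window disables it
        and not cfg.model_has_chunked_attention
        and cfg.alibi_slopes is None            # ALiBi needs a real bias tensor
        and cfg.is_non_gdn_hybrid               # scope limited to non-GDN hybrids
    )


if __name__ == "__main__":
    print(skips_materialised_bias(AttnConfig(is_non_gdn_hybrid=True)))
    print(skips_materialised_bias(AttnConfig(is_non_gdn_hybrid=True, sliding_window=4096)))
```

Only a causal FusedSDPA prefill on a non-GDN hybrid model with no sliding window, chunked attention, or ALiBi matched this predicate. With the check removed, those models fall back to the materialised attn_bias path like every other topology.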

Comment on lines 3890 to 3894
                or not attn_metadata.is_prompt):
            return attn_metadata

        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
        # causal mask natively via is_causal=True + valid_seq_lengths, including
        # chunked prefill where block_list is non-None. Skipping the
        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
        # add_bf16 on the attention critical path (significant at long
        # context). Conservative scope: only non-GDN hybrid models; GDN /
        # pure-transformer / other topologies keep the materialised bias path
        # until validated.
        if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
                and not getattr(self, 'sliding_window', None)
                and not getattr(self, 'model_has_chunked_attention', False)
                and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
            return attn_metadata

        if attn_metadata.attn_bias is not None:
            return attn_metadata
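
The code comment motivates the short-circuit by the cost of the materialised `[bs, 1, q_len, total_kv_len]` attn_bias. The sketch below, in plain Python with no HPU or FusedSDPA dependency, builds that additive causal bias for a single sequence and estimates its bf16 footprint. It assumes chunked prefill places the `q_len` new queries at the end of the `total_kv_len` key range (bottom-right alignment); `causal_bias` and `bias_bytes` are hypothetical helpers, not vLLM functions, and the sizes used are illustrative.

```python
NEG_INF = float("-inf")


def causal_bias(q_len: int, total_kv_len: int) -> list[list[float]]:
    """Additive causal bias of shape [q_len, total_kv_len] for one sequence.

    Assumes the q_len queries are the last q_len positions of the sequence,
    i.e. total_kv_len - q_len tokens of context were cached by earlier
    prefill chunks (bottom-right alignment).
    """
    if not 0 < q_len <= total_kv_len:
        raise ValueError("need 0 < q_len <= total_kv_len")
    offset = total_kv_len - q_len  # cached context tokens before this chunk
    return [
        # Query q may attend to every key at absolute position <= q + offset.
        [0.0 if k <= q + offset else NEG_INF for k in range(total_kv_len)]
        for q in range(q_len)
    ]


def bias_bytes(bs: int, q_len: int, total_kv_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for a dense [bs, 1, q_len, total_kv_len] bias (bf16 = 2 bytes/elem)."""
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


if __name__ == "__main__":
    # A chunk of 2 new tokens on top of 2 cached tokens.
    for row in causal_bias(2, 4):
        print(row)
    # Illustrative long-context prefill: an 8k-token chunk against 32k keys.
    print(bias_bytes(1, 8192, 32768) / 2**20, "MiB")
```

Every row is fully determined by `q_len`, `total_kv_len` and the row index, which is why a purely causal mask can be expressed through `is_causal=True` plus sequence lengths instead of a tensor. For `bs=1`, `q_len=8192` and `total_kv_len=32768`, the bias alone occupies 512 MiB in bf16 and must be added to the attention scores; with the short-circuit removed, the affected hybrid models build and add this tensor again.
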
rsmyrek closed this May 22, 2026
rsmyrek deleted the dev/rsmyrekx/skip_materialised_causal_attn_bias_revert branch May 22, 2026 09:36
rsmyrek had a problem deploying to pre-merge-approval May 22, 2026 09:39 with GitHub Actions (Error)
rsmyrek temporarily deployed to pre-merge-approval May 22, 2026 12:56 with GitHub Actions (Inactive)