Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri…#1482

Merged
iboiko-habana merged 3 commits into vllm-project:main from rsmyrek:dev/rsmyrekx/skip_materialised_causal_attn_bias_revert
May 27, 2026
Conversation

@rsmyrek rsmyrek commented May 22, 2026

…d models (#1413)"

This reverts commit 808dbfa.

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Copilot AI review requested due to automatic review settings May 22, 2026 09:46
@rsmyrek rsmyrek had a problem deploying to pre-merge-approval May 22, 2026 09:46 — with GitHub Actions Error

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Removes the “non-GDN hybrid” topology detection and the FSDPA-native causal short-circuit paths that skipped materializing attn_bias for certain hybrid models.

Changes:

  • Deleted is_non_gdn_hybrid topology detection in both runner/metadata init paths.
  • Removed early-return short-circuit in set_attn_bias / _set_attn_bias that avoided building a large causal attn_bias tensor.
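The effect of removing the early return can be sketched as a plain predicate. This is a minimal illustration, not the runner's code: `RunnerFlags`, `skipped_bias_before_revert` and `skips_bias_after_revert` are hypothetical names, and the condition mirrors only the first of the two excerpts quoted in this review (the second excerpt uses a shorter condition based on `interleaved_sliding_window`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerFlags:
    """Hypothetical stand-in for the runner attributes the removed guard read."""
    prefill_use_fusedsdpa: bool = True
    is_causal: bool = True
    is_pooling_model: bool = False
    sliding_window: Optional[int] = None
    model_has_chunked_attention: bool = False
    alibi_slopes: Optional[list] = None
    is_non_gdn_hybrid: bool = False


def skipped_bias_before_revert(flags: RunnerFlags) -> bool:
    """Mirror of the removed condition: True meant 'return early, build no attn_bias'."""
    return bool(flags.prefill_use_fusedsdpa and flags.is_causal
                and not flags.is_pooling_model
                and not flags.sliding_window
                and not flags.model_has_chunked_attention
                and flags.alibi_slopes is None
                and flags.is_non_gdn_hybrid)


def skips_bias_after_revert(flags: RunnerFlags) -> bool:
    """With the guard deleted, this particular early return no longer exists."""
    return False


hybrid = RunnerFlags(is_non_gdn_hybrid=True)
print(skipped_bias_before_revert(hybrid), skips_bias_after_revert(hybrid))
```

Under these assumptions, only the non-GDN hybrid case changes behaviour with the revert; every other combination of flags already fell through to the materialised-bias path.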

Comment on lines 3891 to 3896
        or not attn_metadata.is_prompt):
    return attn_metadata

# Extended FSDPA-native causal short-circuit for non-GDN hybrid models
# (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
# causal mask natively via is_causal=True + valid_seq_lengths, including
# chunked prefill where block_list is non-None. Skipping the
# materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
# add_bf16 on the attention critical path (significant at long
# context). Conservative scope: only non-GDN hybrid models; GDN /
# pure-transformer / other topologies keep the materialised bias path
# until validated.
if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
        and not getattr(self, 'sliding_window', None)
        and not getattr(self, 'model_has_chunked_attention', False)
        and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
    return attn_metadata

if attn_metadata.attn_bias is not None:
    return attn_metadata
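
To give a sense of why the quoted comment calls the materialised bias significant at long context, here is a rough size estimate for a dense `[bs, 1, q_len, total_kv_len]` tensor in bf16 (2 bytes per element). The helper `attn_bias_bytes` and the shapes are illustrative assumptions, not measurements from this PR.

```python
def attn_bias_bytes(bs: int, q_len: int, total_kv_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed for a dense [bs, 1, q_len, total_kv_len] bias (bf16 = 2 bytes/element)."""
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


# Illustrative shapes only (assumed, not taken from the PR).
short_prompt = attn_bias_bytes(bs=1, q_len=2048, total_kv_len=2048)
long_chunk = attn_bias_bytes(bs=1, q_len=8192, total_kv_len=131072)

print(short_prompt // 2**20, "MiB")  # 8 MiB
print(long_chunk // 2**30, "GiB")    # 2 GiB
```

The size grows with the product of `q_len` and `total_kv_len`, so the cost of building and adding the bias rises quickly as the context gets longer.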

Comment on lines 6780 to 6785
        or not attn_metadata.is_prompt):
    return attn_metadata

# Extended FSDPA-native causal short-circuit for non-GDN hybrid models
# (e.g. Granite-4 Mamba2+Transformer). FusedSDPA handles a purely
# causal mask natively (is_causal=True + valid_seq_lengths). Skip
# materialising a [bs, 1, q_len, total_kv_len] attn_bias even during
# chunked prefill (block_list is non-None) for these topologies; this
# removes a sizable add_bf16 from the attention critical path during
# long-context chunked prefill. interleaved_sliding_window and
# chunked-attention bias paths (window_attn_bias / chunked_attn_bias)
# are populated later in process_metadata and used by hpu_attn
# instead. Conservative scope: only non-GDN hybrid models; all other
# topologies retain the original behaviour.
if (self.prefill_use_fusedsdpa and not self.interleaved_sliding_window and self.is_non_gdn_hybrid):
    return attn_metadata

if attn_metadata.attn_bias is not None:
    return attn_metadata
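
For readers unfamiliar with what a materialised causal bias encodes during chunked prefill, the following is a generic pure-Python sketch. It assumes the standard convention that query `i` of the current chunk sits at absolute position `context_len + i`; the function `chunked_causal_bias` is hypothetical and is not how the runner or FusedSDPA builds the mask.

```python
from typing import List

NEG_INF = float("-inf")


def chunked_causal_bias(context_len: int, q_len: int) -> List[List[float]]:
    """Additive bias of shape [q_len, context_len + q_len].

    Query i may attend to key j iff j <= context_len + i: all earlier
    context tokens plus the causal prefix of the current chunk.
    """
    total_kv_len = context_len + q_len
    return [[0.0 if j <= context_len + i else NEG_INF for j in range(total_kv_len)]
            for i in range(q_len)]


bias = chunked_causal_bias(context_len=2, q_len=3)
for row in bias:
    print(row)
```

In this sketch the whole mask is determined by two integers, `context_len` and `q_len`. That is the kind of structure a kernel-side causal flag can represent without a dense tensor; after this revert, all topologies build the dense bias again.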

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@rsmyrek rsmyrek temporarily deployed to pre-merge-approval May 26, 2026 14:01 — with GitHub Actions Inactive
@iboiko-habana iboiko-habana temporarily deployed to pre-merge-approval May 27, 2026 07:56 — with GitHub Actions Inactive
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0a54df28471be07b3d668ea21c5e411569d3baea

@iboiko-habana iboiko-habana merged commit d8af506 into vllm-project:main May 27, 2026
3 checks passed
3 participants