Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybrid models (vllm-project#1413)" #1480
Closed
rsmyrek wants to merge 1 commit into

This reverts commit 808dbfa.

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
This PR removes the “non-GDN hybrid” topology detection and the associated FusedSDPA-native causal short-circuit that skipped materializing attn_bias for certain hybrid models.
Changes:
- Removed is_non_gdn_hybrid topology detection in runner/config initialization.
- Removed early-return paths in set_attn_bias / _set_attn_bias that previously avoided building a large causal attn_bias for select hybrid models.
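How large is the "large causal attn_bias"? Its size follows directly from the shape given in the removed code comment, [bs, 1, q_len, total_kv_len]. The following is a minimal sketch, assuming bf16 storage (2 bytes per element). The helper attn_bias_bytes is hypothetical, and the sizes are illustrative, not taken from the PR:

```python
# Hypothetical helper: bytes held by one materialised causal attn_bias tensor.
def attn_bias_bytes(bs, q_len, total_kv_len, bytes_per_elem=2):
    """Bytes in a [bs, 1, q_len, total_kv_len] bias; bf16 (2 bytes) by default."""
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


MIB = 2 ** 20

# Illustrative sizes only; these are not measurements from this PR.
print(attn_bias_bytes(1, 8192, 8192) / MIB)   # 128.0  (one unchunked 8K prompt)
print(attn_bias_bytes(1, 2048, 32768) / MIB)  # 128.0  (2K prefill chunk over a 32K context)
```

For a fixed prefill chunk, the bias grows linearly with total_kv_len. For an unchunked prompt (q_len == total_kv_len), it grows quadratically. This is consistent with the comment's remark that the extra add_bf16 matters most at long context.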
Comment on lines 3890 to 3894
```diff
                 or not attn_metadata.is_prompt):
             return attn_metadata
 
-        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
-        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
-        # causal mask natively via is_causal=True + valid_seq_lengths, including
-        # chunked prefill where block_list is non-None. Skipping the
-        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
-        # add_bf16 on the attention critical path (significant at long
-        # context). Conservative scope: only non-GDN hybrid models; GDN /
-        # pure-transformer / other topologies keep the materialised bias path
-        # until validated.
-        if (self.prefill_use_fusedsdpa and self.is_causal and not self.is_pooling_model
-                and not getattr(self, 'sliding_window', None)
-                and not getattr(self, 'model_has_chunked_attention', False)
-                and getattr(self, 'alibi_slopes', None) is None and self.is_non_gdn_hybrid):
-            return attn_metadata
-
         if attn_metadata.attn_bias is not None:
             return attn_metadata
```
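The removed comment argues that a purely causal mask need not be materialised, because the kernel can derive it from token positions (is_causal=True plus valid_seq_lengths in FusedSDPA). The pure-Python sketch below illustrates only the underlying equivalence. It does not call FusedSDPA, and all of its function names are hypothetical. It makes the following assumptions:

- A single batch element and a single head.
- No padding, so valid_seq_lengths is not modeled.
- kv_len >= q_len.
- The q_len queries are the last q_len positions of the KV sequence, as in chunked prefill. This gives a bottom-right aligned causal mask.

```python
import math
import random

NEG_INF = float("-inf")


def causal_bias(q_len, kv_len):
    """Hypothetical helper: materialised additive causal bias, shape [q_len, kv_len].

    Assumes kv_len >= q_len and that the queries are the last q_len positions
    of the sequence (chunked prefill), i.e. a bottom-right aligned mask.
    """
    offset = kv_len - q_len  # number of already-cached tokens
    return [[0.0 if j <= offset + i else NEG_INF for j in range(kv_len)]
            for i in range(q_len)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # finite, because every query row keeps at least one key
    exps = [math.exp(x - m) for x in xs]  # masked entries: exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, values):
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def attn_with_bias(q, k, v, bias):
    """Materialised path: add a full bias matrix to the scores, then softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [dot(qi, kj) * scale + bias[i][j] for j, kj in enumerate(k)]
        out.append(weighted_sum(softmax(scores), v))
    return out


def attn_implicit_causal(q, k, v):
    """Implicit path: derive each query's causal limit from positions; no bias is built."""
    scale = 1.0 / math.sqrt(len(q[0]))
    offset = len(k) - len(q)
    out = []
    for i, qi in enumerate(q):
        limit = offset + i + 1  # query i sees keys 0 .. offset + i
        scores = [dot(qi, kj) * scale for kj in k[:limit]]
        out.append(weighted_sum(softmax(scores), v[:limit]))
    return out


# Usage: 4 cached tokens plus a 3-token prefill chunk, head dim 4.
random.seed(0)
q_len, kv_len, dim = 3, 7, 4
q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(q_len)]
k = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(kv_len)]
v = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(kv_len)]

explicit = attn_with_bias(q, k, v, causal_bias(q_len, kv_len))
implicit = attn_implicit_causal(q, k, v)
print(all(math.isclose(a, b, abs_tol=1e-12)
          for row_e, row_i in zip(explicit, implicit)
          for a, b in zip(row_e, row_i)))  # True
```

The mask alignment is the detail a real kernel must match. Here the sketch simply assumes bottom-right alignment. The PR does not state which alignment FusedSDPA applies when q_len < total_kv_len. Generic kernels differ on this point: PyTorch's scaled_dot_product_attention, for instance, documents is_causal as upper-left aligned for non-square masks.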