Fixing condition for materialised causal attn_bias by ksmusz · Pull Request #1433 · vllm-project/vllm-gaudi

ksmusz · 2026-05-11T08:21:07Z

This PR fixes the condition introduced in #1413.

Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>

Copilot

Pull request overview

This PR adjusts the conditions under which prompt-phase attn_bias is materialized, aiming to avoid building a large causal bias tensor in cases where FusedSDPA can apply causal masking natively (via is_causal=True + valid_seq_lengths).

Changes:

Added/expanded early-return conditions in set_attn_bias and _set_attn_bias to skip materializing attn_bias in additional FusedSDPA scenarios.
Updated the surrounding comments to describe the intended short-circuit behavior (though some statements no longer match the broadened gating).
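
To illustrate why a purely causal mask does not need a materialised bias tensor, here is a minimal pure-Python sketch. It is not the FusedSDPA kernel or any vllm-gaudi code: all function names are hypothetical, values are scalars for brevity, and padding (what `valid_seq_lengths` would handle) is ignored. It only shows that adding a `0 / -inf` causal bias before softmax gives the same result as restricting each query row to the keys it may see.

```python
import math

NEG_INF = float("-inf")


def softmax(xs):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_bias(q_len, kv_len):
    """Materialise a [q_len, kv_len] additive causal bias (bottom-right aligned)."""
    offset = kv_len - q_len  # number of already-cached context tokens
    return [
        [0.0 if j <= i + offset else NEG_INF for j in range(kv_len)]
        for i in range(q_len)
    ]


def attend_with_bias(scores, values, bias):
    """Attention where masking is done by adding a materialised bias."""
    out = []
    for row, bias_row in zip(scores, bias):
        weights = softmax([s + b for s, b in zip(row, bias_row)])
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out


def attend_causal_native(scores, values):
    """Attention where causality is derived from positions; no bias is built."""
    q_len, kv_len = len(scores), len(values)
    offset = kv_len - q_len
    out = []
    for i, row in enumerate(scores):
        visible = i + offset + 1  # keys 0..i+offset are visible to query i
        weights = softmax(row[:visible])
        out.append(sum(w * v for w, v in zip(weights, values[:visible])))
    return out


# Hypothetical toy input: 2 new query tokens attending over 4 keys.
scores = [[0.1, 0.7, -0.3, 0.9], [0.5, -0.2, 0.4, 0.0]]
values = [1.0, 2.0, 3.0, 4.0]

with_bias = attend_with_bias(scores, values, causal_bias(2, 4))
native = attend_causal_native(scores, values)
assert all(math.isclose(a, b) for a, b in zip(with_bias, native))
```

The bias variant allocates and adds `q_len * kv_len` extra elements per batch entry; the position-based variant needs only the lengths.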

ksmusz · 2026-05-11T12:31:34Z

+        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
+        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA can encode a purely
+        # causal mask natively via is_causal=True + valid_seq_lengths, including
+        # chunked prefill where block_list is non-None. Skipping the
+        # materialised [bs, 1, q_len, total_kv_len] attn_bias avoids a large
+        # add_bf16 on the attention critical path (significant at long
+        # context). Conservative scope: only non-GDN hybrid models; GDN /
+        # pure-transformer / other topologies keep the materialised bias path
+        # until validated.


Yes, the branch fires for more cases, as that's how it worked before #1413.
The comment mentions how its extended usage applies to non-GDN hybrid models.
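
As a rough sense of scale for the `[bs, 1, q_len, total_kv_len]` tensor mentioned in the code comment above, the following sketch computes its size. The shapes are hypothetical and not taken from the PR, and it assumes the bias is stored in bf16 (2 bytes per element), which the `add_bf16` wording suggests but does not confirm.

```python
BF16_BYTES = 2  # assumption: the bias is held in bf16


def attn_bias_bytes(bs, q_len, total_kv_len, bytes_per_elem=BF16_BYTES):
    """Size in bytes of a dense [bs, 1, q_len, total_kv_len] bias tensor."""
    return bs * 1 * q_len * total_kv_len * bytes_per_elem


# Hypothetical chunked-prefill step: an 8192-token chunk over 32768 total KV tokens.
size = attn_bias_bytes(bs=1, q_len=8192, total_kv_len=32768)
print(f"{size / 2**20:.0f} MiB")  # 512 MiB
```

The size grows with the product of `q_len` and `total_kv_len`, which is why the saving matters most at long context.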

ksmusz · 2026-05-11T12:31:54Z

+        # Extended FSDPA-native causal short-circuit for non-GDN hybrid models
+        # (e.g. Granite-4 Mamba2+Transformer). FusedSDPA handles a purely
+        # causal mask natively (is_causal=True + valid_seq_lengths). Skip
+        # materialising a [bs, 1, q_len, total_kv_len] attn_bias even during
+        # chunked prefill (block_list is non-None) for these topologies; this
+        # removes a sizable add_bf16 from the attention critical path during
+        # long-context chunked prefill. interleaved_sliding_window and
+        # chunked-attention bias paths (window_attn_bias / chunked_attn_bias)
+        # are populated later in process_metadata and used by hpu_attn
+        # instead. Conservative scope: only non-GDN hybrid models; all other
+        # topologies retain the original behaviour.


Yes, the branch fires for more cases, as that's how it worked before #1413.
The comment mentions how its extended usage applies to non-GDN hybrid models.
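
The scope described in the quoted code comment can be sketched as a small predicate. This is only an illustration of the wording, not the condition merged in this PR: every name below is hypothetical, and the rule for "all other topologies" (skip only when there is no `block_list`) is an assumption, since the conversation does not show the original condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasGateInputs:
    """Hypothetical flags; none of these names come from vllm-gaudi."""
    use_fused_sdpa: bool
    is_hybrid_model: bool
    is_gdn_model: bool
    has_block_list: bool  # True during chunked prefill / prefix reuse


def can_skip_materialised_bias(g: BiasGateInputs) -> bool:
    """Return True when the causal attn_bias need not be materialised."""
    if not g.use_fused_sdpa:
        # Without FusedSDPA nothing applies the causal mask natively.
        return False
    if g.is_hybrid_model and not g.is_gdn_model:
        # Extended short-circuit: applies even when block_list is non-None.
        return True
    # ASSUMPTION: other topologies skip only when there is no block_list.
    return not g.has_block_list


granite_like = BiasGateInputs(True, True, False, has_block_list=True)
gdn_like = BiasGateInputs(True, True, True, has_block_list=True)
assert can_skip_materialised_bias(granite_like)
assert not can_skip_materialised_bias(gdn_like)
```

Keeping the hybrid, non-GDN case as its own branch mirrors the "conservative scope" wording: other topologies stay on their previous path until validated.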

github-actions · 2026-05-11T13:42:25Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

#1433 fixed a Qwen3.5 accuracy regression that was only detected when the prompt bucket batch size is large. Adding VLLM_PROMPT_BS_BUCKET_MAX=32 to the CI test covers that case. Also tighten the passing threshold to better catch future regressions.

Signed-off-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>

Fixing condition for materialised causal attn_bias

03937ca

Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>

Copilot AI review requested due to automatic review settings May 11, 2026 08:21

ksmusz requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 11, 2026 08:21

Copilot started reviewing on behalf of ksmusz May 11, 2026 08:21

Copilot AI reviewed May 11, 2026

github-actions Bot mentioned this pull request May 11, 2026

🚦 Team Review Dashboard #701

Open

jbyczkow approved these changes May 11, 2026

ksmusz merged commit dfd3d1f into main May 11, 2026
6 checks passed

ksmusz deleted the dev/ksmusz/fix-materialised-causal-attn-bias branch May 11, 2026 16:01

shepark mentioned this pull request May 12, 2026

Harden Qwen3.5 CI test to detect regressions #1443

Merged
