Fix: restore boolean attention mask handling in _naive_prompt_attention #1

Draft
Copilot wants to merge 1 commit into main from copilot/fix-lower-accuracy-bug

Copilot AI commented May 19, 2026

Summary

Restores boolean attention mask handling in _naive_prompt_attention that was accidentally removed in commit f337029 (Enable slicing for fp8 FusedSDPA vllm-project#1285).

Problem

When attn_bias is a boolean tensor (e.g., from the Boolean attention mask introduced in vllm-project#1032), attn_weights.add_(attn_bias) only adds 0 or 1 to the attention weights instead of masking invalid positions with -inf. This causes incorrect attention scores and potential accuracy degradation, especially for long prompts where proper masking of padded positions is critical.
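
The effect can be reproduced on a toy tensor. The sketch below is illustrative only: the score values and variable names are made up, and it assumes the convention implied by the fix (True marks a position that may be attended, False marks padding).

```python
import torch

# Hypothetical toy values: one query row of raw scores over four key positions.
scores = torch.tensor([[2.0, 1.0, 0.5, 3.0]])
# Assumed convention: True = valid position, False = padded position.
bool_mask = torch.tensor([[True, True, False, False]])

# Buggy path: the bool mask is promoted to 0.0/1.0 and added to the scores.
buggy_weights = scores.clone().add_(bool_mask)
buggy_probs = torch.softmax(buggy_weights, dim=-1)

# Probability mass that ends up on the two padded positions.
leaked = buggy_probs[0, 2:].sum().item()
print(f"probability leaked to padded positions: {leaked:.3f}")
```

With these values the padded positions keep a substantial share of the softmax output, because nothing pushes their scores toward -inf.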

Fix

Restore the original boolean mask check in _naive_prompt_attention (vllm_gaudi/extension/ops.py):

  • If attn_bias.dtype == torch.bool: use masked_fill(~attn_bias, float("-inf")) to properly mask invalid positions
  • Otherwise: fall through to the existing add_ path for float-type attention biases
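
A minimal sketch of the two branches described above, not the actual code in vllm_gaudi/extension/ops.py. The helper name apply_attn_bias and the toy tensors are hypothetical and exist only to show the dtype dispatch in isolation.

```python
import torch


def apply_attn_bias(attn_weights: torch.Tensor, attn_bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mirroring the boolean/float dispatch described in the fix."""
    if attn_bias.dtype == torch.bool:
        # Boolean mask: True marks valid positions, so the inverted mask
        # selects the invalid ones, which are set to -inf.
        return attn_weights.masked_fill(~attn_bias, float("-inf"))
    # Float bias: additive, applied in place as in the existing path.
    return attn_weights.add_(attn_bias)


scores = torch.tensor([[2.0, 1.0, 0.5, 3.0]])

# Boolean mask path.
bool_mask = torch.tensor([[True, True, False, False]])
probs = torch.softmax(apply_attn_bias(scores.clone(), bool_mask), dim=-1)

# Equivalent float bias path: 0 for valid positions, -inf for padded ones.
float_bias = torch.tensor([[0.0, 0.0, float("-inf"), float("-inf")]])
probs_float = torch.softmax(apply_attn_bias(scores.clone(), float_bias), dim=-1)
```

Under these assumptions both paths give the padded positions zero probability, so a boolean mask and the matching additive float bias produce the same attention distribution.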

Note

This fix should also be cherry-picked to the aice branch as aice_patch. The same fix is available on the local aice_patch branch (commit 4b17717), based on origin/aice.

Signed-off-by: copilot <copilot@github.com>

The boolean mask handling for attn_bias was accidentally removed in commit
f337029 (Enable slicing for fp8 FusedSDPA vllm-project#1285). When attn_bias is a boolean
tensor, the code should use masked_fill to set invalid positions to -inf,
but instead it was using add_ which only adds 0/1 to the attention weights.
This causes incorrect attention scores and accuracy degradation, especially
for long prompts where proper masking of padded positions is critical.

Signed-off-by: copilot <copilot@github.com>
Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: JyhWind <40982453+JyhWind@users.noreply.github.com>