Fix: restore boolean attention mask handling in _naive_prompt_attention by Copilot · Pull Request #1 · JyhWind/vllm-gaudi

Copilot · 2026-05-19T05:43:22Z

Summary

Restores the boolean attention-mask handling in _naive_prompt_attention, which was accidentally removed in commit f337029 ("Enable slicing for fp8 FusedSDPA", vllm-project#1285).

Problem

When attn_bias is a boolean tensor (e.g., the boolean attention mask introduced in vllm-project#1032), attn_weights.add_(attn_bias) promotes the mask to 0/1 and simply adds it to the attention weights instead of setting invalid positions to -inf. Padded or otherwise invalid positions therefore keep nonzero probability after softmax. This produces incorrect attention scores and can degrade accuracy, especially for long prompts, where correct masking of padded positions is critical.
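The snippet below is a small, self-contained illustration of the failure mode using stock PyTorch and made-up scores. It assumes, as the masked_fill(~attn_bias, ...) call in the fix implies, that True marks valid positions, and it compares the buggy add_ path with -inf masking on a single row of attention scores.

```python
import torch

# Raw attention scores for one query over 4 key positions (illustrative values).
scores = torch.tensor([[0.5, 1.0, 2.0, 3.0]])

# Boolean mask: True = valid position, False = padding (the last two keys are padded).
mask = torch.tensor([[True, True, False, False]])

# Buggy path: add_ promotes the bool mask to 1.0/0.0, so padded keys are not suppressed;
# valid keys just get +1.0 added to their scores.
buggy = scores.clone().add_(mask)
buggy_probs = torch.softmax(buggy, dim=-1)

# Correct path: fill invalid positions with -inf so softmax assigns them zero weight.
fixed = scores.clone().masked_fill(~mask, float("-inf"))
fixed_probs = torch.softmax(fixed, dim=-1)

print(buggy_probs)  # padded keys still receive nonzero probability
print(fixed_probs)  # padded keys receive exactly 0
```

In this toy example the two padded keys still end up with roughly 70% of the softmax mass on the add_ path, while the masked path gives them none.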

Fix

Restore the original boolean mask check in _naive_prompt_attention (vllm_gaudi/extension/ops.py):

- If attn_bias.dtype == torch.bool: use masked_fill(~attn_bias, float("-inf")) to set invalid positions to -inf
- Otherwise: fall through to the existing add_ path for float attention biases
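A minimal sketch of the two-branch logic described above, assuming stock PyTorch. The helper name apply_attn_bias and its signature are hypothetical; the actual change is inline in _naive_prompt_attention and may differ in detail. As the masked_fill(~attn_bias, ...) call implies, the sketch assumes True marks positions that may be attended.

```python
import torch


def apply_attn_bias(attn_weights: torch.Tensor, attn_bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mirroring the restored branch in _naive_prompt_attention.

    attn_weights: raw attention scores, e.g. shape [batch, q_len, kv_len].
    attn_bias: either a boolean mask (True = position may be attended) or an
    additive float bias; either way it must broadcast to attn_weights.
    """
    if attn_bias.dtype == torch.bool:
        # Boolean mask: replace disallowed positions with -inf so softmax
        # assigns them exactly zero probability. masked_fill returns a new tensor.
        return attn_weights.masked_fill(~attn_bias, float("-inf"))
    # Float bias (0 for allowed, large negative or -inf for disallowed):
    # keep the existing in-place add_ path.
    return attn_weights.add_(attn_bias)


# Usage: a boolean mask and the equivalent float bias give the same softmax.
scores = torch.zeros(1, 2, 3)
bool_mask = torch.tensor([[[True, False, False],
                           [True, True, False]]])
float_bias = torch.zeros(1, 2, 3).masked_fill(~bool_mask, float("-inf"))

probs_bool = torch.softmax(apply_attn_bias(scores.clone(), bool_mask), dim=-1)
probs_float = torch.softmax(apply_attn_bias(scores.clone(), float_bias), dim=-1)
```

Checking the dtype before calling add_ lets callers pass either mask style; in this sketch only the float path mutates attn_weights in place.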

Note

This fix should also be cherry-picked to the aice branch as aice_patch. The same change is available on the local aice_patch branch (commit 4b17717), which is based on origin/aice.

Signed-off-by: copilot <copilot@github.com>

The boolean mask handling for attn_bias was accidentally removed in commit f337029 (Enable slicing for fp8 FusedSDPA vllm-project#1285). When attn_bias is a boolean tensor, the code should use masked_fill to set invalid positions to -inf, but instead it was using add_, which only adds 0/1 to the attention weights. This causes incorrect attention scores and accuracy degradation, especially for long prompts where proper masking of padded positions is critical.

Signed-off-by: copilot <copilot@github.com>
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: JyhWind <40982453+JyhWind@users.noreply.github.com>

Copilot AI assigned Copilot and JyhWind May 19, 2026

Copilot created this pull request from a session on behalf of JyhWind May 19, 2026 05:43

Copilot finished work on behalf of JyhWind May 19, 2026 05:43

Copilot AI requested a review from JyhWind May 19, 2026 05:43

Copilot started work on behalf of JyhWind May 19, 2026 05:44

Copilot finished work on behalf of JyhWind May 19, 2026 05:47

Copilot started work on behalf of JyhWind May 19, 2026 05:51

Copilot finished work on behalf of JyhWind May 19, 2026 05:54

Copilot started work on behalf of JyhWind May 19, 2026 06:37

Copilot finished work on behalf of JyhWind May 19, 2026 06:40

Copilot started work on behalf of JyhWind May 19, 2026 06:42

Copilot finished work on behalf of JyhWind May 19, 2026 06:46


Copilot wants to merge 1 commit into main from copilot/fix-lower-accuracy-bug

Copilot AI commented May 19, 2026
