
Add the option to use Boolean attention mask #1154

Closed

yangulei wants to merge 4 commits into vllm-project:main from yangulei:bool_mask

Conversation

@yangulei (Collaborator)

No description provided.

This comment was marked as resolved.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new runtime feature flag to optionally represent attention masks as boolean tensors (instead of additive 0/-inf bias tensors), and updates the HPU attention code paths to correctly apply boolean masks during attention score computation.

Changes:

  • Add use_boolean_mask feature flag (env var VLLM_USE_BOOLEAN_MASK) to runtime config.
  • Update attention-bias/mask construction in the HPU model runner to emit either boolean masks or additive -inf biases depending on the flag.
  • Update HPU attention ops to apply boolean masks via masked_fill instead of adding an additive bias tensor (a minimal sketch of the two approaches follows this list).
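For readers unfamiliar with the two representations, below is a minimal PyTorch sketch of applying a boolean mask with masked_fill versus adding a 0/-inf bias to the attention scores before the softmax. This is illustrative only, not the PR's actual HPU code; the helper name apply_attention_mask and the mask polarity (True = position that must not be attended to) are assumptions.

```python
import torch
import torch.nn.functional as F

def apply_attention_mask(scores: torch.Tensor,
                         mask: torch.Tensor,
                         use_boolean_mask: bool) -> torch.Tensor:
    """Illustrative helper: mask raw attention scores before the softmax."""
    if use_boolean_mask:
        # Boolean mask: True marks disallowed positions, which are
        # overwritten with -inf so the softmax assigns them zero weight.
        return scores.masked_fill(mask, float("-inf"))
    # Additive mask: 0.0 for allowed positions, -inf for disallowed ones,
    # simply added to the scores.
    return scores + mask

# Tiny usage example with a causal pattern over 4 tokens.
seq_len = 4
scores = torch.randn(1, 1, seq_len, seq_len)

bool_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
additive_mask = torch.zeros(seq_len, seq_len).masked_fill(bool_mask, float("-inf"))

probs_bool = F.softmax(apply_attention_mask(scores, bool_mask, True), dim=-1)
probs_add = F.softmax(apply_attention_mask(scores, additive_mask, False), dim=-1)
assert torch.allclose(probs_bool, probs_add)  # both yield the same attention weights
```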

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
vllm_gaudi/v1/worker/hpu_model_runner.py | Adds use_boolean_mask wiring and conditionally builds boolean vs. additive attention masks/biases across multiple mask-construction helpers.
vllm_gaudi/extension/ops.py | Applies boolean attention masks via masked_fill in the prompt-attention and paged-attention softmax paths.
vllm_gaudi/extension/features.py | Introduces the use_boolean_mask config value driven by VLLM_USE_BOOLEAN_MASK (see the sketch after this table).
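As a rough illustration of the flag wiring described above (the real implementation lives in vllm_gaudi/extension/features.py and hpu_model_runner.py, which are not reproduced on this page), a causal mask could be built in either representation depending on the VLLM_USE_BOOLEAN_MASK environment variable. The function names and the accepted truthy values below are assumptions.

```python
import os
import torch

def use_boolean_mask_enabled() -> bool:
    """Illustrative stand-in for the VLLM_USE_BOOLEAN_MASK feature flag."""
    return os.environ.get("VLLM_USE_BOOLEAN_MASK", "0").lower() in ("1", "true")

def build_causal_mask(seq_len: int, dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    """Build a causal attention mask in whichever representation the flag selects."""
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    if use_boolean_mask_enabled():
        # Boolean representation: True marks future (disallowed) positions.
        return future
    # Additive representation: 0 for allowed positions, -inf for future ones.
    return torch.zeros(seq_len, seq_len, dtype=dtype).masked_fill(future, float("-inf"))
```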

Comment thread: vllm_gaudi/extension/features.py
@yangulei force-pushed the bool_mask branch 2 times, most recently from 1b6d7af to fee2317 on March 17, 2026 at 05:12
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
f296a1966dca96cd69e5c1fa1264edbf611a1bd6

@adobrzyn (Collaborator)

@yangulei please include this new flag in the documentation.

@afierka-intel (Collaborator)

@copilot open a new pull request to apply changes based on the comments in this thread.

@yangulei (Collaborator, Author) commented Mar 19, 2026

A short description of the VLLM_USE_BOOLEAN_MASK flag has been pushed.
More detailed documentation should be added in the PR for FusedSDPA slicing (#1149 or #1155).

@yangulei (Collaborator, Author)

Closing: there is no obvious perf gain, and there is an accuracy issue when used with INC fp8.

@yangulei closed this on Apr 17, 2026
