
Add the option to use Boolean attention mask #1154

Closed

yangulei wants to merge 4 commits into vllm-project:main from yangulei:bool_mask

Conversation

@yangulei (Collaborator)

No description provided.

This comment was marked as resolved.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new runtime feature flag to optionally represent attention masks as boolean tensors (instead of additive 0/-inf bias tensors), and updates the HPU attention code paths to correctly apply boolean masks during attention score computation.

Changes:

  • Add use_boolean_mask feature flag (env var VLLM_USE_BOOLEAN_MASK) to runtime config.
  • Update attention-bias/mask construction in the HPU model runner to emit either boolean masks or additive -inf biases depending on the flag.
  • Update HPU attention ops to apply boolean masks via masked_fill instead of adding an additive bias tensor (a minimal sketch of the two approaches follows this list).
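For readers unfamiliar with the two representations, below is a minimal PyTorch sketch of applying a boolean mask with masked_fill versus adding a 0/-inf bias to the attention scores before the softmax. This is illustrative only, not the PR's actual HPU code; the helper name apply_attention_mask and the mask polarity (True = position that must not be attended to) are assumptions.

```python
import torch
import torch.nn.functional as F

def apply_attention_mask(scores: torch.Tensor,
                         mask: torch.Tensor,
                         use_boolean_mask: bool) -> torch.Tensor:
    """Illustrative helper: mask raw attention scores before the softmax."""
    if use_boolean_mask:
        # Boolean mask: True marks disallowed positions, which are
        # overwritten with -inf so the softmax assigns them zero weight.
        return scores.masked_fill(mask, float("-inf"))
    # Additive mask: 0.0 for allowed positions, -inf for disallowed ones,
    # simply added to the scores.
    return scores + mask

# Tiny usage example with a causal pattern over 4 tokens.
seq_len = 4
scores = torch.randn(1, 1, seq_len, seq_len)

bool_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
additive_mask = torch.zeros(seq_len, seq_len).masked_fill(bool_mask, float("-inf"))

probs_bool = F.softmax(apply_attention_mask(scores, bool_mask, True), dim=-1)
probs_add = F.softmax(apply_attention_mask(scores, additive_mask, False), dim=-1)
assert torch.allclose(probs_bool, probs_add)  # both yield the same attention weights
```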

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
vllm_gaudi/v1/worker/hpu_model_runner.py | Adds use_boolean_mask wiring and conditionally builds boolean vs. additive attention masks/biases across multiple mask-construction helpers.
vllm_gaudi/extension/ops.py | Applies boolean attention masks via masked_fill in the prompt-attention and paged-attention softmax paths.
vllm_gaudi/extension/features.py | Introduces the use_boolean_mask config value driven by VLLM_USE_BOOLEAN_MASK (see the sketch after this table).
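As a rough illustration of the flag wiring described above (the real implementation lives in vllm_gaudi/extension/features.py and hpu_model_runner.py, which are not reproduced on this page), a causal mask could be built in either representation depending on the VLLM_USE_BOOLEAN_MASK environment variable. The function names and the accepted truthy values below are assumptions.

```python
import os
import torch

def use_boolean_mask_enabled() -> bool:
    """Illustrative stand-in for the VLLM_USE_BOOLEAN_MASK feature flag."""
    return os.environ.get("VLLM_USE_BOOLEAN_MASK", "0").lower() in ("1", "true")

def build_causal_mask(seq_len: int, dtype: torch.dtype = torch.bfloat16) -> torch.Tensor:
    """Build a causal attention mask in whichever representation the flag selects."""
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    if use_boolean_mask_enabled():
        # Boolean representation: True marks future (disallowed) positions.
        return future
    # Additive representation: 0 for allowed positions, -inf for future ones.
    return torch.zeros(seq_len, seq_len, dtype=dtype).masked_fill(future, float("-inf"))
```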

Comment thread: vllm_gaudi/extension/features.py
@yangulei force-pushed the bool_mask branch 2 times, most recently from 1b6d7af to fee2317 on March 17, 2026 at 05:12
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
f296a1966dca96cd69e5c1fa1264edbf611a1bd6

@adobrzyn (Collaborator)

@yangulei please include this new flag in the documentation.

@afierka-intel (Collaborator)

@copilot open a new pull request to apply changes based on the comments in this thread.

@yangulei (Collaborator, Author) commented Mar 19, 2026

A short description of the VLLM_USE_BOOLEAN_MASK flag has been pushed.
More detailed documentation should be added in the PR for FusedSDPA slicing (#1149 or #1155).

@yangulei (Collaborator, Author)

Closing: there is no obvious perf gain, and there is an accuracy issue when used with INC fp8.

@yangulei closed this on Apr 17, 2026
