
Add support for chunked attention (#597) #612

Merged
wpyszka merged 1 commit into vllm-project:releases/v0.11.2 from jkaniecki:chunked_attention_0_11_2 on Nov 21, 2025
Conversation

@jkaniecki
Contributor

Cherry-pick of
6e1be4e



---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings November 21, 2025 07:47
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for a chunked attention mechanism to the vLLM-Gaudi implementation. The key changes introduce infrastructure to compute attention in chunks, which can improve memory efficiency and performance for long sequences.

Key Changes:

  • Added chunked attention bias computation for both prefill and decode phases
  • Extended attention metadata structures to include chunked attention fields
  • Implemented logic to detect and configure models with chunked attention support
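To make the bias computation above concrete, here is a minimal illustrative sketch (not the PR's actual code; the function name `chunked_attn_bias` and its shape are assumptions). In chunked attention, token i may only attend to earlier tokens that fall inside the same fixed-size chunk, which the kernel can express as an additive bias of 0.0 (allowed) or -inf (masked):

```python
# Hypothetical sketch of a chunked causal attention bias.
# Token i attends to token j only when j <= i (causal) and both
# positions lie in the same chunk of `chunk_size` tokens.

def chunked_attn_bias(seq_len: int, chunk_size: int):
    """Return a seq_len x seq_len additive bias matrix:
    0.0 where attention is allowed, -inf where it is masked."""
    neg_inf = float("-inf")
    bias = [[neg_inf] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if j <= i and i // chunk_size == j // chunk_size:
                bias[i][j] = 0.0
    return bias

# Example: 6 tokens, chunks of 3. Token 3 starts a new chunk,
# so it cannot attend back to tokens 0..2.
mask = chunked_attn_bias(6, 3)
```

In a real backend this bias would be built as a device tensor and added to the attention logits before the softmax; the decode path would additionally account for block-mapped KV-cache positions.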

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Reviewed files:

  • vllm_gaudi/v1/worker/hpu_model_runner.py: Implements chunked attention support with bias computation, block mapping, metadata updates, and model initialization logic
  • vllm_gaudi/v1/attention/backends/hpu_attn.py: Extends decode metadata creation to accept chunked attention parameters
  • vllm_gaudi/attention/backends/hpu_attn.py: Adds chunked attention fields to the metadata class and implements selection logic in the forward pass
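The "selection logic in the forward pass" mentioned for vllm_gaudi/attention/backends/hpu_attn.py can be sketched roughly as follows (an assumed shape, not the PR's code; the field names `attn_bias`, `chunked_attn_bias`, and the flag `use_chunked_attention` are illustrative): when a layer opts into chunked attention and the metadata carries a chunked bias, use it; otherwise fall back to the ordinary causal bias.

```python
# Hypothetical sketch of bias selection in the attention forward pass.
# All names here are illustrative placeholders, not the real metadata API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnMetadataSketch:
    attn_bias: object                      # regular causal bias
    chunked_attn_bias: Optional[object]    # present only for chunked-attn models


def select_attn_bias(meta: AttnMetadataSketch, use_chunked_attention: bool):
    """Pick the chunked bias when the layer opts in and the metadata
    provides one; otherwise fall back to the plain causal bias."""
    if use_chunked_attention and meta.chunked_attn_bias is not None:
        return meta.chunked_attn_bias
    return meta.attn_bias


meta = AttnMetadataSketch(attn_bias="causal", chunked_attn_bias="chunked")
```

The fallback matters because only some layers (or some models) use chunked attention; layers without it must keep the unmodified causal path.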


Four review comment threads on vllm_gaudi/v1/worker/hpu_model_runner.py
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
275de34170654274616082721348b7edd9741d32

Collaborator

@wpyszka wpyszka left a comment


approved for 0.11.2

@wpyszka wpyszka merged commit 841a400 into vllm-project:releases/v0.11.2 Nov 21, 2025
38 checks passed

3 participants