Add support for chunked attention (#597) by jkaniecki · Pull Request #612 · vllm-project/vllm-gaudi

jkaniecki · 2025-11-21T07:47:42Z

Cherry-pick of vllm-project@6e1be4e

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull Request Overview

This PR adds chunked attention support to the vLLM-Gaudi implementation. The changes introduce the infrastructure needed to compute attention in chunks, which can improve memory efficiency and performance on long sequences.

Key Changes:

Added chunked attention bias computation for both prefill and decode phases
Extended attention metadata structures to include chunked attention fields
Implemented logic to detect and configure models with chunked attention support
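The bias computation in the first item can be illustrated with a minimal pure-Python sketch. It assumes that "chunked attention" here means chunked local attention (as configured through `attention_chunk_size` in models such as Llama 4), where a query token attends only to earlier tokens inside the same fixed-size chunk. The helper name `chunked_causal_bias` and the 0 / -inf additive-bias convention are illustrative assumptions, not the code added to hpu_model_runner.py.

```python
import math
from typing import List


def chunked_causal_bias(seq_len: int, chunk_size: int) -> List[List[float]]:
    """Hypothetical helper: additive bias for chunked local attention.

    bias[q][k] is 0.0 when query position q may attend to key position k,
    and -inf otherwise. A key is visible only if it is not in the future
    (k <= q) AND it lies in the same fixed-size chunk as the query.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    bias: List[List[float]] = []
    for q in range(seq_len):
        row: List[float] = []
        for k in range(seq_len):
            same_chunk = (q // chunk_size) == (k // chunk_size)
            causal = k <= q
            row.append(0.0 if (same_chunk and causal) else -math.inf)
        bias.append(row)
    return bias


if __name__ == "__main__":
    # 6 tokens split into chunks of 4: positions 0-3 and 4-5.
    # "0" marks a visible key, "x" a masked one.
    for row in chunked_causal_bias(6, 4):
        print(" ".join("0" if v == 0.0 else "x" for v in row))
```

In the printed mask, the row for position 4 hides keys 0-3 even though they precede it, because they belong to the previous chunk; within each chunk the mask is an ordinary causal triangle.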

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
vllm_gaudi/v1/worker/hpu_model_runner.py	Implements chunked attention support with bias computation, block mapping, metadata updates, and model initialization logic
vllm_gaudi/v1/attention/backends/hpu_attn.py	Extends decode metadata creation to accept chunked attention parameters
vllm_gaudi/attention/backends/hpu_attn.py	Adds chunked attention fields to metadata class and implements selection logic in forward pass
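For the decode path, the "block mapping" mentioned above can be pictured as selecting only the KV-cache blocks that overlap the current chunk. The sketch below is hypothetical: `decode_chunk_blocks`, its parameters, and the list-based block table are assumptions made for illustration, since the actual decode metadata fields in hpu_model_runner.py and hpu_attn.py are not shown on this page. It also assumes the chunked local attention semantics described for the prefill bias.

```python
from typing import List, Tuple


def decode_chunk_blocks(
    block_table: List[int], pos: int, chunk_size: int, block_size: int
) -> Tuple[List[int], int]:
    """Hypothetical helper: KV blocks needed by one decode token.

    block_table maps logical block index -> physical block id, and pos is
    the 0-based position of the token being decoded. Returns the physical
    blocks covering positions [chunk_start, pos] and the number of leading
    slots in the first block that belong to the previous chunk, which a
    chunked-attention bias would need to mask. Slots after pos in the last
    block are left to the usual sequence-length masking in this sketch.
    """
    chunk_start = (pos // chunk_size) * chunk_size
    first = chunk_start // block_size
    last = pos // block_size
    if last >= len(block_table):
        raise ValueError("block_table is too short for this position")
    return block_table[first:last + 1], chunk_start % block_size


if __name__ == "__main__":
    # chunk_size=6, block_size=4: decoding position 9 needs chunk [6, 9],
    # which spans logical blocks 1 and 2; slots 4-5 of block 1 are masked.
    print(decode_chunk_blocks([10, 11, 12, 13, 14], pos=9, chunk_size=6, block_size=4))
```

When `chunk_size` is a multiple of `block_size`, the returned mask offset is always 0 and chunk boundaries align with block boundaries, which keeps the decode bias trivial.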


github-actions · 2025-11-21T10:42:18Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
275de34170654274616082721348b7edd9741d32

wpyszka

approved for 0.11.2

Copilot AI review requested due to automatic review settings November 21, 2025 07:47

jkaniecki requested review from mgawarkiewicz-intel, piotrbocian and wpyszka as code owners November 21, 2025 07:47

Copilot AI reviewed Nov 21, 2025


Comment threads (4): vllm_gaudi/v1/worker/hpu_model_runner.py

wpyszka approved these changes Nov 21, 2025


wpyszka merged commit 841a400 into vllm-project:releases/v0.11.2 Nov 21, 2025
38 checks passed

wpyszka merged 1 commit into vllm-project:releases/v0.11.2 from jkaniecki:chunked_attention_0_11_2

3 participants
