Add support for chunked attention #597

Merged
michalkuligowski merged 3 commits into vllm-project:releases/v0.11.1 from jkaniecki:chunked_attention_0_11_1 on Nov 20, 2025

Conversation

@jkaniecki (Contributor)

Cherry-pick of 6e1be4e

jkaniecki and others added 3 commits November 20, 2025 13:27
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Copilot AI left a comment

Pull Request Overview

This PR adds support for chunked attention, cherry-picked from the vllm-gaudi repository. The implementation introduces the ability to process attention in fixed-size chunks, which can improve memory efficiency for model architectures that use local (chunked) attention.

Key Changes

  • Added _set_attn_bias_for_chunked_attention method to compute attention biases for chunked attention patterns
  • Extended metadata structures to include chunked attention-specific fields (block mappings, usage, groups)
  • Modified attention flow to conditionally use chunked attention based on model configuration
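The core idea behind the attention bias in the first bullet can be sketched without any framework: each query token may attend only to keys inside its own fixed-size chunk, subject to causality. The function below is an illustrative stand-in, not the actual `_set_attn_bias_for_chunked_attention` implementation:

```python
def chunked_attention_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Return a seq_len x seq_len boolean mask where entry [i][j] is True
    when query i may attend to key j: both positions fall in the same
    fixed-size chunk and j is not in the future (causal)."""
    return [
        [(i // chunk_size == j // chunk_size) and (j <= i)
         for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With seq_len=6 and chunk_size=3, position 3 starts a new chunk,
# so it cannot attend back to positions 0-2.
mask = chunked_attention_mask(6, 3)
```

An additive bias is then typically derived from such a mask by placing 0 where it is True and -inf where it is False, so the softmax zeroes out disallowed positions.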

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • vllm_gaudi/v1/worker/hpu_model_runner.py: Main implementation of chunked attention support, including metadata updates, block-mapping logic, and model-configuration detection
  • vllm_gaudi/v1/attention/backends/hpu_attn.py: Extended the decode metadata factory method to accept chunked attention parameters
  • vllm_gaudi/attention/backends/hpu_attn.py: Added chunked attention metadata fields and conditional logic in the attention forward pass
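The chunked-attention-specific metadata fields mentioned in the review (block mappings, usage, groups) can be pictured as optional extensions of the decode metadata, with the forward pass branching on whether they are populated. The field names below are hypothetical placeholders, not the actual vllm_gaudi definitions:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkedAttentionMetadata:
    """Illustrative sketch of decode metadata extended with chunked
    attention fields; all names here are assumptions."""
    chunked_block_mapping: Optional[List[int]] = None  # chunk -> KV-cache block
    chunked_block_usage: Optional[List[int]] = None    # tokens used per block
    chunked_block_groups: Optional[List[int]] = None   # chunk -> sequence group

    @property
    def use_chunked_attention(self) -> bool:
        # The forward pass would take the chunked path only when the
        # model configuration enabled it and the mapping was populated.
        return self.chunked_block_mapping is not None

# Default metadata falls back to the regular attention path.
meta = ChunkedAttentionMetadata()
chunked = ChunkedAttentionMetadata(chunked_block_mapping=[0, 1, 3])
```

Keeping the fields optional lets a single metadata object serve both paths, which matches the conditional-use design described above.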

Two comment threads on vllm_gaudi/v1/worker/hpu_model_runner.py
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
439368496db48d8f992ba8c606a0c0b1eebbfa69

michalkuligowski merged commit 6feb010 into vllm-project:releases/v0.11.1 on Nov 20, 2025
38 checks passed
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Nov 21, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
wpyszka pushed a commit that referenced this pull request Nov 21, 2025
Cherry-pick of

6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
kfojcik-intel pushed a commit to kfojcik-intel/vllm-gaudi that referenced this pull request Jan 13, 2026
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
mgawarkiewicz-intel pushed a commit that referenced this pull request Jan 15, 2026
Cherry-pick of

6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>