Add support for chunked attention #597

Merged
michalkuligowski merged 3 commits into vllm-project:releases/v0.11.1 from jkaniecki:chunked_attention_0_11_1 on Nov 20, 2025

Conversation

@jkaniecki (Contributor)

Cherry-pick of 6e1be4e

jkaniecki and others added 3 commits November 20, 2025 13:27
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Copilot AI left a comment

Pull Request Overview

This PR adds support for chunked attention, cherry-picked from the vllm-gaudi repository. The implementation introduces the ability to process attention in fixed-size chunks, which can improve memory efficiency for model architectures that use local (chunked) attention.

Key Changes

  • Added _set_attn_bias_for_chunked_attention method to compute attention biases for chunked attention patterns
  • Extended metadata structures to include chunked attention-specific fields (block mappings, usage, groups)
  • Modified attention flow to conditionally use chunked attention based on model configuration
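The core idea behind the attention bias in the first bullet can be sketched without any framework: each query token may attend only to keys inside its own fixed-size chunk, subject to causality. The function below is an illustrative stand-in, not the actual `_set_attn_bias_for_chunked_attention` implementation:

```python
def chunked_attention_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Return a seq_len x seq_len boolean mask where entry [i][j] is True
    when query i may attend to key j: both positions fall in the same
    fixed-size chunk and j is not in the future (causal)."""
    return [
        [(i // chunk_size == j // chunk_size) and (j <= i)
         for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With seq_len=6 and chunk_size=3, position 3 starts a new chunk,
# so it cannot attend back to positions 0-2.
mask = chunked_attention_mask(6, 3)
```

An additive bias is then typically derived from such a mask by placing 0 where it is True and -inf where it is False, so the softmax zeroes out disallowed positions.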

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • vllm_gaudi/v1/worker/hpu_model_runner.py: Main implementation of chunked attention support, including metadata updates, block-mapping logic, and model-configuration detection
  • vllm_gaudi/v1/attention/backends/hpu_attn.py: Extended the decode metadata factory method to accept chunked attention parameters
  • vllm_gaudi/attention/backends/hpu_attn.py: Added chunked attention metadata fields and conditional logic in the attention forward pass
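The chunked-attention-specific metadata fields mentioned in the review (block mappings, usage, groups) can be pictured as optional extensions of the decode metadata, with the forward pass branching on whether they are populated. The field names below are hypothetical placeholders, not the actual vllm_gaudi definitions:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ChunkedAttentionMetadata:
    """Illustrative sketch of decode metadata extended with chunked
    attention fields; all names here are assumptions."""
    chunked_block_mapping: Optional[List[int]] = None  # chunk -> KV-cache block
    chunked_block_usage: Optional[List[int]] = None    # tokens used per block
    chunked_block_groups: Optional[List[int]] = None   # chunk -> sequence group

    @property
    def use_chunked_attention(self) -> bool:
        # The forward pass would take the chunked path only when the
        # model configuration enabled it and the mapping was populated.
        return self.chunked_block_mapping is not None

# Default metadata falls back to the regular attention path.
meta = ChunkedAttentionMetadata()
chunked = ChunkedAttentionMetadata(chunked_block_mapping=[0, 1, 3])
```

Keeping the fields optional lets a single metadata object serve both paths, which matches the conditional-use design described above.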

Two comment threads on vllm_gaudi/v1/worker/hpu_model_runner.py
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
439368496db48d8f992ba8c606a0c0b1eebbfa69

michalkuligowski merged commit 6feb010 into vllm-project:releases/v0.11.1 on Nov 20, 2025
38 checks passed
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Nov 21, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
wpyszka pushed a commit that referenced this pull request Nov 21, 2025
Cherry-pick of

6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Dec 4, 2025
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
kfojcik-intel pushed a commit to kfojcik-intel/vllm-gaudi that referenced this pull request Jan 13, 2026
Cherry-pick of
vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
mgawarkiewicz-intel pushed a commit that referenced this pull request Jan 15, 2026
Cherry-pick of

6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>