Add support for chunked attention #560

Merged
wpyszka merged 12 commits into vllm-project:releases/v0.11.0 from jkaniecki:chunked_attention on Nov 18, 2025

Conversation

@jkaniecki
Contributor

No description provided.

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Copilot AI review requested due to automatic review settings November 12, 2025 20:26
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for chunked attention, a mechanism that restricts attention within fixed-size chunks during both prefill and decode phases. The implementation includes computing chunked attention biases, managing chunked block mappings, and integrating these features into the existing attention metadata infrastructure.

Key changes:

  • Implements chunked attention bias computation for both prompt and decode phases
  • Adds new metadata fields to track chunked block mappings, lists, groups, and usage
  • Integrates chunked attention configuration detection and layer initialization
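To illustrate the masking rule the overview describes, here is a minimal sketch in plain Python. It is not the PR's code: it assumes chunked attention means causal attention restricted to the fixed-size chunk that contains the query position, and the names `chunked_causal_mask`, `mask_to_bias`, and `decode_window` are hypothetical helpers introduced only for this example.

```python
def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query position q may attend to key k.

    Assumption: a query sees only keys that are not in its future (k <= q)
    and that fall in the same fixed-size chunk (k // chunk_size == q // chunk_size).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        [k <= q and k // chunk_size == q // chunk_size for k in range(seq_len)]
        for q in range(seq_len)
    ]


def mask_to_bias(mask: list[list[bool]]) -> list[list[float]]:
    """Convert a boolean mask to an additive bias: 0.0 if allowed, -inf if masked."""
    return [[0.0 if allowed else float("-inf") for allowed in row] for row in mask]


def decode_window(position: int, chunk_size: int) -> tuple[int, int]:
    """Inclusive (start, end) range of key positions visible to a token being
    decoded at `position`, under the same assumption as above."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = position - position % chunk_size
    return start, position


if __name__ == "__main__":
    # Print the mask for 6 tokens with chunks of 4: 'x' = attend, '.' = masked.
    for row in chunked_causal_mask(seq_len=6, chunk_size=4):
        print("".join("x" if allowed else "." for allowed in row))
```

Under this assumption, a prompt of 6 tokens with a chunk size of 4 gives the first four positions an ordinary causal mask among themselves, while positions 4 and 5 start a new chunk and cannot see positions 0 to 3. The same rule applied at decode time means a token only needs the KV-cache entries from the start of its chunk onward, which is presumably why the PR tracks separate chunked block mappings.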

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| vllm_gaudi/v1/worker/hpu_worker.py | Adds empty pass statement without removing warmup call |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Implements chunked attention bias computation, metadata updates, and model initialization |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Extends decode metadata creation with chunked attention parameters |
| vllm_gaudi/attention/backends/hpu_attn.py | Adds chunked attention metadata fields and attention backend logic |

Comment thread vllm_gaudi/v1/worker/hpu_worker.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
@jkaniecki jkaniecki marked this pull request as ready for review November 14, 2025 13:37
@wpyszka
Collaborator

wpyszka commented Nov 18, 2025

/run-gaudi-tests

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
f71952c1c49fb86686b0b300b727b26282362bf4

Collaborator

@wpyszka wpyszka left a comment

needed in 0.11, approved

@wpyszka wpyszka merged commit 6e1be4e into vllm-project:releases/v0.11.0 Nov 18, 2025
35 checks passed
jkaniecki added a commit to jkaniecki/vllm-gaudi that referenced this pull request Nov 20, 2025
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>

4 participants