
Add support for chunked attention (#597)#683

Closed
jkaniecki wants to merge 5 commits into
vllm-project:mainfrom
jkaniecki:main

Conversation

@jkaniecki
Contributor

Cherry-pick of vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for chunked attention to the vLLM-Gaudi implementation, cherry-picked from the upstream vllm-gaudi repository. Chunked attention divides attention computation into smaller chunks, which can help with memory efficiency and performance for long sequences.

Key changes:

  • Added chunked attention bias computation for both prefill and decode phases
  • Extended attention metadata structures to include chunked attention fields
  • Integrated chunked attention configuration detection and layer setup
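To make the idea concrete, here is a minimal NumPy sketch of a chunked causal attention bias of the kind described above. This is not the PR's HPU implementation; the function names and the dense-mask formulation are illustrative assumptions. Each query position attends causally, but only to key positions inside its own fixed-size chunk, which bounds the attention span for long sequences.

```python
import numpy as np

def chunked_causal_bias(seq_len: int, chunk_size: int) -> np.ndarray:
    """Additive attention bias: position i may attend to position j only if
    j <= i (causal) and both positions fall in the same fixed-size chunk.
    Allowed pairs get 0.0; disallowed pairs get -inf (masked out by softmax)."""
    pos = np.arange(seq_len)
    same_chunk = (pos[:, None] // chunk_size) == (pos[None, :] // chunk_size)
    causal = pos[None, :] <= pos[:, None]
    return np.where(same_chunk & causal, 0.0, -np.inf)

def chunked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                      chunk_size: int) -> np.ndarray:
    """Single-head softmax attention with the chunked causal bias applied."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + chunked_causal_bias(len(q), chunk_size)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Note that the first token of every chunk can attend only to itself, since all earlier positions lie in previous chunks; this is the bias-boundary behavior the prefill and decode paths both have to respect.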

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Reviewed files:

  • vllm_gaudi/v1/worker/hpu_model_runner.py — core implementation of chunked attention, including bias computation, block mapping, metadata updates, and model initialization logic
  • vllm_gaudi/v1/attention/backends/hpu_attn.py — updated the decode metadata factory method to accept chunked attention parameters
  • vllm_gaudi/attention/backends/hpu_attn.py — added chunked attention metadata fields and logic to select the appropriate attention blocks during the forward pass


@github-actions

github-actions Bot commented Dec 4, 2025

✅ CI Passed

All checks passed successfully against the following vllm commit:
1b7c7f5159484063af28cb47809d79e83d3301ec

@github-actions

github-actions Bot commented Dec 9, 2025

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
e2ed238885be6af358be1851cd43105b7d036c49

@github-actions
Copy link
Copy Markdown

✅ CI Passed

All checks passed successfully against the following vllm commit:
17fec3af0942da83bcebe2ca0cb4f6ae81c634d8

@PatrykWo
Collaborator

@kzawora-intel please review and approve after resolving conflicts

@Luca-Calabria
Contributor

This PR has already been merged via #821, so it can be closed.

@adobrzyn
Collaborator

adobrzyn commented Feb 4, 2026

#821 - wasn't this done here already?

@jkaniecki jkaniecki closed this Feb 4, 2026
