Add support for chunked attention (#597) #683
Cherry-pick of vllm-project@6e1be4e

---------

Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR adds support for chunked attention to the vLLM-Gaudi implementation, cherry-picked from the upstream vllm-gaudi repository. Chunked attention divides the attention computation into smaller chunks, which can reduce memory use and improve performance on long sequences.
Key changes:
- Added chunked attention bias computation for both prefill and decode phases
- Extended attention metadata structures to include chunked attention fields
- Integrated chunked attention configuration detection and layer setup
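As a rough illustration of what a chunked attention bias can look like in the prefill phase, the sketch below builds an additive bias matrix. It assumes the common definition in which a query token attends causally only to keys inside its own fixed-size chunk; the function name `chunked_causal_bias` and its parameters are hypothetical and are not taken from the PR's code.

```python
import math


def chunked_causal_bias(seq_len: int, chunk_size: int) -> list[list[float]]:
    """Build an additive attention bias for chunked causal attention.

    Assumption (not confirmed by the PR): query i may attend to key j only
    if j <= i and both positions fall into the same chunk. Allowed pairs get
    0.0, masked pairs get -inf, so the matrix can be added to attention
    scores before the softmax.
    """
    bias = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            same_chunk = (i // chunk_size) == (j // chunk_size)
            allowed = j <= i and same_chunk
            row.append(0.0 if allowed else -math.inf)
        bias.append(row)
    return bias


if __name__ == "__main__":
    # Print the mask pattern for 6 tokens with chunks of 3 tokens each.
    for row in chunked_causal_bias(seq_len=6, chunk_size=3):
        print(["x" if value == 0.0 else "." for value in row])
```

A real implementation would build this as a tensor on the device and in a batched form; the nested loops here are only meant to make the masking rule explicit.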
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Core implementation of chunked attention including bias computation, block mapping, metadata updates, and model initialization logic |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Updated decode metadata factory method to accept chunked attention parameters |
| vllm_gaudi/attention/backends/hpu_attn.py | Added chunked attention metadata fields and logic to select appropriate attention blocks during forward pass |
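
To make the idea of selecting attention blocks during decode more concrete, here is a minimal sketch of how the KV-cache blocks visible to a single decode token could be chosen under chunked attention. It assumes a paged KV cache, a chunk size that is a multiple of the block size, and the same "attend only within the current chunk" rule; `visible_blocks` and its arguments are hypothetical names, not the identifiers used in `hpu_attn.py` or `hpu_model_runner.py`.

```python
def visible_blocks(
    block_table: list[int],
    seq_len: int,
    chunk_size: int,
    block_size: int,
) -> list[int]:
    """Return the KV-cache block ids that the newest token may attend to.

    Assumptions (illustrative only): block_table[k] holds the physical block
    id for token positions [k * block_size, (k + 1) * block_size), and
    chunk_size is a multiple of block_size so chunks start on block
    boundaries.
    """
    if chunk_size % block_size != 0:
        raise ValueError("chunk_size must be a multiple of block_size")
    last_pos = seq_len - 1                               # position of the decode token
    chunk_start = (last_pos // chunk_size) * chunk_size  # first position in its chunk
    first_block = chunk_start // block_size
    last_block = last_pos // block_size
    return block_table[first_block:last_block + 1]


if __name__ == "__main__":
    table = [10, 11, 12, 13, 14]
    # 14 tokens so far, chunks of 8 tokens, blocks of 4 tokens.
    print(visible_blocks(table, seq_len=14, chunk_size=8, block_size=4))
```

With full (non-chunked) attention the whole block table up to `last_block` would be used instead; the chunked variant only trims the blocks that precede the current chunk.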
✅ CI Passed: All checks passed successfully against the following vllm commit:
🚧 CI Blocked: The main CI workflow was not started for the following reason:
@kzawora-intel please review and approve after resolving conflicts.
This PR has already been merged in #821.
#821 - wasn't this done here already?
Cherry-pick of 6e1be4e