Add support for chunked attention #560
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
Pull Request Overview
This PR adds support for chunked attention, a mechanism that restricts attention within fixed-size chunks during both prefill and decode phases. The implementation includes computing chunked attention biases, managing chunked block mappings, and integrating these features into the existing attention metadata infrastructure.
Key changes:
- Implements chunked attention bias computation for both prompt and decode phases
- Adds new metadata fields to track chunked block mappings, lists, groups, and usage
- Integrates chunked attention configuration detection and layer initialization
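To make the mechanism concrete, here is a minimal sketch of a chunked attention bias. It is not the PR's HPU implementation. It assumes the common definition, in which a query token attends only to keys that are in the same fixed-size chunk and not in the future. The names `chunked_causal_bias` and `decode_window` are hypothetical and exist only for this illustration.

```python
import math


def chunked_causal_bias(seq_len: int, chunk_size: int) -> list[list[float]]:
    """Additive prompt-phase bias: 0.0 where attention is allowed, -inf elsewhere.

    Assumption: attention is causal and confined to the query's own chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    bias = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            same_chunk = q // chunk_size == k // chunk_size
            causal = k <= q
            row.append(0.0 if (same_chunk and causal) else -math.inf)
        bias.append(row)
    return bias


def decode_window(position: int, chunk_size: int) -> tuple[int, int]:
    """Half-open range [start, end) of key positions visible to a decode token.

    Assumption: a decode token sees only the keys in its current chunk, up to
    and including itself.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = (position // chunk_size) * chunk_size
    return start, position + 1


if __name__ == "__main__":
    # Six tokens with a chunk size of 4: tokens 0-3 form one chunk, 4-5 the next.
    for row in chunked_causal_bias(6, 4):
        print(["x" if v == 0.0 else "." for v in row])
    print(decode_window(5, 4))
```

In this sketch the prompt-phase bias is a dense matrix, while the decode-phase helper only reports which cached key positions remain visible. A block-based KV cache would presumably translate that range into block indices, which is the role the chunked block mappings in this PR appear to play.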
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_worker.py | Adds empty pass statement without removing warmup call |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Implements chunked attention bias computation, metadata updates, and model initialization |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Extends decode metadata creation with chunked attention parameters |
| vllm_gaudi/attention/backends/hpu_attn.py | Adds chunked attention metadata fields and attention backend logic |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
/run-gaudi-tests
✅ CI Passed: All checks passed successfully against the following vllm commit:
wpyszka left a comment
Needed in 0.11, approved.
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
No description provided.