cherry-pick chunked attention from #821 + 32k+ context window fix from #855 (#870)

Closed
Luca-Calabria wants to merge 4 commits into vllm-project:releases/v0.14.0 from Luca-Calabria:cherry-pick-chunked-attn

Conversation

@Luca-Calabria
Contributor

@Luca-Calabria Luca-Calabria commented Jan 23, 2026

Cherry-pick missing fixes:

  • chunked attention fixes from #821
  • llama4 32k+ context window fix from #855

Copilot AI review requested due to automatic review settings January 23, 2026 15:24
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds support for chunked attention by cherry-picking fixes from PR #821. The changes implement the infrastructure needed to handle models that use chunked attention patterns, ensuring proper attention bias computation and block mapping for both prefill and decode phases.

Changes:

  • Added chunked attention detection and initialization logic
  • Extended attention metadata structures with chunked-specific fields
  • Implemented chunked attention bias computation for prefill and decode phases
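The chunk-local masking described above can be sketched conceptually: a query at position i may attend a key at position j only when j is causally earlier and both positions fall in the same fixed-size chunk. The sketch below is an illustration in plain Python under those assumptions; the function names and the list-of-lists representation are hypothetical, not the PR's actual HPU implementation, which builds the bias as device tensors:

```python
def chunked_attn_mask(seq_len: int, chunk_size: int) -> list[list[float]]:
    """Additive attention bias for causal, chunk-local attention.

    Entry [i][j] is 0.0 when query i may attend key j (j <= i and both
    positions lie in the same chunk of `chunk_size` tokens), and -inf
    otherwise. Illustrative sketch only, not the PR's actual code.
    """
    neg_inf = float("-inf")
    return [
        [0.0 if (j <= i and i // chunk_size == j // chunk_size) else neg_inf
         for j in range(seq_len)]
        for i in range(seq_len)
    ]


def chunk_window_start(pos: int, chunk_size: int) -> int:
    """First key position visible to a decode-phase query at `pos`:
    the start of its chunk. A hypothetical helper for deciding which
    KV-cache blocks a decoding token actually needs."""
    return pos - pos % chunk_size
```

For example, with `chunk_size=2` and `seq_len=4`, position 2 starts a new chunk, so `chunked_attn_mask(4, 2)[2][1]` is -inf even though position 1 is causally earlier.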

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Core implementation of chunked attention support: model detection, metadata processing, attention bias computation, and block mapping |
| vllm_gaudi/v1/spec_decode/hpu_eagle.py | Added chunked attention metadata fields to speculative decoding |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Updated attention metadata creation with chunked attention parameters |
| vllm_gaudi/attention/backends/hpu_attn.py | Extended attention metadata dataclass and added chunked attention handling in the forward pass |
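On the model-detection piece: Llama-4-style Hugging Face configs advertise chunked (local) attention via an `attention_chunk_size` field. A minimal detection sketch, assuming that attribute name (the runner's actual check may differ):

```python
from types import SimpleNamespace


def uses_chunked_attention(hf_config) -> bool:
    """Return True when the model config advertises a positive chunked
    attention window. Assumes an HF-style `attention_chunk_size`
    attribute; illustrative sketch only."""
    chunk = getattr(hf_config, "attention_chunk_size", None)
    return chunk is not None and chunk > 0


# Example: a stand-in for a Llama-4-style text config.
llama4_like = SimpleNamespace(attention_chunk_size=8192)
plain_cfg = SimpleNamespace()  # no chunked attention advertised
```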


Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
@Luca-Calabria Luca-Calabria marked this pull request as draft January 23, 2026 15:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@Luca-Calabria Luca-Calabria changed the title from "cherry-pick chunked attention from #821" to "cherry-pick chunked attention from #821 + 32k+ context window fix from #855" Jan 26, 2026

@Luca-Calabria Luca-Calabria closed this by deleting the head repository Jan 26, 2026

2 participants