cherry-pick chunked attention from #821 + 32k+ context window fix from #855 (#881)
Conversation
Pull request overview
This PR cherry-picks two important fixes: chunked attention support from PR #821 and a fix for Llama4 models with 32k+ context windows from PR #855. The changes enable proper handling of chunked attention patterns and ensure correct attention metadata processing for models using attention chunking.
Changes:
- Added chunked attention support throughout the attention pipeline, including metadata handling and bias computation
- Fixed output tensor reshaping in fused MoE operations based on data parallel configuration
- Integrated chunked attention configuration detection and setup during model loading
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Core implementation of chunked attention support including metadata processing, bias calculation, and model configuration detection |
| vllm_gaudi/v1/spec_decode/hpu_eagle.py | Added chunked attention metadata fields to speculative decoding |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Updated attention metadata factory method to include chunked attention parameters |
| vllm_gaudi/ops/hpu_fused_moe.py | Fixed tensor reshaping logic for MoE operations with data parallelism |
| vllm_gaudi/attention/backends/hpu_attn.py | Added chunked attention metadata fields and selection logic in attention implementation |
```python
self.model_has_chunked_attention = True
try:
    for layer in model.language_model.model.layers:
        if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__:
```
The string-containment check (`in`) on class names is fragile and could match unintended class names. Consider using `isinstance()` or comparing the class's `__name__` for exact equality instead.
| if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__: | |
| backend = layer.self_attn.attn.get_attn_backend() | |
| backend_name = getattr(backend, "__name__", backend.__class__.__name__) | |
| if backend_name == "ChunkedLocalAttention": |
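If the backend class is importable where the detection runs, a type-based check avoids name matching entirely. A minimal sketch, assuming `ChunkedLocalAttention` can be imported (the import path below is a guess, not the actual module layout):

```python
# The import path here is illustrative; ChunkedLocalAttention may live
# elsewhere in the codebase.
from vllm_gaudi.attention.backends.hpu_attn import ChunkedLocalAttention

def uses_chunked_attention(layer) -> bool:
    backend = layer.self_attn.attn.get_attn_backend()
    # get_attn_backend() may hand back a class or an instance; cover both
    # without substring matching on the name.
    if isinstance(backend, type):
        return issubclass(backend, ChunkedLocalAttention)
    return isinstance(backend, ChunkedLocalAttention)
```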
```python
except Exception:
    pass
```
Catching and silently suppressing all exceptions with a bare `except Exception: pass` hides potential configuration or attribute errors. Consider logging the exception or catching more specific exception types.
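A minimal sketch of the logging alternative, mirroring the detection loop above; the `logger` setup is illustrative, not taken from the patch:

```python
import logging

logger = logging.getLogger(__name__)

try:
    for layer in model.language_model.model.layers:
        ...  # backend detection as in the snippet above
except AttributeError as e:
    # Only attribute lookups on the model hierarchy are expected to fail
    # here (e.g. a model without language_model.model.layers); anything
    # else should propagate rather than be silently suppressed.
    logger.debug("Chunked attention detection skipped: %s", e)
```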
```python
block_tables_chunk = [
    block_table[num_seq_chunks[i] * chunk_size_in_blocks:]
    for i, block_table in enumerate(block_tables_list)
]
```
There is duplicated logic between the chunked attention buffer generation (lines 2152-2164) and the similar pattern for window blocks. Consider extracting this into a helper method to reduce code duplication.
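One possible shape for such a helper; the name, signature, and the assumption that the window-block path differs only in its offset expression are all illustrative:

```python
def _slice_block_tables(block_tables_list, offsets_in_blocks):
    # Drop the first offsets_in_blocks[i] blocks of each sequence's block
    # table; both the chunked-attention buffers and the window blocks
    # reduce to this slicing pattern with different offsets.
    return [
        block_table[offsets_in_blocks[i]:]
        for i, block_table in enumerate(block_tables_list)
    ]

# Chunked-attention call site (offsets derived as in the original code):
block_tables_chunk = _slice_block_tables(
    block_tables_list,
    [n * chunk_size_in_blocks for n in num_seq_chunks],
)
```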
```python
if layer.dp_size > 1:
    return output.view(*(output.size(0), *input_shape[1:]))
else:
    return output.view(*input_shape)
```
The conditional reshaping logic based on `dp_size` lacks explanation. Add a comment explaining why different reshaping is needed when data parallelism is enabled versus disabled.
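A sketch of how that comment might read, assuming the intent is that data parallelism can change the local row count of `output`; this rationale is an inference, not confirmed by the patch:

```python
if layer.dp_size > 1:
    # Assumption: with data parallelism the MoE op may return a different
    # number of rows than the local input had (tokens are redistributed
    # across DP ranks), so only the trailing dimensions of the original
    # input shape can be reused.
    return output.view(*(output.size(0), *input_shape[1:]))
else:
    # Without DP the output is elementwise-aligned with the input and can
    # take back its exact shape.
    return output.view(*input_shape)
```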
Force-pushed from 288b1a7 to 1631984
✅ CI Passed: All checks passed successfully against the following vllm commit:
Force-pushed from 4fd940d to 40bdb59
🚧 CI Blocked: The main CI workflow was not started for the following reason:
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Due to MambaMixer2 implementation requirements, all buckets used for mamba must be a multiple of mamba chunk size.

Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
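For reference, rounding a bucket up to the nearest multiple of the chunk size is a one-liner; a minimal sketch with illustrative names:

```python
def round_up_to_chunk(bucket_size: int, mamba_chunk_size: int) -> int:
    # Smallest multiple of mamba_chunk_size that is >= bucket_size.
    return -(-bucket_size // mamba_chunk_size) * mamba_chunk_size

assert round_up_to_chunk(1000, 256) == 1024  # 4 * 256
assert round_up_to_chunk(512, 256) == 512    # already a multiple
```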
1. #805
2. #837
3. #855
4. #862

---------

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Force-pushed from bb5c13f to 0206d3f
🚧 CI Blocked: The main CI workflow was not started for the following reason:
…plus_context_fix

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
✅ CI Passed: All checks passed successfully against the following vllm commit:
wpyszka left a comment:
fix is approved for 0.14.1
…ndow fix from vllm-project#855 (vllm-project#881)

Cherry pick missing fixes:
- chunked attention fixes from vllm-project#821
- llama4 32k+ context window vllm-project#855

---------

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Radosław Smyrek <radoslawx.smyrek@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: slokesha <slokeshappa@habana.ai>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Cherry-pick missing fixes:
- chunked attention fixes from #821
- Llama4 32k+ context window fix from #855