cherry pick Llama4 on apply patches + QK flatten pos + perf drop by Luca-Calabria · Pull Request #942 · vllm-project/vllm-gaudi

Luca-Calabria · 2026-02-06T10:10:58Z

Cherry-pick the missing Llama4 fixes from #881, #862, and #884 onto the releases/ branch.

Signed-off-by: Luca Calabria <luca.calabria@intel.com>

Copilot

Pull request overview

This PR cherry-picks Llama4 fixes from PRs #881, #862, and #884 onto the releases branch. The changes focus on improving the Llama4 attention scale calculation and refactoring model-specific patches.

Changes:

Refactored _get_attn_scale_for_hpu to implement the attention-scale logic directly instead of wrapping the original method
Moved maybe_set_chunked_attention_layers from an instance method to a module-level function
Updated apply_model_specific_patches to accept model_runner instead of the model parameter
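For reference, upstream Llama4 computes its attention temperature-tuning scale per token as log(floor((pos + 1) / floor_scale) + 1) * attn_scale + 1. The sketch below reproduces that formula in plain Python. It is not the HPU code from this PR: the defaults (floor_scale=8192, attn_scale=0.1) are assumed from the upstream Llama4 config, and flattening a padded [batch, seq] position grid into one token list is only a guess at what "QK flatten pos" in the title refers to.

```python
import math
from typing import List, Sequence


def llama4_attn_scale(positions: Sequence[Sequence[int]],
                      floor_scale: float = 8192.0,
                      attn_scale: float = 0.1) -> List[float]:
    """Per-token attention scale following the upstream Llama4 formula.

    Assumption: positions arrive as a padded [batch, seq] grid and are
    flattened into a single list of tokens before the scale is computed.
    """
    # Flatten [batch, seq] -> [batch * seq] (assumed layout).
    flat = [p for row in positions for p in row]
    # Scale stays 1.0 until pos + 1 reaches floor_scale, then grows with
    # the log of how many floor_scale-sized blocks have been passed.
    return [
        math.log(math.floor((p + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0
        for p in flat
    ]


scales = llama4_attn_scale([[0, 8191], [8192, 16383]])
print(scales)
```

With these defaults the scale is exactly 1.0 for the first 8191 positions, so only long contexts are affected.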


Copilot · 2026-02-06T10:11:37Z

+            for layer in model_runner.model.language_model.model.layers:
+                if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__:
+                    layer.self_attn.attn.impl.is_chunked_attention = True
+        except Exception:
+            # add explicit warning
+            pass


The comment indicates a warning should be added, but the exception handler silently passes. Either add the intended warning message or remove the misleading comment.
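One way to act on this comment is to log the exception instead of passing silently. The sketch below keeps the attribute path and the ChunkedLocalAttention name check from the quoted diff, and assumes the module-level maybe_set_chunked_attention_layers takes model_runner as its only argument. Returning the number of flagged layers and the stub classes used to exercise it are hypothetical additions, not part of the PR.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def maybe_set_chunked_attention_layers(model_runner) -> int:
    """Flag layers whose attention backend is ChunkedLocalAttention.

    Sketch: the attribute path is copied from the reviewed diff; returning
    the number of flagged layers is a hypothetical addition for testing.
    """
    flagged = 0
    try:
        for layer in model_runner.model.language_model.model.layers:
            attn = layer.self_attn.attn
            if "ChunkedLocalAttention" in attn.get_attn_backend().__name__:
                attn.impl.is_chunked_attention = True
                flagged += 1
    except Exception as exc:
        # Log instead of silently passing, so a model without this layout
        # still loads but leaves a trace of why no layer was flagged.
        logger.warning(
            "Could not mark chunked attention layers (%d flagged before "
            "the error): %s", flagged, exc)
    return flagged


# Hypothetical stand-ins for the real backend classes and model tree.
class ChunkedLocalAttentionStub:
    pass


class DefaultAttentionStub:
    pass


def make_layer(backend):
    attn = SimpleNamespace(
        impl=SimpleNamespace(is_chunked_attention=False),
        get_attn_backend=lambda: backend,
    )
    return SimpleNamespace(self_attn=SimpleNamespace(attn=attn))


layers = [make_layer(ChunkedLocalAttentionStub), make_layer(DefaultAttentionStub)]
runner = SimpleNamespace(
    model=SimpleNamespace(
        language_model=SimpleNamespace(model=SimpleNamespace(layers=layers))))
print(maybe_set_chunked_attention_layers(runner))  # 1
print(maybe_set_chunked_attention_layers(SimpleNamespace(model=None)))  # 0, logs a warning
```

Keeping the broad `except Exception` mirrors the diff; narrowing it to AttributeError would be a stricter alternative if only a missing layout is expected.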

github-actions · 2026-02-06T13:54:17Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
1892993bc18e243e2c05841314c5e9c06a80c70d

wpyszka

approved for 0.15.1

cherry pick Llama4 on apply patches + QK flatten pos + perf drop

9026505

Signed-off-by: Luca Calabria <luca.calabria@intel.com>

Luca-Calabria requested review from mgawarkiewicz-intel, piotrbocian and wpyszka as code owners February 6, 2026 10:10

Copilot AI review requested due to automatic review settings February 6, 2026 10:10

Copilot AI reviewed Feb 6, 2026

adobrzyn approved these changes Feb 6, 2026

github-actions Bot mentioned this pull request Feb 6, 2026

🚦 Team Review Dashboard #701 (Open)

jkaniecki approved these changes Feb 6, 2026

wpyszka approved these changes Feb 9, 2026

wpyszka merged commit c0675d9 into vllm-project:releases/v0.15.1 Feb 9, 2026
55 checks passed

wpyszka merged 1 commit into vllm-project:releases/v0.15.1 from Luca-Calabria:cherry_pick_llama4

5 participants
