
cherry pick Llama4 on apply patches + QK flatten pos + perf drop#942

Merged
wpyszka merged 1 commit into vllm-project:releases/v0.15.1 from Luca-Calabria:cherry_pick_llama4 on Feb 9, 2026

Conversation

@Luca-Calabria
Contributor

Cherry-pick Llama4 missing fixes from #881, #862, and #884 onto the releases/ branch.

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Copilot AI review requested due to automatic review settings February 6, 2026 10:10
Contributor

Copilot AI left a comment


Pull request overview

This PR cherry-picks Llama4 fixes from PRs #881, #862, and #884 onto the releases branch. The changes focus on improving the Llama4 attention scale calculation and refactoring model-specific patches.

Changes:

  • Refactored _get_attn_scale_for_hpu to directly implement the attention scale logic instead of wrapping the original method
  • Moved maybe_set_chunked_attention_layers from an instance method to a module-level function
  • Updated apply_model_specific_patches to accept model_runner instead of a model parameter (a rough sketch of the resulting entry points follows below)
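
For context, a minimal sketch of what the refactored entry points could look like after these changes. The function bodies, the floor_scale/attn_scale parameters, and the model_type check are illustrative assumptions based on the descriptions above, not the exact code merged in this PR:

```python
import torch

# Hedged sketch only: names and bodies below are assumptions for illustration.

def _get_attn_scale_for_hpu(positions: torch.Tensor, floor_scale: float,
                            attn_scale: float) -> torch.Tensor:
    # Compute the Llama4 temperature-tuned attention scale directly from the
    # flattened position ids instead of wrapping the original upstream method.
    floor = torch.floor((positions.float() + 1.0) / floor_scale)
    return torch.log(floor + 1.0) * attn_scale + 1.0


def maybe_set_chunked_attention_layers(model_runner) -> None:
    # Now a module-level function (previously an instance method): walk the
    # decoder layers and flag those backed by a chunked-local-attention backend.
    for layer in model_runner.model.language_model.model.layers:
        backend_name = layer.self_attn.attn.get_attn_backend().__name__
        if "ChunkedLocalAttention" in backend_name:
            layer.self_attn.attn.impl.is_chunked_attention = True


def apply_model_specific_patches(model_runner) -> None:
    # Now takes the model_runner rather than the bare model, so patches can
    # reach runner-level state in addition to the model itself.
    model_type = getattr(model_runner.model_config.hf_config, "model_type", "")
    if model_type == "llama4":
        maybe_set_chunked_attention_layers(model_runner)
```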


Comment on lines +423 to +428
    for layer in model_runner.model.language_model.model.layers:
        if "ChunkedLocalAttention" in layer.self_attn.attn.get_attn_backend().__name__:
            layer.self_attn.attn.impl.is_chunked_attention = True
except Exception:
    # add explicit warning
    pass

Copilot AI Feb 6, 2026


The comment indicates a warning should be added, but the exception handler silently passes. Either add the intended warning message or remove the misleading comment.
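
One way to address this, sketched here under the assumption that vLLM's init_logger helper is available in this module, is to log the exception instead of passing silently:

```python
from vllm.logger import init_logger

logger = init_logger(__name__)

try:
    for layer in model_runner.model.language_model.model.layers:
        backend_name = layer.self_attn.attn.get_attn_backend().__name__
        if "ChunkedLocalAttention" in backend_name:
            layer.self_attn.attn.impl.is_chunked_attention = True
except Exception as exc:
    # Surface the failure instead of silently ignoring it.
    logger.warning("Could not mark chunked-attention layers for Llama4: %s", exc)
```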

@github-actions

github-actions Bot commented Feb 6, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
1892993bc18e243e2c05841314c5e9c06a80c70d

Collaborator

@wpyszka wpyszka left a comment


approved for 0.15.1

@wpyszka wpyszka merged commit c0675d9 into vllm-project:releases/v0.15.1 Feb 9, 2026
55 checks passed