Missing updates for Llama4 on main #940

Merged
wpyszka merged 10 commits into vllm-project:main from Luca-Calabria:main
Feb 9, 2026

Conversation

@Luca-Calabria
Contributor

Added missing Llama4 fixes from #881, #862, and #884 on the main branch.

Signed-off-by: Luca Calabria <luca.calabria@intel.com>

Copilot AI left a comment


Pull request overview

This PR ports Llama4-specific fixes from PRs #881, #862, and #884 to the main branch, focusing on attention scaling and chunked attention layer handling improvements.

Changes:

  • Updated _get_attn_scale_for_hpu implementation to remove closure dependency and match the actual attention scale calculation
  • Refactored chunked attention layer detection to be a standalone function and changed the signature of apply_model_specific_patches to accept model_runner instead of model
  • Consolidated model-specific patches by removing duplicate maybe_set_chunked_attention_layers method from the class
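A minimal sketch of the refactor's shape described above. All names other than `apply_model_specific_patches` are hypothetical stand-ins; the actual detection logic and patching live in the PR's diff:

```python
def is_chunked_attention_layer(layer) -> bool:
    # Standalone predicate rather than a method on the runner class.
    # The attribute checked here is illustrative, not the real criterion.
    return getattr(layer, "use_chunked_attention", False)


def apply_model_specific_patches(model_runner) -> None:
    # New signature: takes the model_runner (which carries config and
    # runner-level state) instead of the bare model, per the PR summary.
    model = model_runner.model
    for layer in getattr(model, "layers", []):
        if is_chunked_attention_layer(layer):
            # The patch application itself is model-specific; this marker
            # merely stands in for it.
            layer.patched = True
```

Passing the runner rather than the model means patches no longer need a separate channel to reach runner state, which is what allowed the duplicate `maybe_set_chunked_attention_layers` method to be dropped.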


Comment on lines +428 to +429
# add explicit warning
pass

Copilot AI Feb 6, 2026


The comment 'add explicit warning' suggests that an exception handler should log a warning, but the current implementation silently ignores exceptions. Consider adding a proper warning message using a logger to help with debugging when chunked attention setup fails.
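A minimal sketch of the suggested fix, assuming a standard module-level logger; the wrapper and method names are hypothetical stand-ins for the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def set_chunked_attention_layers(model_runner):
    # Illustrative setup wrapper; the real logic is in the PR's diff.
    try:
        model_runner.configure_chunked_attention()
    except Exception as exc:
        # Previously a bare `pass`: the failure stays non-fatal, but a
        # warning makes chunked-attention setup problems visible in logs.
        logger.warning("Chunked attention setup failed: %s", exc)
```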

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
@github-actions

github-actions Bot commented Feb 6, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@github-actions

github-actions Bot commented Feb 7, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
17b17c068453e6dc6af79240bb94857ae175cc51

@wpyszka wpyszka enabled auto-merge (squash) February 9, 2026 15:50
@wpyszka wpyszka merged commit d5491ac into vllm-project:main Feb 9, 2026
12 of 13 checks passed
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Added Llama4 missing fixes from #881 #862 #884 on main branch

---------

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
5 participants