Fix for Llama4 Maverick performance drop #904

Merged
wpyszka merged 3 commits into vllm-project:releases/v0.14.1 from jkaniecki:Fix_maverick_0_14_1
Feb 2, 2026

@jkaniecki jkaniecki commented Jan 30, 2026

This fixes the Llama4 Maverick performance drop. torch.compile does not handle functions that take methods as inputs, so to avoid recompilations we need to declare the scale function directly.
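
As a rough illustration of why a method passed as an input can defeat a compile cache, here is a minimal pure-Python sketch. It is an analogy only: `ToyCompiler`, `Attention`, `attn_scale`, and `forward` are hypothetical names, and the real guard logic of torch.compile is not shown in this PR and may differ. The sketch relies on one Python fact: each access to `obj.method` creates a new bound-method object, so a cache keyed on object identity never sees the same input twice, while a plain module-level function is always the same object.

```python
class ToyCompiler:
    """Toy identity-guarded cache (hypothetical; this is NOT torch.compile)."""

    def __init__(self, fn):
        self.fn = fn
        # Callables seen so far. Holding references keeps them alive,
        # so the identity checks below are meaningful.
        self.guarded = []
        self.compilations = 0

    def __call__(self, scale_fn, x):
        # "Recompile" whenever the callable input is an object not seen before.
        if not any(seen is scale_fn for seen in self.guarded):
            self.guarded.append(scale_fn)
            self.compilations += 1
        return self.fn(scale_fn, x)


class Attention:
    """Hypothetical module whose scale logic lives in a method."""

    def __init__(self, scale=0.5):
        self.scale = scale

    def get_attn_scale(self, x):
        return x * self.scale


def attn_scale(x, scale=0.5):
    """The same logic declared directly as a module-level function."""
    return x * scale


def forward(scale_fn, x):
    return scale_fn(x)


attn = Attention()

via_method = ToyCompiler(forward)
for _ in range(3):
    # Each attribute access builds a fresh bound-method object.
    via_method(attn.get_attn_scale, 2.0)

via_function = ToyCompiler(forward)
for _ in range(3):
    # The module-level function is the same object on every call.
    via_function(attn_scale, 2.0)

method_compilations = via_method.compilations
function_compilations = via_function.compilations
```

In this toy model the method-based call path "recompiles" on every call, while the directly declared function compiles once, which mirrors the motivation stated above.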

Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>
Copilot AI review requested due to automatic review settings January 30, 2026 15:48
Copilot AI left a comment

Pull request overview

This PR reverts a performance-degrading patch for Llama4 Maverick models by removing the patch_llama4_get_attn_scale function and its invocation. The patch was causing recompilations that significantly reduced performance.

Changes:

  • Removed the patch_llama4_get_attn_scale function that was modifying attention scale behavior for Llama4 models
  • Removed the call to patch_llama4_get_attn_scale from apply_model_specific_patches

github-actions Bot commented Feb 2, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

@wpyszka wpyszka merged commit 26d3799 into vllm-project:releases/v0.14.1 Feb 2, 2026
53 checks passed
@michalkuligowski
Collaborator

Is this needed on main?

Luca-Calabria added a commit to Luca-Calabria/vllm-gaudi that referenced this pull request Feb 6, 2026
Signed-off-by: Luca Calabria <luca.calabria@intel.com>

5 participants