Fix for Llama4 Maverick performance drop by jkaniecki · Pull Request #904 · vllm-project/vllm-gaudi

jkaniecki · 2026-01-30T15:48:40Z

This fixes the Llama4 Maverick performance drop. torch.compile does not handle functions that take methods as inputs, so to avoid recompilations we need to declare the scale function directly.
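The description does not include the code itself, so the following is only a minimal sketch of the two patterns it contrasts: a function that receives a bound method as an input, and a scale function declared directly. The class `Attention`, the helper names, the constants, and the scale formula are all hypothetical placeholders, not taken from this PR or from vllm-gaudi. Plain Python floats are used so the sketch runs without PyTorch; it does not reproduce the recompilation behavior, only the structural difference.

```python
import math

# Hypothetical constants for illustration only; not taken from the PR.
FLOOR_SCALE = 8192.0
ATTN_SCALE = 0.1


class Attention:
    """Stand-in for an attention module; not the real vLLM class."""

    def _get_attn_scale(self, position: float) -> float:
        floor = math.floor((position + 1.0) / FLOOR_SCALE)
        return math.log(floor + 1.0) * ATTN_SCALE + 1.0


def scale_via_method_input(get_scale, position: float) -> float:
    # Pattern described as problematic: a bound method is passed in as an input.
    return get_scale(position)


def get_attn_scale(position: float) -> float:
    # Pattern described as the fix: the scale function is declared directly,
    # so no method object has to be passed around.
    floor = math.floor((position + 1.0) / FLOOR_SCALE)
    return math.log(floor + 1.0) * ATTN_SCALE + 1.0


attn = Attention()
via_method = scale_via_method_input(attn._get_attn_scale, 10000.0)
direct = get_attn_scale(10000.0)
```

Both paths compute the same value; the difference is only in what a compiler tracing the call would see as an input.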

Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>

Copilot

Pull request overview

This PR reverts a performance-degrading patch for Llama4 Maverick models by removing the patch_llama4_get_attn_scale function and its invocation. The patch was causing recompilations that significantly reduced performance.

Changes:

- Removed the patch_llama4_get_attn_scale function that was modifying attention scale behavior for Llama4 models
- Removed the call to patch_llama4_get_attn_scale from apply_model_specific_patches

github-actions · 2026-02-02T09:46:40Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

michalkuligowski · 2026-02-03T08:36:57Z

Is this needed on main?

Signed-off-by: Luca Calabria <luca.calabria@intel.com>

Revert part of vllm-project#758 to fix Llama4 Maverick

2c837d2

Signed-off-by: Jan Kaniecki <jan.kaniecki@intel.com>

jkaniecki requested review from mgawarkiewicz-intel, piotrbocian and wpyszka as code owners January 30, 2026 15:48

Copilot AI review requested due to automatic review settings January 30, 2026 15:48

Copilot AI reviewed Jan 30, 2026

github-actions Bot mentioned this pull request Jan 30, 2026

🚦 Team Review Dashboard #701

Open

jkaniecki added 2 commits February 2, 2026 08:25

Update hpu_model_runner.py

c4943dd

Update hpu_model_runner.py

634e1f2

mgawarkiewicz-intel approved these changes Feb 2, 2026

wpyszka merged commit 26d3799 into vllm-project:releases/v0.14.1 Feb 2, 2026
53 checks passed

Luca-Calabria added a commit to Luca-Calabria/vllm-gaudi that referenced this pull request Feb 6, 2026

fix Llama3 Maverick perf drop from vllm-project#904

fbf8a7c

Signed-off-by: Luca Calabria <luca.calabria@intel.com>

wpyszka merged 3 commits into vllm-project:releases/v0.14.1 from jkaniecki:Fix_maverick_0_14_1

5 participants
