Cherry-picks to enable Llama4 Maverick by rsmyrek · Pull Request #882 · vllm-project/vllm-gaudi

rsmyrek · 2026-01-26T23:52:52Z

Following reasoning stated in PR: vllm-project#616 Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

…ect#837) Signed-off-by: linoy buchnik <lbuchnik@habana.ai> Signed-off-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai>

…llm-project#855) For `max_model_len > 32k`, Llama4 enables temperature adjustment: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L719. With the adjustment enabled, the shape of tensor `q` changes from 2D to 3D: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L307. This tensor is then passed to `UnquantizedFusedMoEMethod -> forward`: https://github.com/vllm-project/vllm-gaudi/blob/main/vllm_gaudi/ops/hpu_fused_moe.py#L163, which causes an invalid reshape: the code tries to return a 3D `output.view` based on a 2D output tensor. The bug was introduced by the following PRs: vllm-project#680 and vllm-project#684. Cherry-picked from `releases/v0.13.0` --------- Signed-off-by: Artur Fierka <artur.fierka@intel.com>
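The description above does not give the concrete shapes involved. As a rough, stdlib-only sketch of the failure class it describes (viewing a 2D output with a 3D target shape), the snippet below checks the element-count condition that a view of contiguous data has to satisfy. `can_view`, `restore_shape`, and all sizes are hypothetical names and numbers chosen for illustration; they are not taken from vllm or vllm-gaudi.

```python
from math import prod


def can_view(src_shape, dst_shape):
    """Return True if contiguous data of src_shape can be viewed as dst_shape.

    For contiguous data this reduces to an element-count check.
    """
    return prod(src_shape) == prod(dst_shape)


def restore_shape(output_shape, target_shape):
    """Hypothetical helper: validate the shape a 2D output is viewed back to."""
    if not can_view(output_shape, target_shape):
        raise ValueError(
            f"cannot view output of shape {tuple(output_shape)} "
            f"as {tuple(target_shape)}"
        )
    return tuple(target_shape)


# Hypothetical sizes, chosen only for illustration.
num_tokens, hidden_size = 4, 8
output_2d = (num_tokens, hidden_size)

# Adding a singleton dimension keeps the element count, so the view is valid.
print(restore_shape(output_2d, (num_tokens, 1, hidden_size)))

# A 3D target with a different element count cannot be produced from the 2D output.
try:
    restore_shape(output_2d, (num_tokens, 2, hidden_size))
except ValueError as err:
    print("invalid reshape:", err)
```

A guard of this kind only reports the mismatch; whether the right fix is to flatten the input before the MoE call or to restore the original shape afterwards depends on the actual code paths linked in the description.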

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

github-actions · 2026-01-26T23:53:01Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

github-actions · 2026-01-27T01:49:15Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

afierka-intel

LGTM

wpyszka

0.14.1 approved

1. vllm-project#805 2. vllm-project#837 3. vllm-project#855 4. vllm-project#862 --------- Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com> Signed-off-by: linoy buchnik <lbuchnik@habana.ai> Signed-off-by: Iryna Boiko <iboiko@habana.ai> Signed-off-by: Artur Fierka <artur.fierka@intel.com> Co-authored-by: Linoy Buchnik <linoybu@gmail.com> Co-authored-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Artur Fierka <artur.fierka@intel.com> Signed-off-by: slokesha <slokeshappa@habana.ai>

rsmyrek and others added 4 commits January 27, 2026 01:18

Interleaved sliding window fix (vllm-project#805)

4f7f615

Following reasoning stated in PR: vllm-project#616 Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

[GAUDISW-245665] fix diverge from vllm in multiModalBudget (vllm-proj…

d4ef895

…ect#837) Signed-off-by: linoy buchnik <lbuchnik@habana.ai> Signed-off-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai>

Flatten positions only when QK norm is enabled

70d8e72

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

rsmyrek marked this pull request as ready for review January 26, 2026 23:54

rsmyrek requested review from mgawarkiewicz-intel, piotrbocian and wpyszka as code owners January 26, 2026 23:54

github-actions Bot mentioned this pull request Jan 27, 2026

🚦 Team Review Dashboard #701

Open

jkaniecki approved these changes Jan 27, 2026

wpyszka requested review from adobrzyn, afierka-intel, iboiko-habana and michalkuligowski January 27, 2026 08:38

afierka-intel approved these changes Jan 28, 2026

adobrzyn approved these changes Jan 28, 2026

wpyszka approved these changes Jan 28, 2026

wpyszka merged commit 9ab497d into vllm-project:releases/v0.14.1 Jan 28, 2026
64 of 65 checks passed

adobrzyn mentioned this pull request Jan 28, 2026

Fix MultiModalBudget error #892

Merged

wpyszka pushed a commit that referenced this pull request Jan 29, 2026

Fix MultiModalBudget error (#892)

54d7a41

Reverts part of: #882 Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

slokesha pushed a commit to libinta/vllm-gaudi that referenced this pull request Jan 29, 2026

Fix MultiModalBudget error (vllm-project#892)

0a74f1b

Reverts part of: vllm-project#882 Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai> Signed-off-by: slokesha <slokeshappa@habana.ai>

rsmyrek deleted the llama4_maverick_enablement branch February 13, 2026 04:03

wpyszka merged 4 commits into vllm-project:releases/v0.14.1 from rsmyrek:llama4_maverick_enablement

6 participants
