Fixes from PR#1635 applied to v1.22.0_next branch #1647

MohitIntel · 2025-07-24T03:20:53Z

Depends on vllm-hpu-extension PR302 (now merged; requirements/hpu.txt is updated in this PR).
Applies the fix from PR #1635 (gemma3: fix accuracy issue caused by not skipping image on top right) to this branch.

michalkuligowski

The base for this PR is not a vllm-fork branch. Is it needed? If so, please fix the pre-commit issues.

MohitIntel · 2025-07-24T23:48:06Z

> base for this PR is not vllm-fork branch is it needed? if so please fix precommit issues

This branch has now been rebased on the latest vllm-fork v1.22.0_next, after PR1616 was merged yesterday.

MohitIntel · 2025-07-25T16:50:44Z

@michalkuligowski, this is ready to merge into v1.22.0_next.

xuechendi · 2025-07-25T20:09:21Z

/run-gaudi-tests

michalkuligowski · 2025-07-28T06:31:30Z

requirements/hpu.txt

 setuptools>=77.0.3
 setuptools-scm>=8
-vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@5135570
+vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@009adb2


This PR is almost identical to #1660. Please apply the comments from there.

michalkuligowski · 2025-07-31T08:51:15Z

Closing; this is done in #1660.

This PR contains the following changes:

1. Port the Gemma3 SLIDING_WINDOW FusedSDPA feature from habana_main, plus a few extra fixes:
   - Sliding FusedSDPA kernel: add a threshold variable to enable or disable use of the optimized kernel. The kernel gives a performance/memory benefit for longer sequences. An environment variable is provided to control it, per customer request.
   - Based on the threshold, choose a different prompt bucket: if it is smaller than the threshold, use PROMPT_BUCKET_STEP; otherwise use SLICE_SIZE.
   - Added mark_step before the sliding FusedSDPA is run.
   - Misc fixes for bucket-related issues.
2. Upstream fixes: vllm-project#18732, vllm-project#21479, vllm-project#19788.
3. Optimized Gemma3RMSNorm with FusedRMSNorm.

Dependent on #1647.

Run command with:

VLLM_FUSEDSDPA_SLIDE_THLD=2048 VLLM_EXPONENTIAL_BUCKETING=false VLLM_PROMPT_BS_BUCKET_MAX=64 VLLM_PROMPT_SEQ_BUCKET_STEP=1024 VLLM_PROMPT_SEQ_BUCKET_MAX=20480 PT_HPU_SDPA_QKV_SLICE_MODE_FWD=1

---------

Signed-off-by: Lukas Geiger <[email protected]>
Signed-off-by: Hongmin Fan <[email protected]>
Co-authored-by: Henry Tang <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: Lukas Geiger <[email protected]>
Co-authored-by: Hongmin Fan <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Jianhong-Zhang <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: Michał Kuligowski <[email protected]>
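The threshold-based bucket rule from item 1 of the description can be sketched as follows. This is a minimal illustration, not the code from #1660: `pick_bucket_step` and `round_up_to_bucket` are hypothetical helpers, the quantity compared against the threshold is assumed to be the prompt sequence length, and the SLICE_SIZE value of 1024 is a placeholder because the description does not state it. The fallback values for the two environment variables are copied from the run command; they are not known defaults.

```python
import os

# Placeholder only: the PR description does not give the real SLICE_SIZE.
ASSUMED_SLICE_SIZE = 1024


def pick_bucket_step(seq_len: int, threshold: int, prompt_bucket_step: int,
                     slice_size: int = ASSUMED_SLICE_SIZE) -> int:
    """Hypothetical helper: choose the prompt bucket step for one sequence.

    Below the threshold the regular prompt bucket step is used; at or above
    it the step becomes the slice size, so bucketed lengths stay multiples
    of the slice size that the sliding-window kernel works on.
    """
    return prompt_bucket_step if seq_len < threshold else slice_size


def round_up_to_bucket(seq_len: int, step: int) -> int:
    """Round seq_len up to the next multiple of step (ceiling division)."""
    return -(-seq_len // step) * step


if __name__ == "__main__":
    # Fallbacks mirror the run command in the description (assumed values).
    threshold = int(os.environ.get("VLLM_FUSEDSDPA_SLIDE_THLD", "2048"))
    step = int(os.environ.get("VLLM_PROMPT_SEQ_BUCKET_STEP", "1024"))
    for n in (500, 3000):
        chosen = pick_bucket_step(n, threshold, step)
        print(n, chosen, round_up_to_bucket(n, chosen))
```

Whether the comparison is strict (`<`) or inclusive at the threshold is not specified in the description; the sketch follows the wording "smaller than the threshold".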

MohitIntel requested review from PatrykWo, afierka-intel, jikunshang, kzawora-intel, madamczyk-intel, mgawarkiewicz-intel, michalkuligowski, mswiniarsk, vivekgoe and xuechendi as code owners July 24, 2025 03:20

libinta changed the base branch from v1.22.0_next to mdeopujari/merge_PR1589 July 24, 2025 06:13

MohitIntel force-pushed the mdeopujari/new_rebase_PR1635 branch from 542611e to 8e07f7f Compare July 24, 2025 07:02

michalkuligowski previously requested changes Jul 24, 2025

Base automatically changed from mdeopujari/merge_PR1589 to v1.22.0_next July 24, 2025 15:07

jiminha and others added 16 commits July 24, 2025 10:57

Added support for FusedSDPA with window_size

3131947

Add envi variable to check validity of window kernel

4247aa1

FusedSDPA kernel with window_size+causal only works when seq_len is a multiple of SLICE_SIZE. Otherwise, fall back to the original implementation, which creates an attention_mask with window_size.
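The fallback described in this commit can be sketched as below. It is an illustration under stated assumptions, not the PR's code: `can_use_sdpa_window`, `sliding_window_causal_mask` and `choose_attention_path` are hypothetical names, SLICE_SIZE is passed in as a parameter because its value is not given here, and the mask is built from plain Python lists where the real implementation would presumably use tensors. The window convention (each query attends to itself and the previous window_size - 1 positions) is also an assumption.

```python
def can_use_sdpa_window(seq_len: int, slice_size: int) -> bool:
    """Hypothetical check: the windowed kernel needs seq_len to be a
    positive multiple of the slice size."""
    return seq_len > 0 and seq_len % slice_size == 0


def sliding_window_causal_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """Fallback path: explicit boolean mask, True where attention is allowed.

    Query position q may attend to key position k when k is not in the
    future (k <= q) and lies inside the window (q - k < window_size).
    """
    return [
        [(k <= q) and (q - k < window_size) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def choose_attention_path(seq_len: int, slice_size: int) -> str:
    """Return a label for the path that would be taken for this seq_len."""
    if can_use_sdpa_window(seq_len, slice_size):
        return "fused_sdpa_window"
    return "attention_mask"
```

Padding prompts to a bucket that is a multiple of the slice size keeps more requests on the kernel path; lengths that do not meet the condition take the explicit-mask path.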

Changes after code review

5e2ee1f

- Move the seq_len check for use_sdpa_window to attn_metadata
- Automatically set all environment variables

Update hpu extension version to dev commit

bed05d6

Update hpu extension requirement commit id

4f2edff

merge #PR1589 onto v1.22.0

1c45fe7

Update hpu_model_runner.py with PR1597

8769df2

Additional changes from PR1597

Update requirements/hpu.txt based on PR1614

7c049a0

added missing definitions

bffcc67

Gemma3 related changes for 1.22

6e059b6

Update utils.py

30ef70b

remove print statement

more fixes after merging 1597

c56d886

fix bypass_model_exec

25e9196

fix precommit error

917a861

change requirements/hpu.txt mode

22d0809

Modifications from PR#1635 (rebased on PR#1616)

f95cc0f

MohitIntel added 3 commits July 24, 2025 14:27

fixes for pre-commit checks

ad61fa0

minor pre-commit fix

d6d49d1

rebased on latest v1.22.0_next

518dab2

MohitIntel force-pushed the mdeopujari/new_rebase_PR1635 branch from e01eae7 to 518dab2 Compare July 24, 2025 22:12

jiminha mentioned this pull request Jul 25, 2025

Gemma3 v1.22 changes (Sliding_Window feature + few others) #1660

Merged

Updated requirements/hpu.txt SHA ID for latest v1.22.0 vllm-hpu-extensions

47d7f06

michalkuligowski requested changes Jul 28, 2025

michalkuligowski closed this Jul 31, 2025
