@MohitIntel MohitIntel commented Jul 24, 2025

@libinta libinta changed the base branch from v1.22.0_next to mdeopujari/merge_PR1589 July 24, 2025 06:13
@MohitIntel MohitIntel force-pushed the mdeopujari/new_rebase_PR1635 branch from 542611e to 8e07f7f Compare July 24, 2025 07:02

@michalkuligowski michalkuligowski left a comment

The base for this PR is not a vllm-fork branch. Is it needed? If so, please fix the pre-commit issues.

Base automatically changed from mdeopujari/merge_PR1589 to v1.22.0_next July 24, 2025 15:07
@MohitIntel MohitIntel force-pushed the mdeopujari/new_rebase_PR1635 branch from e01eae7 to 518dab2 Compare July 24, 2025 22:12
@MohitIntel
Author

The base for this PR is not a vllm-fork branch. Is it needed? If so, please fix the pre-commit issues.

It has now been rebased on vllm-fork v1.22.0_next after yesterday's merge of PR1616.

@MohitIntel MohitIntel dismissed michalkuligowski’s stale review July 24, 2025 23:48

This branch has now been rebased on the latest vllm-fork v1.22.0_next after yesterday's merge of PR1616.

@MohitIntel
Author

@michalkuligowski, this is ready to merge into v1.22.0_next.

@xuechendi

/run-gaudi-tests

setuptools>=77.0.3
setuptools-scm>=8
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@5135570
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@009adb2

This PR is almost identical to #1660. Please apply the comments from there.

@michalkuligowski

Closing this; it is done in #1660.

wpyszka pushed a commit that referenced this pull request Aug 1, 2025
This PR contains the following changes:
1. Port the Gemma3 SLIDING_WINDOW FusedSDPA feature from habana_main, plus a
few extra fixes, including:
- Sliding FusedSDPA kernel: add a threshold variable that enables or
disables the optimized kernel. This kernel gives a performance and memory
benefit for longer sequences. An environment variable is provided to
control it, per customer request.
- Choose the prompt bucket based on the threshold: below the threshold,
use PROMPT_BUCKET_STEP; otherwise, use SLICE_SIZE.
- Add mark_step before the sliding FusedSDPA kernel runs.
- Miscellaneous fixes for bucket-related issues.
2. Upstream fixes:
vllm-project#18732
vllm-project#21479
vllm-project#19788

3. Optimize Gemma3RMSNorm with FusedRMSNorm.
Dependent on #1647
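
The threshold rule from item 1 can be sketched roughly as below. This is an illustration only, not the code from this PR: the function name `select_prompt_bucket_step`, its parameters, and the assumption that the value compared against the threshold is the prompt sequence length are all hypothetical. The threshold (2048) and step (1024) mirror the run command in this description; the slice size (512) is a placeholder, since the description does not state it.

```python
def select_prompt_bucket_step(seq_len: int, threshold: int,
                              bucket_step: int, slice_size: int) -> int:
    """Pick the prompt bucket step (hypothetical helper, not the PR's code).

    Assumption: the quantity compared against the threshold is the prompt
    sequence length. Below the threshold the regular bucket step is used;
    at or above it, the slice size is used instead.
    """
    if seq_len < threshold:
        return bucket_step
    return slice_size


# Example values: threshold and step follow the run command in this
# description; the slice size of 512 is a placeholder.
short_step = select_prompt_bucket_step(1000, 2048, 1024, 512)
long_step = select_prompt_bucket_step(4096, 2048, 1024, 512)
```

With these inputs, the short prompt keeps the regular step and the long prompt switches to the slice size.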


Run the command with:
VLLM_FUSEDSDPA_SLIDE_THLD=2048 VLLM_EXPONENTIAL_BUCKETING=false
VLLM_PROMPT_BS_BUCKET_MAX=64 VLLM_PROMPT_SEQ_BUCKET_STEP=1024
VLLM_PROMPT_SEQ_BUCKET_MAX=20480 PT_HPU_SDPA_QKV_SLICE_MODE_FWD=1
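
When launching from a Python script instead of a shell, the same settings can be passed through the child process environment. The sketch below is an assumption about how one might do that: the names `RUN_SETTINGS` and `build_env` are hypothetical, and only the variable names and values are taken from the command above.

```python
import os

# Variable names and values copied from the run command above.
RUN_SETTINGS = {
    "VLLM_FUSEDSDPA_SLIDE_THLD": "2048",
    "VLLM_EXPONENTIAL_BUCKETING": "false",
    "VLLM_PROMPT_BS_BUCKET_MAX": "64",
    "VLLM_PROMPT_SEQ_BUCKET_STEP": "1024",
    "VLLM_PROMPT_SEQ_BUCKET_MAX": "20480",
    "PT_HPU_SDPA_QKV_SLICE_MODE_FWD": "1",
}


def build_env(base=None):
    """Return a copy of the base environment with the run settings applied.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env.update(RUN_SETTINGS)
    return env
```

The resulting dictionary can be handed to `subprocess.run(cmd, env=build_env(), check=True)`, where `cmd` is whatever server or benchmark command you normally run.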

---------

Signed-off-by: Lukas Geiger <[email protected]>
Signed-off-by: Hongmin Fan <[email protected]>
Co-authored-by: Henry Tang <[email protected]>
Co-authored-by: Mohit Deopujari <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: Shiv Kaul <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: Lukas Geiger <[email protected]>
Co-authored-by: Hongmin Fan <[email protected]>
Co-authored-by: Harish Subramony <[email protected]>
Co-authored-by: Jianhong-Zhang <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Co-authored-by: Michał Kuligowski <[email protected]>