[Attention Metadata Overhaul 1/N] Extract metadata update to HPUAttentionMetadataProcessor #526
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Pull Request Overview
This PR refactors attention metadata post-processing logic by extracting it from HpuModelAdapter into a new dedicated class HPUAttentionMetadataProcessor. The refactoring improves code organization by separating metadata processing concerns from the model adapter, making the codebase more maintainable without introducing functional changes.
Key Changes:
- Extracted metadata post-processing methods into a new `HPUAttentionMetadataProcessor` class
- Removed metadata processing attributes and methods from `HpuModelAdapter`
- Updated `HpuModelAdapter.forward()` to delegate metadata processing to the new processor
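The delegation pattern described in the key changes can be sketched as follows. This is an illustrative reduction, not the actual vllm-gaudi code: the class names `HpuModelAdapter` and `HPUAttentionMetadataProcessor` come from the PR, but the method bodies and the `process` method name here are assumptions.

```python
class HPUAttentionMetadataProcessor:
    """Owns attention-metadata post-processing, extracted out of the adapter."""

    def process(self, attn_metadata: dict) -> dict:
        # Stand-in for the real post-processing logic: return an updated copy
        # rather than mutating the caller's metadata in place.
        processed = dict(attn_metadata)
        processed["post_processed"] = True
        return processed


class HpuModelAdapter:
    """After the refactor, the adapter holds no metadata-processing state."""

    def __init__(self) -> None:
        # The adapter now just owns a processor instance and delegates to it.
        self.metadata_processor = HPUAttentionMetadataProcessor()

    def forward(self, attn_metadata: dict) -> dict:
        # Delegate metadata post-processing, then run the model (elided here).
        attn_metadata = self.metadata_processor.process(attn_metadata)
        return attn_metadata
```

The point of the split is that `forward()` no longer needs to know how metadata is massaged, which also sets up the follow-up PR's plan to run this step earlier, on CPU.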
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```python
        Returns:
            Dictionary with post-processed attention metadata
        """
        from vllm_gaudi.extension.logger import logger
```
Import statement should be placed at the module level (top of file) rather than inside a method. This violates Python's import conventions and can impact performance when the method is called repeatedly.
```python
if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
    self.slice_size = int(os.getenv("PT_HPU_SDPA_BC_FACTOR", "1024"))
    self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
```
[nitpick] Inconsistent environment variable access methods: line 4627 uses os.getenv() while line 4628 uses os.environ.get(). These are functionally equivalent, but consistency improves readability. Use os.getenv() on both lines.
```diff
- self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
+ self.slice_thld = int(os.getenv('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
```
```python
if self.prefill_use_fusedsdpa and self.use_window_sdpa and \
        seq_len >= self.slice_thld and self.slice_size != 0 and \
        seq_len % self.slice_size == 0 and attn_metadata.block_list is None:
```
[nitpick] Complex conditional with 6 conditions is difficult to read and maintain. Consider extracting this into a helper method like _should_use_builtin_window_sdpa() that returns a boolean and includes a docstring explaining the conditions.
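One possible shape for the suggested helper, sketched here as a standalone function so the conditions can be tested in isolation. The name `_should_use_builtin_window_sdpa` is the reviewer's proposal, not existing code, and the parameter list mirrors the attributes used in the conditional above.

```python
def should_use_builtin_window_sdpa(prefill_use_fusedsdpa: bool,
                                   use_window_sdpa: bool,
                                   seq_len: int,
                                   slice_thld: int,
                                   slice_size: int,
                                   block_list) -> bool:
    """Return True when sliced window SDPA can be used for this prefill.

    All of the following must hold:
      - fused SDPA is enabled for prefill,
      - windowed-SDPA slicing was requested via the env flag,
      - the sequence is at least the slicing threshold,
      - the slice size is non-zero and divides the sequence length evenly,
      - there is no block list for this batch.
    """
    # The slice_size != 0 check must precede the modulo to avoid
    # ZeroDivisionError; `and` short-circuiting guarantees the order.
    return (prefill_use_fusedsdpa
            and use_window_sdpa
            and seq_len >= slice_thld
            and slice_size != 0
            and seq_len % slice_size == 0
            and block_list is None)
```

As a method, the parameters would instead come from `self` and `attn_metadata`, and the call site reduces to a single readable `if`.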
```python
self.interleaved_sliding_window = is_interleaved(vllm_config.model_config.hf_text_config)

if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
```
I know it's not part of your changes, just a simple copy-paste, but can we change it to use `get_config().FLAG`, the same way as everywhere else? Then the `"false").strip().lower() in ("1", "true")` parsing wouldn't be necessary.
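A generic sketch of the idea behind that suggestion: centralize the boolean env-flag parsing in one helper so call sites don't repeat the `strip().lower() in ("1", "true")` dance. This is an illustration only; `env_flag` is a hypothetical name, and the real `get_config()` API in vllm-gaudi may parse flags differently.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Parse an environment variable as a boolean flag.

    Accepts "1" or "true" (case-insensitive, whitespace-tolerant) as True;
    everything else, including an unset variable, is False.
    """
    return os.getenv(name, default).strip().lower() in ("1", "true")


# Call sites then collapse to e.g.:
# self.use_window_sdpa = env_flag("PT_HPU_SDPA_QKV_SLICE_MODE_FWD")
```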
✅ CI Passed: All checks passed successfully against the following vllm commit:
jkaniecki left a comment
Overall a good change, with one modification proposed.
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
✅ CI Passed: All checks passed successfully against the following vllm commit:
Cherry-pick of vllm-project@6e1be4e but adapted to recent changes in vllm-project#526 --------- Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai> Signed-off-by: Wang, Zheng W <zheng.w.wang@intel.com>
Cherry-pick of vllm-project@6e1be4e but adapted to recent changes in vllm-project#526 --------- Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai> Signed-off-by: slokesha <slokeshappa@habana.ai>
…tionMetadataProcessor (vllm-project#526)

This PR is pretty simple - it takes all the metadata post-processing logic we do inside adapter, and yeets it from there into a separate class. This shouldn't introduce any functional changes other than a small refactor. In the next PR, I intend to remove metadata postprocessing from the adapter and do it beforehand, on CPU, but I didn't want to introduce too major changes here.

I made this because I absolutely hated how vllm-project#475 ended up w.r.t. metadata postprocessing, so I'd like to gradually fix it before that PR lands.

---------

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>