
[Attention Metadata Overhaul 1/N] Extract metadata update to HPUAttentionMetadataProcessor #526

Merged
kzawora-intel merged 13 commits into main from private/kzawora/metadata_processor on Dec 22, 2025
Conversation

@kzawora-intel (Contributor) commented Nov 5, 2025

This PR is pretty simple - it takes all the metadata post-processing logic we do inside the adapter and yeets it into a separate class. This shouldn't introduce any functional changes beyond the refactor itself. In the next PR, I intend to remove metadata post-processing from the adapter entirely and do it beforehand, on CPU, but I didn't want to introduce too many major changes here.

I made this because I absolutely hated how #475 ended up w.r.t. metadata postprocessing, so I'd like to gradually fix it before that PR lands.

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Copilot AI review requested due to automatic review settings November 5, 2025 15:33
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors attention metadata post-processing logic by extracting it from HpuModelAdapter into a new dedicated class HPUAttentionMetadataProcessor. The refactoring improves code organization by separating metadata processing concerns from the model adapter, making the codebase more maintainable without introducing functional changes.

Key Changes:

  • Extracted metadata post-processing methods into HPUAttentionMetadataProcessor class
  • Removed metadata processing attributes and methods from HpuModelAdapter
  • Updated HpuModelAdapter.forward() to delegate metadata processing to the new processor
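The extraction pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual vllm-gaudi code: only the class names HPUAttentionMetadataProcessor and HpuModelAdapter come from the PR; the constructor signature, the process() method, and the metadata fields are made up for the example.

```python
# Hypothetical sketch of the "extract processing into a dedicated class,
# delegate from forward()" refactor. Names other than the two classes are
# illustrative, not the real vllm-gaudi API.

class HPUAttentionMetadataProcessor:
    """Owns the metadata post-processing previously done inside the adapter."""

    def __init__(self, config):
        # Processing-related attributes (env-derived flags, thresholds, ...)
        # move here, out of the adapter.
        self.config = config

    def process(self, attn_metadata):
        # Stand-in for the real post-processing steps; the real code mutates
        # attention-specific fields (block lists, masks, slice settings, ...).
        attn_metadata = dict(attn_metadata)
        attn_metadata["processed"] = True
        return attn_metadata


class HpuModelAdapter:
    def __init__(self, config):
        self.metadata_processor = HPUAttentionMetadataProcessor(config)

    def forward(self, attn_metadata):
        # forward() now only delegates; it no longer post-processes
        # metadata itself.
        attn_metadata = self.metadata_processor.process(attn_metadata)
        # ... run the model with the processed metadata ...
        return attn_metadata
```

The point of the split is that the adapter keeps a single responsibility (running the model) while all metadata concerns live in one testable class.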


Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
@kzawora-intel kzawora-intel requested a review from Copilot November 5, 2025 15:43
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.



        Returns:
            Dictionary with post-processed attention metadata
        """
        from vllm_gaudi.extension.logger import logger

Copilot AI Nov 5, 2025


Import statement should be placed at the module level (top of file) rather than inside a method. This violates Python's import conventions and can impact performance when the method is called repeatedly.
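For illustration, the suggested fix is just to hoist the import to the top of the file so it is resolved once at load time rather than on every call. A minimal sketch, using the stdlib logging module as a stand-in for vllm_gaudi.extension.logger (which is not importable outside the project); the function name is hypothetical:

```python
# Module-level import: resolved once when the file is first loaded,
# instead of on every method call.
# (stdlib logging used as a stand-in for vllm_gaudi.extension.logger)
import logging

logger = logging.getLogger("hpu_model_runner")


def postprocess(attn_metadata):
    # The method body can now use the logger without re-importing it.
    logger.debug("post-processing attention metadata")
    return attn_metadata
```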

if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
    self.slice_size = int(os.getenv("PT_HPU_SDPA_BC_FACTOR", "1024"))
    self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))

Copilot AI Nov 5, 2025


[nitpick] Inconsistent environment variable access methods: line 4627 uses os.getenv() while line 4628 uses os.environ.get(). These are functionally equivalent, but consistency improves readability. Use os.getenv() on both lines.

Suggested change:
-    self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
+    self.slice_thld = int(os.getenv('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))

Comment on lines +4702 to +4704
if self.prefill_use_fusedsdpa and self.use_window_sdpa and \
        seq_len >= self.slice_thld and self.slice_size != 0 and \
        seq_len % self.slice_size == 0 and attn_metadata.block_list is None:

Copilot AI Nov 5, 2025


[nitpick] Complex conditional with 6 conditions is difficult to read and maintain. Consider extracting this into a helper method like _should_use_builtin_window_sdpa() that returns a boolean and includes a docstring explaining the conditions.
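A sketch of the suggested helper. The attribute names come from the snippet above; the surrounding class and its constructor are illustrative, not the real HPUAttentionMetadataProcessor:

```python
# Hypothetical sketch of the _should_use_builtin_window_sdpa() helper the
# review suggests. Only the attribute names are taken from the PR snippet;
# the enclosing class is a stand-in.

class MetadataProcessorSketch:

    def __init__(self, prefill_use_fusedsdpa, use_window_sdpa,
                 slice_thld, slice_size):
        self.prefill_use_fusedsdpa = prefill_use_fusedsdpa
        self.use_window_sdpa = use_window_sdpa
        self.slice_thld = slice_thld
        self.slice_size = slice_size

    def _should_use_builtin_window_sdpa(self, seq_len, block_list):
        """True when built-in window SDPA slicing applies: fused SDPA and
        window-slice mode are enabled, the sequence meets the length
        threshold and divides evenly into non-zero slices, and there is
        no block list."""
        return (self.prefill_use_fusedsdpa
                and self.use_window_sdpa
                and seq_len >= self.slice_thld
                and self.slice_size != 0
                and seq_len % self.slice_size == 0
                and block_list is None)
```

The call site then reduces to a single readable `if self._should_use_builtin_window_sdpa(seq_len, attn_metadata.block_list):`, and each condition is documented in one place.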

self.interleaved_sliding_window = is_interleaved(vllm_config.model_config.hf_text_config)

if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
Collaborator left a comment
I know it's not part of your changes, just a simple copy-paste, but can we change it to use get_config().FLAG, the same way as everywhere else? Then the `"false").strip().lower() in ("1", "true")` parsing wouldn't be necessary.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
2dacd5739409847e91299e7747a142e200fdff6c

@jkaniecki (Contributor) left a comment


Overall a good change, with one modification proposed.

Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated
@kzawora-intel kzawora-intel enabled auto-merge (squash) December 22, 2025 12:39
@kzawora-intel kzawora-intel merged commit b5a980d into main Dec 22, 2025
50 checks passed
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
326e7c31055812277957e3e2b43715b4f366facb

ksmusz pushed a commit that referenced this pull request Jan 23, 2026
Cherry-pick of 6e1be4e, but adapted to recent changes in #526

---------

Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>
testdig pushed a commit to testdig/vllm-gaudi-fork that referenced this pull request Jan 29, 2026
Cherry-pick of vllm-project@6e1be4e, but adapted to recent changes in vllm-project#526

---------

Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>
Signed-off-by: Wang, Zheng W <zheng.w.wang@intel.com>
slokesha pushed a commit to libinta/vllm-gaudi that referenced this pull request Feb 9, 2026
Cherry-pick of vllm-project@6e1be4e, but adapted to recent changes in vllm-project#526

---------

Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>
Signed-off-by: slokesha <slokeshappa@habana.ai>
rajanintel24 pushed a commit to rajanintel24/vllm-gaudi that referenced this pull request Feb 11, 2026
…tionMetadataProcessor (vllm-project#526)


Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
…tionMetadataProcessor (#526)


Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Cherry-pick of 6e1be4e, but adapted to recent changes in #526

---------

Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai>

5 participants