[Attention Metadata Overhaul 2/N] Move metadata processing outside HPUModelAdapter #530
kzawora-intel wants to merge 18 commits
Conversation
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Pull Request Overview
This PR moves HPU attention metadata processing from the HpuModelAdapter into a dedicated HPUAttentionMetadataProcessor class, allowing metadata biases to be computed on CPU and copied asynchronously to HPU. This refactoring removes metadata processing logic from the model forward path and handles it at input preparation time instead.
Key Changes:
- Extracted metadata processing into a standalone HPUAttentionMetadataProcessor class
- Moved metadata processing to occur during input preparation (prefill/decode batch formation) rather than in model forward
- Added support for processing metadata on CPU with async copy to HPU device
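The CPU-side bias computation described above can be sketched roughly as below. This is a hypothetical, simplified helper, not the actual implementation; the real logic in this PR lives in the HPUAttentionMetadataProcessor class:

```python
import torch

def build_causal_attn_bias_cpu(seq_lens, max_seq_len, dtype=torch.float32):
    # Hypothetical sketch: build a padded causal attention bias on CPU so it
    # can later be copied asynchronously to the HPU device.
    batch = len(seq_lens)
    bias = torch.zeros(batch, 1, max_seq_len, max_seq_len, dtype=dtype)
    # Mask out future positions (causal mask): entries above the diagonal.
    causal = torch.triu(
        torch.ones(max_seq_len, max_seq_len, dtype=torch.bool), diagonal=1)
    bias.masked_fill_(causal, float('-inf'))
    # Mask out padding beyond each sequence's real length.
    for i, n in enumerate(seq_lens):
        bias[i, :, :, n:] = float('-inf')
    return bias
```

Computing the bias like this on the host keeps the masking logic out of the model forward path; the resulting tensor can then be moved to the device with a non-blocking copy at input-preparation time.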
…or' into private/kzawora/metadata_process_cpu
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
```python
def metadata_update_with_trim(obj: object, typename: str, trim: bool, **to_override):
    if trim:
        return custom_tuple_replace(obj, typename, **to_override)

    for key in to_override:
        assert hasattr(obj, key), f"Field {key} must exist in untrimmed metadata."
        setattr(obj, key, to_override[key])
    return obj
```
The function metadata_update_with_trim lacks a docstring explaining its purpose, parameters, return value, and the distinction between trimmed and untrimmed metadata handling. This is especially important given the conditional logic and the use of setattr for dynamic attribute modification.
```python
assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
context_lens_t = prefill_metadata.context_lens_tensor
assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
```
The error message should be more specific by indicating which phase (prefill) or operation is being performed when this assertion fails, to help with debugging.
```diff
- assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
+ assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias during prefill (prompt) phase"
  context_lens_t = prefill_metadata.context_lens_tensor
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias during prefill (prompt) phase"
```
```python
seq_lens_t = prefill_metadata.seq_lens_tensor
assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
context_lens_t = prefill_metadata.context_lens_tensor
assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
```
The error message should be more specific by indicating which phase (prefill) or operation is being performed when this assertion fails, to help with debugging.
```diff
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias during prefill phase"
```
```python
if self.prefill_use_fusedsdpa and attn_metadata.block_list is not None:
    context_lens_t = prefill_metadata.context_lens_tensor
    assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
```
The error message should be more specific by indicating this is for sliding window attention to aid debugging when this assertion fails.
```diff
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias for sliding window attention"
```
```python
# NOTE(kzawora): I'm not sure why we set block mapping twice for sliding window
# - we should check if that can be reduced to a single call.
```
This TODO-style comment expresses uncertainty about the implementation. Either investigate and resolve this concern, or rephrase as a clearer explanation if the double call is intentional (e.g., for separate window and non-window blocks).
```diff
- # NOTE(kzawora): I'm not sure why we set block mapping twice for sliding window
- # - we should check if that can be reduced to a single call.
+ # For sliding window, we set block mapping twice: once for the base mapping and once for the sliding window mapping.
+ # This ensures both standard and sliding window block mappings are correctly applied.
```
```python
def metadata_update_with_trim(obj: object, typename: str, trim: bool, **to_override):
    if trim:
```
I can't find a place where we use the trimmed replace — do we need this boolean?
```python
return _TYPE_CACHE[typename]['type'](**values)  # type: ignore
```

```python
def metadata_update_with_trim(obj: object, typename: str, trim: bool, **to_override):
```
I can't find a place where we use the trimmed replace — do we need the typename and trim args? If not, we can just rename this function to metadata_update.
```python
                       dst_device: torch.device,
                       dtype: torch.dtype,
                       is_window_block: bool = False,
                       trim: bool = False) -> HPUAttentionMetadataV1:
```

```python
        "TrimmedAttentionMetadata",
        trim=trim,
```

```python
def _set_attn_bias_for_sliding_window(self, attn_metadata: HPUAttentionMetadataV1, batch_size: int, seq_len: int,
                                      window_size: int, src_device: torch.device, dst_device: torch.device,
                                      dtype: torch.dtype, trim: bool) -> HPUAttentionMetadataV1:
```

```python
def _set_attn_bias(self, attn_metadata: HPUAttentionMetadataV1, batch_size: int, seq_len: int,
                   src_device: torch.device, dst_device: torch.device, dtype: torch.dtype,
                   trim: bool) -> HPUAttentionMetadataV1:
```

```python
    torch.device('cpu'),
    token_ids_device.device,
    self.dtype,
    trim=False)
```

```python
    torch.device('cpu'),
    token_ids.device,
    self.dtype,
    trim=False)
```
…or' into HEAD
Requires #526. This is the next logical step: we remove usage of the metadata postprocessor inside HpuModelAdapter and do the processing at input preparation time instead, on CPU, copying the data asynchronously to HPU. I also needed to change a few things so the processor accepts untrimmed metadata. This works as-is, but unfortunately I've noticed a pretty significant drop in end-to-end performance for small models.
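The asynchronous CPU-to-device copy mentioned above can be sketched as follows. This is a generic illustration with a hypothetical helper name, not the PR's actual code:

```python
import torch

def copy_metadata_async(tensors: dict, device: torch.device) -> dict:
    # Hypothetical helper: move CPU-built metadata tensors to the target
    # device with non_blocking=True so host-side work (e.g. building the
    # next batch's biases) can overlap with the transfer. Note that truly
    # asynchronous host-to-device copies generally require pinned host
    # memory; otherwise the copy may silently fall back to synchronous.
    return {name: t.to(device, non_blocking=True) for name, t in tensors.items()}
```

Because the copy is enqueued rather than waited on, the caller must not mutate the source CPU tensors until the transfer is known to have completed on the device stream.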