[Attention Metadata Overhaul 3/N] Add per-layer attention metadata #475
kzawora-intel wants to merge 2 commits into
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
```python
first_metadata = next(iter(kwargs['attn_metadata'].values()))
updated_metadata = self._update_metadata(first_metadata, input_ids.size(0), input_ids.size(1),
                                         input_ids.device, self.dtype)
for key in kwargs['attn_metadata']:
    if kwargs['attn_metadata'][key] is first_metadata:
        kwargs['attn_metadata'][key] = updated_metadata
    else:
        msg = f"Different attn_metadata encountered on layer {key}. Updating it individually."
        logger.warning(msg)
        kwargs['attn_metadata'][key] = self._update_metadata(kwargs['attn_metadata'][key],
                                                             input_ids.size(0), input_ids.size(1),
                                                             input_ids.device, self.dtype)
```
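The loop above updates a shared metadata object once and reuses the result for every layer that pointed at it, falling back to per-layer updates only when a layer carries its own object. A standalone sketch of that identity-based deduplication, with a toy `update` standing in for `self._update_metadata` (names here are illustrative, not the PR's API):

```python
import logging

logging.basicConfig()
logger = logging.getLogger("adapter")


def update(meta, scale):
    # Toy stand-in for self._update_metadata: returns a new, post-processed object.
    return {**meta, "scaled": meta["len"] * scale}


def update_per_layer(attn_metadata, scale):
    # Most layers share one metadata object, so update it once and reuse the result.
    first_metadata = next(iter(attn_metadata.values()))
    updated_metadata = update(first_metadata, scale)
    for key in attn_metadata:
        if attn_metadata[key] is first_metadata:
            attn_metadata[key] = updated_metadata
        else:
            logger.warning("Different attn_metadata on layer %s. Updating it individually.", key)
            attn_metadata[key] = update(attn_metadata[key], scale)
    return attn_metadata


shared = {"len": 4}
meta = update_per_layer({"layers.0": shared, "layers.1": shared, "layers.2": {"len": 2}}, 3)
```

Note that the identity check keeps working mid-loop because `first_metadata` still references the original object even after its dict entries are replaced.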
**Reviewer:** Why not use `_update_metadata_dict` here?

**Author:** Because both are trash; I'm currently overhauling this specific part.
```python
class HashableDict(dict):

    def __hash__(self):
        return hash((frozenset(self), frozenset(self.values())))
```
**Reviewer:** Is this used somewhere?

**Author:** It was used initially; I've removed its usage since. Will delete.
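For the record, the `__hash__` shown above hashes the keys and the values as two independent sets, so `{'a': 1, 'b': 2}` and `{'a': 2, 'b': 1}` hash identically despite being unequal. If a hashable dict is ever needed again, hashing the items keeps the key/value pairing; a minimal sketch:

```python
class HashableDict(dict):
    """Immutable-by-convention dict usable as a set member or cache key.

    Assumes all keys and values are themselves hashable.
    """

    def __hash__(self):
        # Hash the key/value *pairs*, not keys and values as separate sets,
        # so dicts pairing the same keys with different values hash apart.
        return hash(frozenset(self.items()))


a = HashableDict({'a': 1, 'b': 2})
b = HashableDict({'a': 1, 'b': 2})
c = HashableDict({'a': 2, 'b': 1})
```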
```python
kv_cache_config = deepcopy(kv_cache_config)
self.kv_cache_config = kv_cache_config
```
**Author:** We will need to access at least `self.kv_cache_config.kv_cache_groups` when we prepare different metadata instances for different attention types. Each group should have one distinct instance of attention metadata; that's how we can distinguish between chunked and non-chunked attention, or any other hybrid shenanigans. In the current state we don't bother and just prepare one instance regardless of the number of groups. I'm not sure if I'll complete that in this PR or in the next ones, since it's already a big can of worms.
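The per-group direction described above can be sketched roughly like this. `KVCacheGroup` and `KVCacheConfig` are simplified stand-ins for the real vLLM structures, and `build_per_group_metadata` is a hypothetical helper, not code from this PR:

```python
from dataclasses import dataclass, field


@dataclass
class KVCacheGroup:
    # Simplified stand-in for vLLM's kv-cache group spec.
    layer_names: list
    attn_type: str  # e.g. "full" vs "chunked"


@dataclass
class KVCacheConfig:
    kv_cache_groups: list = field(default_factory=list)


def build_per_group_metadata(kv_cache_config):
    # One distinct metadata instance per group, shared by all layers in it;
    # this is how chunked and non-chunked attention could be told apart.
    attn_metadata = {}
    for group in kv_cache_config.kv_cache_groups:
        group_meta = {"attn_type": group.attn_type}
        for layer_name in group.layer_names:
            attn_metadata[layer_name] = group_meta
    return attn_metadata


cfg = KVCacheConfig([
    KVCacheGroup(["layers.0", "layers.1"], "full"),
    KVCacheGroup(["layers.2"], "chunked"),
])
meta = build_per_group_metadata(cfg)
```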
…tionMetadataProcessor (#526)

This PR is pretty simple: it takes all the metadata post-processing logic we do inside the adapter and yeets it from there into a separate class. This shouldn't introduce any functional changes other than a small refactor. In the next PR, I intend to remove metadata post-processing from the adapter and do it beforehand, on CPU, but I didn't want to introduce too major changes here. I made this because I absolutely hated how #475 ended up w.r.t. metadata post-processing, so I'd like to gradually fix it before that PR lands.

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
|
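The shape of that refactor, as a hedged sketch: the adapter delegates to a processor object instead of post-processing metadata itself. Class and method names here are illustrative, not the actual vllm-gaudi API:

```python
class AttentionMetadataProcessor:
    # Hypothetical home for all metadata post-processing logic,
    # pulled out of the model adapter.
    def __init__(self, dtype):
        self.dtype = dtype

    def process(self, metadata, batch_size, seq_len):
        # Return a post-processed copy instead of mutating in place.
        return {**metadata, "batch_size": batch_size,
                "seq_len": seq_len, "dtype": self.dtype}


class HpuModelAdapter:
    # The adapter only delegates; it no longer owns post-processing.
    def __init__(self, processor):
        self.processor = processor

    def prepare(self, attn_metadata, batch_size, seq_len):
        return {key: self.processor.process(meta, batch_size, seq_len)
                for key, meta in attn_metadata.items()}


adapter = HpuModelAdapter(AttentionMetadataProcessor("bf16"))
out = adapter.prepare({"layers.0": {}}, 2, 8)
```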
@jkaniecki will take over this PR
- Refactor `get_kv_cache_spec` to delegate to `attn_module.get_kv_cache_spec()`
- Add key encoding/decoding utilities with distinctive prefix (`__lyr_`)
- Update `subtuple` to support key encoding for layer name fields
- Convert `_form_prefill_batch` to produce per-layer `attn_metadata` dict
- Convert `_create_decode_input_data` to produce per-layer `attn_metadata` dict
- Add `_build_per_layer_metadata` helper (deduplicated from 3 places in original PR)
- Update `_execute_model_generic` to handle per-layer `attn_metadata` dict
- Update `HpuModelAdapter.forward` to handle per-layer `attn_metadata` dict
- Update `hpu_worker.py`: `FullAttentionSpec` -> `AttentionSpec` isinstance check
- Remove unused imports (`MLAAttention`, `MambaBase`, `MLAAttentionSpec`)
- No dead code: no `HashableDict` or unused `_update_metadata_dict`

Agent-Logs-Url: https://github.com/vllm-project/vllm-gaudi/sessions/c6acba10-1e18-469e-a594-ba802a6396fb
Co-authored-by: michalkuligowski <23379006+michalkuligowski@users.noreply.github.com>
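A plausible sketch of the `__lyr_` key encoding/decoding utilities mentioned in the list above. Helper names and the exact scheme are assumptions; the real implementation lives in the PR:

```python
# Distinctive prefix so encoded layer-name keys can't collide with ordinary field names.
LAYER_KEY_PREFIX = "__lyr_"


def encode_layer_key(layer_name: str) -> str:
    # Tag a layer name, e.g. "model.layers.0.self_attn" -> "__lyr_model.layers.0.self_attn".
    return LAYER_KEY_PREFIX + layer_name


def is_layer_key(key: str) -> bool:
    return key.startswith(LAYER_KEY_PREFIX)


def decode_layer_key(key: str) -> str:
    # Inverse of encode_layer_key; non-encoded keys pass through unchanged.
    return key[len(LAYER_KEY_PREFIX):] if is_layer_key(key) else key
```

Keeping the original layer name intact after the prefix makes the round trip lossless, which matters when the decoded key must match a module name exactly.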