[Attention Metadata Overhaul 3/N] Add per-layer attention metadata#475

Closed
kzawora-intel wants to merge 2 commits into main from private/kzawora/per_layer_attn_metadata

Conversation

@kzawora-intel
Contributor

No description provided.

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@kzawora-intel kzawora-intel marked this pull request as draft October 24, 2025 13:21
Comment on lines +541 to +553
```python
first_metadata = next(iter(kwargs['attn_metadata'].values()))

updated_metadata = self._update_metadata(first_metadata, input_ids.size(0),
                                         input_ids.size(1), input_ids.device,
                                         self.dtype)
for key in kwargs['attn_metadata']:
    if kwargs['attn_metadata'][key] is first_metadata:
        kwargs['attn_metadata'][key] = updated_metadata
    else:
        msg = f"Different attn_metadata encountered on layer {key}. Updating it individually."
        logger.warning(msg)
        kwargs['attn_metadata'][key] = self._update_metadata(
            kwargs['attn_metadata'][key], input_ids.size(0),
            input_ids.size(1), input_ids.device, self.dtype)
```
Contributor

Why not use _update_metadata_dict here ?

Contributor Author

because both are trash, i'm currently overhauling this specific part
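The answer above aside, the intent of the quoted loop is worth spelling out: each *distinct* metadata instance should be updated exactly once, and layers that shared an instance before the update should keep sharing the updated one. A minimal sketch of that identity-based dedup pattern, where `update_fn` is a hypothetical stand-in for `self._update_metadata`:

```python
def update_per_layer_metadata(attn_metadata: dict, update_fn) -> dict:
    """Update each distinct metadata instance once, preserving sharing.

    `update_fn` is a hypothetical stand-in for self._update_metadata.
    Layers that pointed at the same instance before the update still
    point at the same (updated) instance afterwards.
    """
    updated = {}  # id(original instance) -> updated instance
    for layer_name, metadata in attn_metadata.items():
        key = id(metadata)
        if key not in updated:
            updated[key] = update_fn(metadata)
        attn_metadata[layer_name] = updated[key]
    return attn_metadata
```

Unlike the quoted loop, this handles any number of distinct instances without special-casing the first one or warning once per layer.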

Comment on lines +611 to +615
```python
class HashableDict(dict):

    def __hash__(self):
        return hash((frozenset(self), frozenset(self.values())))
```

Contributor

Is this used somewhere ?

Contributor Author

@kzawora-intel kzawora-intel Oct 30, 2025

was used initially, i've removed it since, will delete
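For the record, the `__hash__` shown above is fragile anyway, so deleting it is the right call: hashing the key set and the value *set* separately discards the key-to-value pairing, so any two dicts with the same keys and a permutation of the same values collide, and unhashable values raise `TypeError`. A small illustration (names reproduce the quoted snippet; `PairHashableDict` is a hypothetical fix, not code from the PR):

```python
class HashableDict(dict):
    # The removed implementation: key set and value set hashed separately.
    def __hash__(self):
        return hash((frozenset(self), frozenset(self.values())))


class PairHashableDict(dict):
    # Hypothetical alternative: hash the (key, value) pairs together,
    # so the pairing itself contributes to the hash.
    def __hash__(self):
        return hash(frozenset(self.items()))


a = HashableDict({"k1": "v1", "k2": "v2"})
b = HashableDict({"k1": "v2", "k2": "v1"})  # same keys, values swapped
assert hash(a) == hash(b)  # guaranteed collision: same key set, same value set
assert a != b              # yet the dicts are not equal
```

Note that hashing unequal objects to the same value is legal in Python (it only degrades dict/set performance), but the pairing-blind version makes such collisions systematic rather than rare.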

Comment on lines +4258 to +4260
```python
kv_cache_config = deepcopy(kv_cache_config)
self.kv_cache_config = kv_cache_config
```

Contributor

Why do we need this ?

Contributor Author

We will need to access at least self.kv_cache_config.kv_cache_groups when we prepare different metadata instances for different attention types. Each group should get one distinct instance of attn metadata - that's how we can distinguish between chunked and non-chunked attention, or any other hybrid shenanigans. In the current state, we don't bother and just prepare one instance regardless of the number of groups. I'm not sure if I'll complete it in this PR or the next ones though, since it's already a big can of worms.
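The per-group scheme described above can be sketched as follows. This is an illustrative stand-in only: the group shape (plain dicts with a `layer_names` list) and the `build_metadata_for` callback are hypothetical, not the actual vLLM-Gaudi `kv_cache_groups` API.

```python
def build_per_group_metadata(kv_cache_groups, build_metadata_for):
    """Build one distinct metadata instance per KV cache group.

    All layers belonging to a group (e.g. chunked vs. non-chunked
    attention) share that group's single instance, so identity checks
    can later distinguish the attention types.
    """
    attn_metadata = {}
    for group in kv_cache_groups:
        group_metadata = build_metadata_for(group)
        for layer_name in group["layer_names"]:
            attn_metadata[layer_name] = group_metadata
    return attn_metadata
```

The identity-sharing within a group is exactly what makes an `is` comparison (as in the adapter loop quoted earlier) sufficient to tell groups apart.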

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@kzawora-intel kzawora-intel changed the title [Attention Metadata Overhaul 1/N] Add per-layer attention metadata [Attention Metadata Overhaul 2/N] Add per-layer attention metadata Nov 5, 2025
@kzawora-intel kzawora-intel changed the title [Attention Metadata Overhaul 2/N] Add per-layer attention metadata [Attention Metadata Overhaul 3/N] Add per-layer attention metadata Nov 5, 2025
kzawora-intel added a commit that referenced this pull request Dec 22, 2025
…tionMetadataProcessor (#526)

This PR is pretty simple - it takes all the metadata post-processing
logic we do inside adapter, and yeets it from there into a separate
class. This shouldn't introduce any functional changes other than a
small refactor. In the next PR, I intend to remove metadata
postprocessing from the adapter and do it beforehand, on CPU, but I
didn't want to introduce changes that are too major here.

I made this because I absolutely hated how
#475 ended up w.r.t.
metadata postprocessing, so I'd like to gradually fix it before that PR
lands.

---------

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
@kzawora-intel
Contributor Author

@jkaniecki will take over this PR

rajanintel24 pushed a commit to rajanintel24/vllm-gaudi that referenced this pull request Feb 11, 2026
…tionMetadataProcessor (vllm-project#526)

Copilot AI added a commit that referenced this pull request Mar 31, 2026
- Refactor get_kv_cache_spec to delegate to attn_module.get_kv_cache_spec()
- Add key encoding/decoding utilities with distinctive prefix (__lyr_)
- Update subtuple to support key encoding for layer name fields
- Convert _form_prefill_batch to produce per-layer attn_metadata dict
- Convert _create_decode_input_data to produce per-layer attn_metadata dict
- Add _build_per_layer_metadata helper (deduplicated from 3 places in original PR)
- Update _execute_model_generic to handle per-layer attn_metadata dict
- Update HpuModelAdapter.forward to handle per-layer attn_metadata dict
- Update hpu_worker.py: FullAttentionSpec -> AttentionSpec isinstance check
- Remove unused imports (MLAAttention, MambaBase, MLAAttentionSpec)
- No dead code: no HashableDict or unused _update_metadata_dict

Agent-Logs-Url: https://github.com/vllm-project/vllm-gaudi/sessions/c6acba10-1e18-469e-a594-ba802a6396fb

Co-authored-by: michalkuligowski <23379006+michalkuligowski@users.noreply.github.com>
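The commit above mentions key encoding/decoding utilities with a distinctive `__lyr_` prefix, presumably because layer names contain dots that are not valid in flat attribute-style keys. A hedged sketch of what such a reversible scheme could look like (the exact escaping used in the repo may differ; `_0x2e_` is an illustrative dot-escape, not confirmed):

```python
_LAYER_PREFIX = "__lyr_"
_DOT_ESCAPE = "_0x2e_"  # hypothetical reversible escape for "."


def encode_layer_key(layer_name: str) -> str:
    """Flatten a dotted layer name into a prefix-tagged, dot-free key."""
    return _LAYER_PREFIX + layer_name.replace(".", _DOT_ESCAPE)


def decode_layer_key(key: str) -> str:
    """Recover the original dotted layer name from an encoded key."""
    assert key.startswith(_LAYER_PREFIX), f"not a layer key: {key}"
    return key[len(_LAYER_PREFIX):].replace(_DOT_ESCAPE, ".")


def is_layer_key(key: str) -> bool:
    """The distinctive prefix makes encoded keys cheap to recognize."""
    return key.startswith(_LAYER_PREFIX)
```

The distinctive prefix lets generic helpers (such as the `subtuple` mentioned in the commit) detect which fields hold per-layer data without maintaining a separate registry.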
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
…tionMetadataProcessor (#526)

@adobrzyn adobrzyn closed this Apr 8, 2026