[Attention Metadata Overhaul 3/N] Add per-layer attention metadata #475
kzawora-intel wants to merge 2 commits into
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
```python
first_metadata = next(iter(kwargs['attn_metadata'].values()))
updated_metadata = self._update_metadata(first_metadata, input_ids.size(0), input_ids.size(1),
                                         input_ids.device, self.dtype)
for key in kwargs['attn_metadata']:
    if kwargs['attn_metadata'][key] is first_metadata:
        kwargs['attn_metadata'][key] = updated_metadata
    else:
        msg = f"Different attn_metadata encountered on layer {key}. Updating it individually."
        logger.warning(msg)
        kwargs['attn_metadata'][key] = self._update_metadata(kwargs['attn_metadata'][key],
                                                             input_ids.size(0), input_ids.size(1),
                                                             input_ids.device, self.dtype)
```
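The loop above updates a shared metadata object once and reuses the result for every layer that pointed at it, falling back to per-layer updates only when a layer carries its own object. A standalone sketch of that identity-based deduplication, with a toy `update` standing in for `self._update_metadata` (names here are illustrative, not the PR's API):

```python
import logging

logging.basicConfig()
logger = logging.getLogger("adapter")


def update(meta, scale):
    # Toy stand-in for self._update_metadata: returns a new, post-processed object.
    return {**meta, "scaled": meta["len"] * scale}


def update_per_layer(attn_metadata, scale):
    # Most layers share one metadata object, so update it once and reuse the result.
    first_metadata = next(iter(attn_metadata.values()))
    updated_metadata = update(first_metadata, scale)
    for key in attn_metadata:
        if attn_metadata[key] is first_metadata:
            attn_metadata[key] = updated_metadata
        else:
            logger.warning("Different attn_metadata on layer %s. Updating it individually.", key)
            attn_metadata[key] = update(attn_metadata[key], scale)
    return attn_metadata


shared = {"len": 4}
meta = update_per_layer({"layers.0": shared, "layers.1": shared, "layers.2": {"len": 2}}, 3)
```

Note that the identity check keeps working mid-loop because `first_metadata` still references the original object even after its dict entries are replaced.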
**Reviewer:** Why not use `_update_metadata_dict` here?

**Author:** Because both are trash; I'm currently overhauling this specific part.
```python
class HashableDict(dict):

    def __hash__(self):
        return hash((frozenset(self), frozenset(self.values())))
```
**Reviewer:** Is this used somewhere?

**Author:** It was used initially; I've removed its usage since. Will delete.
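For the record, the `__hash__` shown above hashes the keys and the values as two independent sets, so `{'a': 1, 'b': 2}` and `{'a': 2, 'b': 1}` hash identically despite being unequal. If a hashable dict is ever needed again, hashing the items keeps the key/value pairing; a minimal sketch:

```python
class HashableDict(dict):
    """Immutable-by-convention dict usable as a set member or cache key.

    Assumes all keys and values are themselves hashable.
    """

    def __hash__(self):
        # Hash the key/value *pairs*, not keys and values as separate sets,
        # so dicts pairing the same keys with different values hash apart.
        return hash(frozenset(self.items()))


a = HashableDict({'a': 1, 'b': 2})
b = HashableDict({'a': 1, 'b': 2})
c = HashableDict({'a': 2, 'b': 1})
```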
```python
kv_cache_config = deepcopy(kv_cache_config)
self.kv_cache_config = kv_cache_config
```
**Author:** We will need to access at least `self.kv_cache_config.kv_cache_groups` when we prepare different metadata instances for different attention types. Each group should have one distinct instance of attention metadata; that's how we can distinguish between chunked and non-chunked attention, or any other hybrid shenanigans. In the current state we don't bother and just prepare one instance regardless of the number of groups. I'm not sure if I'll complete that in this PR or in the next ones, since it's already a big can of worms.
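The per-group direction described above can be sketched roughly like this. `KVCacheGroup` and `KVCacheConfig` are simplified stand-ins for the real vLLM structures, and `build_per_group_metadata` is a hypothetical helper, not code from this PR:

```python
from dataclasses import dataclass, field


@dataclass
class KVCacheGroup:
    # Simplified stand-in for vLLM's kv-cache group spec.
    layer_names: list
    attn_type: str  # e.g. "full" vs "chunked"


@dataclass
class KVCacheConfig:
    kv_cache_groups: list = field(default_factory=list)


def build_per_group_metadata(kv_cache_config):
    # One distinct metadata instance per group, shared by all layers in it;
    # this is how chunked and non-chunked attention could be told apart.
    attn_metadata = {}
    for group in kv_cache_config.kv_cache_groups:
        group_meta = {"attn_type": group.attn_type}
        for layer_name in group.layer_names:
            attn_metadata[layer_name] = group_meta
    return attn_metadata


cfg = KVCacheConfig([
    KVCacheGroup(["layers.0", "layers.1"], "full"),
    KVCacheGroup(["layers.2"], "chunked"),
])
meta = build_per_group_metadata(cfg)
```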
…tionMetadataProcessor (#526)

This PR is pretty simple: it takes all the metadata post-processing logic we do inside the adapter and yeets it from there into a separate class. This shouldn't introduce any functional changes other than a small refactor. In the next PR, I intend to remove metadata post-processing from the adapter and do it beforehand, on CPU, but I didn't want to introduce too major changes here. I made this because I absolutely hated how #475 ended up w.r.t. metadata post-processing, so I'd like to gradually fix it before that PR lands.

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
|
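The shape of that refactor, as a hedged sketch: the adapter delegates to a processor object instead of post-processing metadata itself. Class and method names here are illustrative, not the actual vllm-gaudi API:

```python
class AttentionMetadataProcessor:
    # Hypothetical home for all metadata post-processing logic,
    # pulled out of the model adapter.
    def __init__(self, dtype):
        self.dtype = dtype

    def process(self, metadata, batch_size, seq_len):
        # Return a post-processed copy instead of mutating in place.
        return {**metadata, "batch_size": batch_size,
                "seq_len": seq_len, "dtype": self.dtype}


class HpuModelAdapter:
    # The adapter only delegates; it no longer owns post-processing.
    def __init__(self, processor):
        self.processor = processor

    def prepare(self, attn_metadata, batch_size, seq_len):
        return {key: self.processor.process(meta, batch_size, seq_len)
                for key, meta in attn_metadata.items()}


adapter = HpuModelAdapter(AttentionMetadataProcessor("bf16"))
out = adapter.prepare({"layers.0": {}}, 2, 8)
```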
@jkaniecki will take over this PR
- Refactor `get_kv_cache_spec` to delegate to `attn_module.get_kv_cache_spec()`
- Add key encoding/decoding utilities with distinctive prefix (`__lyr_`)
- Update `subtuple` to support key encoding for layer name fields
- Convert `_form_prefill_batch` to produce per-layer `attn_metadata` dict
- Convert `_create_decode_input_data` to produce per-layer `attn_metadata` dict
- Add `_build_per_layer_metadata` helper (deduplicated from 3 places in original PR)
- Update `_execute_model_generic` to handle per-layer `attn_metadata` dict
- Update `HpuModelAdapter.forward` to handle per-layer `attn_metadata` dict
- Update `hpu_worker.py`: `FullAttentionSpec` -> `AttentionSpec` isinstance check
- Remove unused imports (`MLAAttention`, `MambaBase`, `MLAAttentionSpec`)
- No dead code: no `HashableDict` or unused `_update_metadata_dict`

Agent-Logs-Url: https://github.com/vllm-project/vllm-gaudi/sessions/c6acba10-1e18-469e-a594-ba802a6396fb
Co-authored-by: michalkuligowski <23379006+michalkuligowski@users.noreply.github.com>
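A plausible sketch of the `__lyr_` key encoding/decoding utilities mentioned in the list above. Helper names and the exact scheme are assumptions; the real implementation lives in the PR:

```python
# Distinctive prefix so encoded layer-name keys can't collide with ordinary field names.
LAYER_KEY_PREFIX = "__lyr_"


def encode_layer_key(layer_name: str) -> str:
    # Tag a layer name, e.g. "model.layers.0.self_attn" -> "__lyr_model.layers.0.self_attn".
    return LAYER_KEY_PREFIX + layer_name


def is_layer_key(key: str) -> bool:
    return key.startswith(LAYER_KEY_PREFIX)


def decode_layer_key(key: str) -> str:
    # Inverse of encode_layer_key; non-encoded keys pass through unchanged.
    return key[len(LAYER_KEY_PREFIX):] if is_layer_key(key) else key
```

Keeping the original layer name intact after the prefix makes the round trip lossless, which matters when the decoded key must match a module name exactly.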