[Attention Metadata Overhaul 1/N] Extract metadata update to HPUAttentionMetadataProcessor #526
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Pull Request Overview
This PR refactors attention metadata post-processing logic by extracting it from HpuModelAdapter into a new dedicated class HPUAttentionMetadataProcessor. The refactoring improves code organization by separating metadata processing concerns from the model adapter, making the codebase more maintainable without introducing functional changes.
Key Changes:
- Extracted metadata post-processing methods into a new `HPUAttentionMetadataProcessor` class
- Removed metadata processing attributes and methods from `HpuModelAdapter`
- Updated `HpuModelAdapter.forward()` to delegate metadata processing to the new processor
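The delegation pattern described in the key changes can be sketched as follows. This is an illustrative reduction, not the actual vllm-gaudi code: the class names `HpuModelAdapter` and `HPUAttentionMetadataProcessor` come from the PR, but the method bodies and the `process` method name here are assumptions.

```python
class HPUAttentionMetadataProcessor:
    """Owns attention-metadata post-processing, extracted out of the adapter."""

    def process(self, attn_metadata: dict) -> dict:
        # Stand-in for the real post-processing logic: return an updated copy
        # rather than mutating the caller's metadata in place.
        processed = dict(attn_metadata)
        processed["post_processed"] = True
        return processed


class HpuModelAdapter:
    """After the refactor, the adapter holds no metadata-processing state."""

    def __init__(self) -> None:
        # The adapter now just owns a processor instance and delegates to it.
        self.metadata_processor = HPUAttentionMetadataProcessor()

    def forward(self, attn_metadata: dict) -> dict:
        # Delegate metadata post-processing, then run the model (elided here).
        attn_metadata = self.metadata_processor.process(attn_metadata)
        return attn_metadata
```

The point of the split is that `forward()` no longer needs to know how metadata is massaged, which also sets up the follow-up PR's plan to run this step earlier, on CPU.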
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```python
        Returns:
            Dictionary with post-processed attention metadata
        """
        from vllm_gaudi.extension.logger import logger
```
Import statement should be placed at the module level (top of file) rather than inside a method. This violates Python's import conventions and can impact performance when the method is called repeatedly.
```python
if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
    self.slice_size = int(os.getenv("PT_HPU_SDPA_BC_FACTOR", "1024"))
    self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
```
[nitpick] Inconsistent environment variable access methods: line 4627 uses os.getenv() while line 4628 uses os.environ.get(). These are functionally equivalent, but consistency improves readability. Use os.getenv() on both lines.
```diff
- self.slice_thld = int(os.environ.get('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
+ self.slice_thld = int(os.getenv('VLLM_FUSEDSDPA_SLIDE_THLD', '8192'))
```
```python
if self.prefill_use_fusedsdpa and self.use_window_sdpa and \
        seq_len >= self.slice_thld and self.slice_size != 0 and \
        seq_len % self.slice_size == 0 and attn_metadata.block_list is None:
```
[nitpick] Complex conditional with 6 conditions is difficult to read and maintain. Consider extracting this into a helper method like _should_use_builtin_window_sdpa() that returns a boolean and includes a docstring explaining the conditions.
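One possible shape for the suggested helper, sketched here as a standalone function so the conditions can be tested in isolation. The name `_should_use_builtin_window_sdpa` is the reviewer's proposal, not existing code, and the parameter list mirrors the attributes used in the conditional above.

```python
def should_use_builtin_window_sdpa(prefill_use_fusedsdpa: bool,
                                   use_window_sdpa: bool,
                                   seq_len: int,
                                   slice_thld: int,
                                   slice_size: int,
                                   block_list) -> bool:
    """Return True when sliced window SDPA can be used for this prefill.

    All of the following must hold:
      - fused SDPA is enabled for prefill,
      - windowed-SDPA slicing was requested via the env flag,
      - the sequence is at least the slicing threshold,
      - the slice size is non-zero and divides the sequence length evenly,
      - there is no block list for this batch.
    """
    # The slice_size != 0 check must precede the modulo to avoid
    # ZeroDivisionError; `and` short-circuiting guarantees the order.
    return (prefill_use_fusedsdpa
            and use_window_sdpa
            and seq_len >= slice_thld
            and slice_size != 0
            and seq_len % slice_size == 0
            and block_list is None)
```

As a method, the parameters would instead come from `self` and `attn_metadata`, and the call site reduces to a single readable `if`.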
```python
self.interleaved_sliding_window = is_interleaved(vllm_config.model_config.hf_text_config)

if self.interleaved_sliding_window:
    self.use_window_sdpa = os.getenv("PT_HPU_SDPA_QKV_SLICE_MODE_FWD", "false").strip().lower() in ("1", "true")
```
I know it's not part of your changes, just a simple copy-paste, but can we change it to use `get_config().FLAG`, the same way as everywhere else? Then the `"false").strip().lower() in ("1", "true")` parsing wouldn't be necessary.
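A generic sketch of the idea behind that suggestion: centralize the boolean env-flag parsing in one helper so call sites don't repeat the `strip().lower() in ("1", "true")` dance. This is an illustration only; `env_flag` is a hypothetical name, and the real `get_config()` API in vllm-gaudi may parse flags differently.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Parse an environment variable as a boolean flag.

    Accepts "1" or "true" (case-insensitive, whitespace-tolerant) as True;
    everything else, including an unset variable, is False.
    """
    return os.getenv(name, default).strip().lower() in ("1", "true")


# Call sites then collapse to e.g.:
# self.use_window_sdpa = env_flag("PT_HPU_SDPA_QKV_SLICE_MODE_FWD")
```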
✅ CI Passed: All checks passed successfully against the following vllm commit:
jkaniecki left a comment
Overall a good change, with one modification proposed.
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
✅ CI Passed: All checks passed successfully against the following vllm commit:
Cherry-pick of vllm-project@6e1be4e but adapted to recent changes in vllm-project#526 --------- Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai> Signed-off-by: Wang, Zheng W <zheng.w.wang@intel.com>
Cherry-pick of vllm-project@6e1be4e but adapted to recent changes in vllm-project#526 --------- Signed-off-by: Katarzyna Fojcik <kfojcik@habana.ai> Signed-off-by: slokesha <slokeshappa@habana.ai>
…tionMetadataProcessor (vllm-project#526)

This PR is pretty simple - it takes all the metadata post-processing logic we do inside adapter, and yeets it from there into a separate class. This shouldn't introduce any functional changes other than a small refactor. In the next PR, I intend to remove metadata postprocessing from the adapter and do it beforehand, on CPU, but I didn't want to introduce too major changes here.

I made this because I absolutely hated how vllm-project#475 ended up w.r.t. metadata postprocessing, so I'd like to gradually fix it before that PR lands.

---------

Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>