[Refactor] AttentionBuilder inherit from base class in vllm#5916
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Code Review
This pull request correctly refactors AscendMLAMetadataBuilder and AscendSFAMetadataBuilder to inherit from MLACommonMetadataBuilder, which improves code structure and reduces duplication by leveraging the base class's __init__. The override of determine_chunked_prefill_workspace_size to support Ascend-specific requirements is also a good addition. I have one suggestion to further improve maintainability by avoiding code duplication for this new static method across the two builder classes.
| @staticmethod | ||
| def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int: | ||
| """Override parent's workspace size calculation for Ascend NPU. | ||
|
|
||
| Ascend NPU requires larger workspace (128k tokens) compared to | ||
| the default 64k tokens in the parent class. | ||
| """ | ||
| scheduler_config = vllm_config.scheduler_config | ||
| cache_config = vllm_config.cache_config | ||
| model_config = vllm_config.model_config | ||
|
|
||
| chunked_prefill_workspace_size = min( | ||
| # Make sure there is enough for 8 full length request or at least | ||
| # 4 pages of cache per request | ||
| max(8 * model_config.max_model_len, | ||
| 4 * scheduler_config.max_num_seqs * cache_config.block_size), | ||
| # For long-context models try not to over-allocate limiting | ||
| # kv-cache space, limiting it to 128k tokens for Ascend NPU, | ||
| # which would result in the workspace being: | ||
| # 2*(576)*(128*1024) = 288mb | ||
| # (assuming 576 MLA head dim, and fp16) | ||
| # which would result in up-projected context being | ||
| # 2*(192*128)*(128*1024) = 6gb | ||
| # (assuming 192 QK head dim, 128 heads, and fp16) | ||
| 128 * 1024) | ||
|
|
||
| # Enforce that we have enough for at least 1 page per request | ||
| chunked_prefill_workspace_size = max( | ||
| chunked_prefill_workspace_size, | ||
| scheduler_config.max_num_seqs * cache_config.block_size, | ||
| ) | ||
|
|
||
| return chunked_prefill_workspace_size |
There was a problem hiding this comment.
This static method determine_chunked_prefill_workspace_size is also duplicated in vllm_ascend/attention/sfa_v1.py. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this method into a common mixin class that both AscendMLAMetadataBuilder and AscendSFAMetadataBuilder can inherit from.
For example, you could create a mixin like this:
class AscendChunkedWorkspaceMixin:
@staticmethod
def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int:
"""Provide workspace size calculation for Ascend NPUs."""
scheduler_config = vllm_config.scheduler_config
cache_config = vllm_config.cache_config
model_config = vllm_config.model_config
chunked_prefill_workspace_size = min(
max(8 * model_config.max_model_len,
4 * scheduler_config.max_num_seqs * cache_config.block_size),
128 * 1024)
# Enforce that we have enough for at least 1 page per request
chunked_prefill_workspace_size = max(
chunked_prefill_workspace_size,
scheduler_config.max_num_seqs * cache_config.block_size,
)
return chunked_prefill_workspace_size
# Then apply it to the builder classes:
class AscendMLAMetadataBuilder(AscendChunkedWorkspaceMixin, MLACommonMetadataBuilder[AscendMLAMetadata]):
...
# in sfa_v1.py
class AscendSFAMetadataBuilder(AscendChunkedWorkspaceMixin, MLACommonMetadataBuilder[AscendSFAMetadata]):
...This would centralize the logic and make future changes easier.
f5c1cc1 to
42c44d4
Compare
Head branch was pushed to by a user without write access
0dd0a33 to
defd72e
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
defd72e to
5dd8932
Compare
be0b8fa to
916ca69
Compare
af42239 to
56e92c0
Compare
Signed-off-by: lico67373 <918688502@qq.com>
56e92c0 to
8ccb36c
Compare
…to FIA_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: [Refactor] AttentionBuilder inherit from base class in vllm (vllm-project#5916) [Nightly] Use Qwen repo for qwen3-next (vllm-project#6064)
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com> Signed-off-by: huangning1995 <huangning12@huawei.com>
…llm-project#5916)" This reverts commit ae8e310.
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com>
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com>
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com>
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ject#5916) ### What this PR does / why we need it? This PR makes `AscendMLAMetadataBuilder` and `AscendSFAMetadataBuilder` properly inherit from the base class `MLACommonMetadataBuilder` in vllm by adding `super().__init__()` calls. **Changes:** - Add `super().__init__()` call in `AscendMLAMetadataBuilder.__init__()` - Add `super().__init__()` call in `AscendSFAMetadataBuilder.__init__()` - Extract `ascend_chunked_prefill_workspace_size()` to `vllm_ascend/attention/utils.py` to avoid code duplication - Override `determine_chunked_prefill_workspace_size()` to support Ascend-specific 128k tokens workspace size (vs 64k in parent class) - Update unit tests to mock parent class `__init__` for proper isolation **Why we need it:** - Follow proper Python inheritance patterns by calling `super().__init__()` - Reduce code duplication by reusing parent class initialization logic - Better maintainability as parent class changes will be automatically inherited Part of issue vllm-project#5463 item 10 ### Does this PR introduce _any_ user-facing change? No, this is an internal refactoring that does not change any user-facing behavior. Signed-off-by: lico67373 <918688502@qq.com>
What this PR does / why we need it?
This PR makes
AscendMLAMetadataBuilderandAscendSFAMetadataBuilderproperly inherit from the base classMLACommonMetadataBuilderin vllm by addingsuper().__init__()calls.Changes:
super().__init__()call inAscendMLAMetadataBuilder.__init__()super().__init__()call inAscendSFAMetadataBuilder.__init__()ascend_chunked_prefill_workspace_size()tovllm_ascend/attention/utils.pyto avoid code duplicationdetermine_chunked_prefill_workspace_size()to support Ascend-specific 128k tokens workspace size (vs 64k in parent class)__init__for proper isolationWhy we need it:
super().__init__()Part of issue #5463 item 10
Does this PR introduce any user-facing change?
No, this is an internal refactoring that does not change any user-facing behavior.