[Refactor] AttentionBuilder inherit from base class in vllm by LICO1314 · Pull Request #5916 · vllm-project/vllm-ascend

LICO1314 · 2026-01-15T07:25:47Z

What this PR does / why we need it?

This PR makes AscendMLAMetadataBuilder and AscendSFAMetadataBuilder properly inherit from the base class MLACommonMetadataBuilder in vllm by adding super().__init__() calls.

Changes:

Add super().__init__() call in AscendMLAMetadataBuilder.__init__()
Add super().__init__() call in AscendSFAMetadataBuilder.__init__()
Extract ascend_chunked_prefill_workspace_size() to vllm_ascend/attention/utils.py to avoid code duplication
Override determine_chunked_prefill_workspace_size() to support Ascend-specific 128k tokens workspace size (vs 64k in parent class)
Update unit tests to mock parent class __init__ for proper isolation
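
The first, second, and fourth items can be sketched together with stand-in classes. This is a minimal illustration, not the vllm or vllm-ascend code: `ToyConfig`, `ParentBuilder`, and `AscendLikeBuilder` are hypothetical names, the size calculation is deliberately reduced to a bare cap, and it assumes the parent initializer calls the static method through `self` so that a subclass override takes effect.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the real config object."""
    max_model_len: int


class ParentBuilder:
    """Hypothetical stand-in for the base metadata builder."""

    def __init__(self, config: ToyConfig) -> None:
        self.config = config
        # Looked up on type(self), so a subclass override is used here.
        self.chunked_prefill_workspace_size = (
            self.determine_chunked_prefill_workspace_size(config))

    @staticmethod
    def determine_chunked_prefill_workspace_size(config: ToyConfig) -> int:
        # Simplified: cap at 64k tokens.
        return min(8 * config.max_model_len, 64 * 1024)


class AscendLikeBuilder(ParentBuilder):
    """Hypothetical subclass that reuses the parent initializer."""

    def __init__(self, config: ToyConfig) -> None:
        super().__init__(config)  # parent sets the shared attributes
        self.device_specific_ready = True  # subclass-only state

    @staticmethod
    def determine_chunked_prefill_workspace_size(config: ToyConfig) -> int:
        # Simplified: cap at 128k tokens instead of 64k.
        return min(8 * config.max_model_len, 128 * 1024)


cfg = ToyConfig(max_model_len=32768)
print(ParentBuilder(cfg).chunked_prefill_workspace_size)      # 65536
print(AscendLikeBuilder(cfg).chunked_prefill_workspace_size)  # 131072
```

Because the override is a static method with the same name, the subclass does not need to recompute the attribute after `super().__init__()` returns.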

Why we need it:

Follow proper Python inheritance patterns by calling super().__init__()
Reduce code duplication by reusing parent class initialization logic
Better maintainability as parent class changes will be automatically inherited
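
One consequence of calling `super().__init__()` is that a test constructing the builder would also run the parent initializer. The unit-test change listed above could look roughly like the sketch below, which uses `unittest.mock.patch.object` to replace the parent `__init__`. The classes are hypothetical stand-ins, not the builders touched by this PR, and the dict-based config is an assumption made to keep the example self-contained.

```python
import unittest
from unittest.mock import patch


class ParentBuilder:
    """Hypothetical parent whose initializer is too heavy for a unit test."""

    def __init__(self, config: dict) -> None:
        raise RuntimeError("requires a real device and full config")


class ChildBuilder(ParentBuilder):
    """Hypothetical subclass under test."""

    def __init__(self, config: dict) -> None:
        super().__init__(config)
        self.block_size = config["block_size"]


class ChildBuilderTest(unittest.TestCase):
    def test_init_sets_child_fields(self) -> None:
        config = {"block_size": 128}
        # Replace the parent initializer so only the child logic runs.
        with patch.object(ParentBuilder, "__init__",
                          return_value=None) as parent_init:
            builder = ChildBuilder(config)
        # The child still delegated to the (mocked) parent initializer.
        parent_init.assert_called_once_with(config)
        self.assertEqual(builder.block_size, 128)
```

Saved to a file, this can be run with `python -m unittest`. The trade-off is that attributes normally set by the parent are absent on the instance, so a test that needs them has to assign them explicitly.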

Part of issue #5463 item 10

Does this PR introduce any user-facing change?

No, this is an internal refactoring that does not change any user-facing behavior.

github-actions · 2026-01-15T07:26:05Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

gemini-code-assist

Code Review

This pull request correctly refactors AscendMLAMetadataBuilder and AscendSFAMetadataBuilder to inherit from MLACommonMetadataBuilder, which improves code structure and reduces duplication by leveraging the base class's __init__. The override of determine_chunked_prefill_workspace_size to support Ascend-specific requirements is also a good addition. I have one suggestion to further improve maintainability by avoiding code duplication for this new static method across the two builder classes.

gemini-code-assist · 2026-01-15T07:27:14Z

+    @staticmethod
+    def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int:
+        """Override parent's workspace size calculation for Ascend NPU.
+
+        Ascend NPU requires larger workspace (128k tokens) compared to
+        the default 64k tokens in the parent class.
+        """
+        scheduler_config = vllm_config.scheduler_config
+        cache_config = vllm_config.cache_config
+        model_config = vllm_config.model_config
+
+        chunked_prefill_workspace_size = min(
+            # Make sure there is enough for 8 full length request or at least
+            # 4 pages of cache per request
+            max(8 * model_config.max_model_len,
+                4 * scheduler_config.max_num_seqs * cache_config.block_size),
+            # For long-context models try not to over-allocate limiting
+            # kv-cache space, limiting it to 128k tokens for Ascend NPU,
+            # which would result in the workspace being:
+            #   2*(576)*(128*1024) = 288mb
+            # (assuming 576 MLA head dim, and fp16)
+            # which would result in up-projected context being
+            #   2*(192*128)*(128*1024) = 6gb
+            # (assuming 192 QK head dim, 128 heads, and fp16)
+            128 * 1024)
+
+        # Enforce that we have enough for at least 1 page per request
+        chunked_prefill_workspace_size = max(
+            chunked_prefill_workspace_size,
+            scheduler_config.max_num_seqs * cache_config.block_size,
+        )
+
+        return chunked_prefill_workspace_size


This static method determine_chunked_prefill_workspace_size is also duplicated in vllm_ascend/attention/sfa_v1.py. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this method into a common mixin class that both AscendMLAMetadataBuilder and AscendSFAMetadataBuilder can inherit from.

For example, you could create a mixin like this:

    class AscendChunkedWorkspaceMixin:
        @staticmethod
        def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int:
            """Provide workspace size calculation for Ascend NPUs."""
            scheduler_config = vllm_config.scheduler_config
            cache_config = vllm_config.cache_config
            model_config = vllm_config.model_config

            chunked_prefill_workspace_size = min(
                max(8 * model_config.max_model_len,
                    4 * scheduler_config.max_num_seqs * cache_config.block_size),
                128 * 1024)

            # Enforce that we have enough for at least 1 page per request
            chunked_prefill_workspace_size = max(
                chunked_prefill_workspace_size,
                scheduler_config.max_num_seqs * cache_config.block_size,
            )

            return chunked_prefill_workspace_size

    # Then apply it to the builder classes:
    class AscendMLAMetadataBuilder(AscendChunkedWorkspaceMixin,
                                   MLACommonMetadataBuilder[AscendMLAMetadata]):
        ...

    # in sfa_v1.py
    class AscendSFAMetadataBuilder(AscendChunkedWorkspaceMixin,
                                   MLACommonMetadataBuilder[AscendSFAMetadata]):
        ...

This would centralize the logic and make future changes easier.
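
To make the min/max logic in the diff concrete, here is a small worked example with plain integers. The PR description says the shared logic was extracted into `ascend_chunked_prefill_workspace_size()` in `vllm_ascend/attention/utils.py`. Its signature is not shown in this thread, so the function below, with explicit integer arguments and a `cap_tokens` parameter, is a hypothetical stand-in that only mirrors the arithmetic.

```python
def chunked_prefill_workspace_size(max_model_len: int, max_num_seqs: int,
                                   block_size: int, cap_tokens: int) -> int:
    """Mirror the arithmetic from the diff using plain integers."""
    size = min(
        # Room for 8 full-length requests or 4 cache pages per request...
        max(8 * max_model_len, 4 * max_num_seqs * block_size),
        # ...but never more than the cap.
        cap_tokens,
    )
    # Always keep at least 1 page per request, even above the cap.
    return max(size, max_num_seqs * block_size)


ASCEND_CAP = 128 * 1024  # cap used in the diff
PARENT_CAP = 64 * 1024   # cap the PR description attributes to the parent

# Short context: 8 * 4096 = 32768 stays below either cap.
print(chunked_prefill_workspace_size(4096, 16, 128, ASCEND_CAP))    # 32768

# Long context: 8 * 131072 exceeds the cap, so the cap decides.
print(chunked_prefill_workspace_size(131072, 16, 128, ASCEND_CAP))  # 131072
print(chunked_prefill_workspace_size(131072, 16, 128, PARENT_CAP))  # 65536

# Many sequences: the 1-page-per-request floor (2048 * 128) beats the cap.
print(chunked_prefill_workspace_size(4096, 2048, 128, ASCEND_CAP))  # 262144
```

The last case shows that the cap is not a hard upper bound: the final `max` can push the result above it when `max_num_seqs * block_size` is large.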

github-actions · 2026-01-16T09:57:01Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.


LICO1314 requested a review from weijinqian0 as a code owner January 15, 2026 07:25

gemini-code-assist bot reviewed Jan 15, 2026


LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 3 times, most recently from f5c1cc1 to 42c44d4 Compare January 15, 2026 07:58

weijinqian0 mentioned this pull request Jan 15, 2026

[RFC]: Refactor Attention module #5463

Closed

wangxiyuan approved these changes Jan 15, 2026


wangxiyuan enabled auto-merge (squash) January 15, 2026 09:00

wangxiyuan added the ready and ready-for-test labels Jan 15, 2026

auto-merge was automatically disabled January 15, 2026 09:06
Head branch was pushed to by a user without write access

LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 9 times, most recently from 0dd0a33 to defd72e Compare January 16, 2026 09:24

github-actions bot added the merge-conflicts label Jan 16, 2026

LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch from defd72e to 5dd8932 Compare January 16, 2026 10:50

github-actions bot removed the merge-conflicts label Jan 16, 2026

LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 5 times, most recently from be0b8fa to 916ca69 Compare January 19, 2026 03:18

LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 4 times, most recently from af42239 to 56e92c0 Compare January 20, 2026 03:23

[Refactor] AttentionBuilder inherit from base class in vllm

8ccb36c

Signed-off-by: lico67373 <918688502@qq.com>

LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch from 56e92c0 to 8ccb36c Compare January 20, 2026 11:05

weijinqian0 approved these changes Jan 21, 2026


weijinqian0 merged commit 12a668b into vllm-project:main Jan 21, 2026
20 checks passed

huangfeifei1995 added a commit to huangfeifei1995/vllm-ascend that referenced this pull request Jan 21, 2026

Revert "[Refactor] AttentionBuilder inherit from base class in vllm (v…

f0e26ea

…llm-project#5916)" This reverts commit ae8e310.
