[Refactor] AttentionBuilder inherit from base class in vllm #5916

Merged
weijinqian0 merged 1 commit into vllm-project:main from LICO1314:refactor/attention-builder-inherit-base-class
Jan 21, 2026

@LICO1314 LICO1314 commented Jan 15, 2026

What this PR does / why we need it?

This PR makes AscendMLAMetadataBuilder and AscendSFAMetadataBuilder properly inherit from the base class MLACommonMetadataBuilder in vllm by adding super().__init__() calls.

Changes:

  • Add super().__init__() call in AscendMLAMetadataBuilder.__init__()
  • Add super().__init__() call in AscendSFAMetadataBuilder.__init__()
  • Extract ascend_chunked_prefill_workspace_size() to vllm_ascend/attention/utils.py to avoid code duplication
  • Override determine_chunked_prefill_workspace_size() to support Ascend-specific 128k tokens workspace size (vs 64k in parent class)
  • Update unit tests to mock parent class __init__ for proper isolation
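The changes listed above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `Config`, `BaseBuilder`, and `AscendLikeBuilder` are hypothetical stand-ins, and the sketch assumes the parent initializer looks up `determine_chunked_prefill_workspace_size` through the instance, so that a subclass override takes effect once `super().__init__()` is called.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the VllmConfig fields used in the calculation."""
    max_model_len: int
    max_num_seqs: int
    block_size: int


def ascend_chunked_prefill_workspace_size(cfg: Config) -> int:
    """Shared helper (sketch): same clamp as the parent, capped at 128k tokens."""
    size = min(max(8 * cfg.max_model_len,
                   4 * cfg.max_num_seqs * cfg.block_size),
               128 * 1024)
    # Keep at least one page per request.
    return max(size, cfg.max_num_seqs * cfg.block_size)


class BaseBuilder:
    """Stand-in for the parent builder (assumed shape, simplified arguments)."""

    def __init__(self, cfg: Config) -> None:
        self.cfg = cfg
        # Resolved via the instance, so a subclass override is picked up here.
        self.chunked_prefill_workspace_size = (
            self.determine_chunked_prefill_workspace_size(cfg))

    @staticmethod
    def determine_chunked_prefill_workspace_size(cfg: Config) -> int:
        size = min(max(8 * cfg.max_model_len,
                       4 * cfg.max_num_seqs * cfg.block_size),
                   64 * 1024)
        return max(size, cfg.max_num_seqs * cfg.block_size)


class AscendLikeBuilder(BaseBuilder):
    def __init__(self, cfg: Config) -> None:
        super().__init__(cfg)  # reuse the parent's initialization
        self.ascend_only_state = {}  # backend-specific setup would follow

    @staticmethod
    def determine_chunked_prefill_workspace_size(cfg: Config) -> int:
        # Delegate to the shared helper instead of duplicating the formula.
        return ascend_chunked_prefill_workspace_size(cfg)
```

With `Config(max_model_len=32768, max_num_seqs=256, block_size=128)`, the stand-in parent ends up with a 64k-token workspace and the subclass with a 128k-token one, without the subclass repeating any of the parent's setup.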

Why we need it:

  • Follow proper Python inheritance patterns by calling super().__init__()
  • Reduce code duplication by reusing parent class initialization logic
  • Better maintainability as parent class changes will be automatically inherited
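Once a subclass calls `super().__init__()`, its unit tests run the parent's setup too, which is why the tests were updated to mock the parent `__init__`. A minimal sketch of that pattern using `unittest.mock.patch.object` is shown below; `ParentBuilder` and `ChildBuilder` are hypothetical stand-ins, not the classes touched by this PR.

```python
from unittest.mock import patch


class ParentBuilder:
    """Hypothetical parent whose __init__ cannot run in a unit test."""

    def __init__(self, device: str) -> None:
        raise RuntimeError("parent setup is unavailable in unit tests")


class ChildBuilder(ParentBuilder):
    def __init__(self, device: str) -> None:
        super().__init__(device)
        self.device = device
        self.ready = True


def build_child_with_mocked_parent() -> tuple:
    # Replace only the parent's __init__; the child's own logic still runs.
    with patch.object(ParentBuilder, "__init__", return_value=None) as mocked:
        child = ChildBuilder("npu:0")
    # The patch is undone on exit, so other tests see the real parent again.
    return child, mocked.call_args_list
```

The mock records how the parent initializer was called, so a test can also assert that the subclass forwards the expected arguments.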

Part of issue #5463 item 10

Does this PR introduce any user-facing change?

No, this is an internal refactoring that does not change any user-facing behavior.

@LICO1314 LICO1314 requested a review from weijinqian0 as a code owner January 15, 2026 07:25
@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly refactors AscendMLAMetadataBuilder and AscendSFAMetadataBuilder to inherit from MLACommonMetadataBuilder, which improves code structure and reduces duplication by leveraging the base class's __init__. The override of determine_chunked_prefill_workspace_size to support Ascend-specific requirements is also a good addition. I have one suggestion to further improve maintainability by avoiding code duplication for this new static method across the two builder classes.

Comment thread vllm_ascend/attention/mla_v1.py Outdated
Comment on lines +207 to +239
@staticmethod
def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int:
    """Override parent's workspace size calculation for Ascend NPU.

    Ascend NPU requires larger workspace (128k tokens) compared to
    the default 64k tokens in the parent class.
    """
    scheduler_config = vllm_config.scheduler_config
    cache_config = vllm_config.cache_config
    model_config = vllm_config.model_config

    chunked_prefill_workspace_size = min(
        # Make sure there is enough for 8 full-length requests or at least
        # 4 pages of cache per request
        max(8 * model_config.max_model_len,
            4 * scheduler_config.max_num_seqs * cache_config.block_size),
        # For long-context models try not to over-allocate limiting
        # kv-cache space, limiting it to 128k tokens for Ascend NPU,
        # which would result in the workspace being:
        #   2*(576)*(128*1024) = 144mb
        # (assuming 576 MLA head dim, and fp16)
        # which would result in up-projected context being
        #   2*(192*128)*(128*1024) = 6gb
        # (assuming 192 QK head dim, 128 heads, and fp16)
        128 * 1024)

    # Enforce that we have enough for at least 1 page per request
    chunked_prefill_workspace_size = max(
        chunked_prefill_workspace_size,
        scheduler_config.max_num_seqs * cache_config.block_size,
    )

    return chunked_prefill_workspace_size
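To make the clamp and the memory figures in the comments concrete, here is a small worked calculation. It is only a sketch: `workspace_tokens` and `fp16_bytes` are hypothetical helper names, and the configuration values are made-up examples rather than settings taken from the PR.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def workspace_tokens(max_model_len: int, max_num_seqs: int,
                     block_size: int, cap_tokens: int) -> int:
    """Same min/max clamp as the method above, with the cap as a parameter."""
    size = min(max(8 * max_model_len, 4 * max_num_seqs * block_size),
               cap_tokens)
    # Floor: at least one page per request, even if that exceeds the cap.
    return max(size, max_num_seqs * block_size)


def fp16_bytes(elements_per_token: int, tokens: int) -> int:
    """Bytes needed for `tokens` rows of fp16 values (2 bytes per element)."""
    return 2 * elements_per_token * tokens


cap = 128 * 1024

# Long-context example: the 128k cap is the binding limit.
capped = workspace_tokens(32768, 256, 128, cap)

# Many concurrent sequences: the one-page-per-request floor exceeds the cap.
floored = workspace_tokens(4096, 2048, 128, cap)

# Memory at the cap, using the dimensions assumed in the code comments.
workspace_mib = fp16_bytes(576, cap) / MIB
up_projected_gib = fp16_bytes(192 * 128, cap) / GIB
```

Here `capped` is 131072 tokens and `floored` is 262144 tokens; at the 128k cap the workspace is 144 MiB and the up-projected context is 6 GiB under the stated fp16 assumptions.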

Priority: high

This static method determine_chunked_prefill_workspace_size is also duplicated in vllm_ascend/attention/sfa_v1.py. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this method into a common mixin class that both AscendMLAMetadataBuilder and AscendSFAMetadataBuilder can inherit from.

For example, you could create a mixin like this:

class AscendChunkedWorkspaceMixin:
    @staticmethod
    def determine_chunked_prefill_workspace_size(vllm_config: VllmConfig) -> int:
        """Provide workspace size calculation for Ascend NPUs."""
        scheduler_config = vllm_config.scheduler_config
        cache_config = vllm_config.cache_config
        model_config = vllm_config.model_config

        chunked_prefill_workspace_size = min(
            max(8 * model_config.max_model_len,
                4 * scheduler_config.max_num_seqs * cache_config.block_size),
            128 * 1024)

        # Enforce that we have enough for at least 1 page per request
        chunked_prefill_workspace_size = max(
            chunked_prefill_workspace_size,
            scheduler_config.max_num_seqs * cache_config.block_size,
        )

        return chunked_prefill_workspace_size

# Then apply it to the builder classes:
class AscendMLAMetadataBuilder(AscendChunkedWorkspaceMixin, MLACommonMetadataBuilder[AscendMLAMetadata]):
    ...

# in sfa_v1.py
class AscendSFAMetadataBuilder(AscendChunkedWorkspaceMixin, MLACommonMetadataBuilder[AscendSFAMetadata]):
    ...

This would centralize the logic and make future changes easier.
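One detail of the mixin approach is that base-class order matters: the mixin must be listed before the common builder so that its method comes first in the method resolution order. A minimal sketch with hypothetical stand-in classes:

```python
class CommonBuilder:
    """Stand-in for the shared parent class with the default cap."""

    @staticmethod
    def workspace_cap_tokens() -> int:
        return 64 * 1024


class LargerWorkspaceMixin:
    """Stand-in for a mixin that raises the cap."""

    @staticmethod
    def workspace_cap_tokens() -> int:
        return 128 * 1024


class MixinFirst(LargerWorkspaceMixin, CommonBuilder):
    """Mixin precedes the parent in the MRO, so its method is used."""


class MixinLast(CommonBuilder, LargerWorkspaceMixin):
    """Parent precedes the mixin, so the mixin's method is shadowed."""
```

`MixinFirst.workspace_cap_tokens()` returns the mixin's value, while `MixinLast.workspace_cap_tokens()` silently falls back to the parent's. A module-level helper that each builder calls explicitly, as the PR description says was done in `vllm_ascend/attention/utils.py`, avoids this ordering concern.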

@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 3 times, most recently from f5c1cc1 to 42c44d4, January 15, 2026 07:58
@wangxiyuan wangxiyuan enabled auto-merge (squash) January 15, 2026 09:00
@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 15, 2026
auto-merge was automatically disabled January 15, 2026 09:06

Head branch was pushed to by a user without write access

@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 9 times, most recently from 0dd0a33 to defd72e, January 16, 2026 09:24
@github-actions

This pull request has conflicts; please resolve them before we can evaluate it.

@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch from defd72e to 5dd8932, January 16, 2026 10:50
@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 5 times, most recently from be0b8fa to 916ca69, January 19, 2026 03:18
@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch 4 times, most recently from af42239 to 56e92c0, January 20, 2026 03:23
Signed-off-by: lico67373 <918688502@qq.com>
@LICO1314 LICO1314 force-pushed the refactor/attention-builder-inherit-base-class branch from 56e92c0 to 8ccb36c, January 20, 2026 11:05
@weijinqian0 weijinqian0 merged commit 12a668b into vllm-project:main Jan 21, 2026
20 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 21, 2026
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Refactor] AttentionBuilder inherit from base class in vllm (vllm-project#5916)
  [Nightly] Use Qwen repo for qwen3-next (vllm-project#6064)
huangfeifei1995 pushed a commit to huangfeifei1995/vllm-ascend that referenced this pull request Jan 21, 2026
Signed-off-by: lico67373 <918688502@qq.com>
Signed-off-by: huangning1995 <huangning12@huawei.com>
huangfeifei1995 added a commit to huangfeifei1995/vllm-ascend that referenced this pull request Jan 21, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
Signed-off-by: lico67373 <918688502@qq.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
Signed-off-by: lico67373 <918688502@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: lico67373 <918688502@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
Signed-off-by: lico67373 <918688502@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: lico67373 <918688502@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
Signed-off-by: lico67373 <918688502@qq.com>