[Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask#4870
Conversation
Code Review
This pull request refactors the attention mechanism by removing the redundant pcp_prefill_mask and spec_attn_mask. The changes simplify the attention metadata structures and unify mask management by using attn_mask directly. This is a good cleanup that improves code clarity and should reduce memory usage. The implementation is straightforward and correct across all modified files. I have no concerns with these changes.
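The centralized mask management the review describes could be sketched as a builder that caches one mask per shape, so callers reference a shared object instead of storing copies in metadata structures. The class and method names below are hypothetical illustrations, not the actual vLLM Ascend API:

```python
class MaskBuilderSketch:
    """Hypothetical shape-keyed mask cache; not the vLLM Ascend API."""

    def __init__(self):
        self._cache = {}

    def get_causal_mask(self, seq_len):
        # Build each mask once and hand out the same object afterwards,
        # so metadata structures never need their own copies.
        if seq_len not in self._cache:
            # True marks a masked (future) position: strictly upper triangular.
            self._cache[seq_len] = [
                [col > row for col in range(seq_len)]
                for row in range(seq_len)
            ]
        return self._cache[seq_len]


builder = MaskBuilderSketch()
mask = builder.get_causal_mask(3)
assert builder.get_causal_mask(3) is mask  # shared, not duplicated
```

This is the memory-saving pattern the review alludes to: every attention backend asks the builder for the mask it needs rather than carrying its own copy in metadata.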
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
# SpecDecoding needs int8 mask for NPU operator
if attn_state == AscendAttentionState.SpecDecoding:
    return self.attn_mask_builder.get_splitfuse_attn_mask()
Maybe we should consider another way to build this mask, since we are going to remove AscendAttentionState.SpecDecoding.
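One hedged sketch of the reviewer's suggestion: dispatch on the request shape (here a hypothetical `num_speculative_tokens` count) instead of on `AscendAttentionState.SpecDecoding`. The stub builder and every name other than `get_splitfuse_attn_mask` are assumptions for illustration, not the real API:

```python
class StubMaskBuilder:
    # Stand-in for the real mask builder; return values are placeholders.
    def get_splitfuse_attn_mask(self):
        return "int8-splitfuse-mask"

    def get_default_attn_mask(self):
        return "default-mask"


def select_mask(builder, num_speculative_tokens):
    # Hypothetical dispatch: the NPU operator needs the int8 splitfuse
    # mask whenever speculative tokens are present, with no enum check.
    if num_speculative_tokens > 0:
        return builder.get_splitfuse_attn_mask()
    return builder.get_default_attn_mask()


stub = StubMaskBuilder()
print(select_mask(stub, 2))  # -> int8-splitfuse-mask
```

Keying on the presence of speculative tokens would keep the int8-mask behavior while letting the enum variant be removed.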
tail_attn_nomask_seqlens=tail_attn_nomask_seqlens,
q_full_idx=common_long_seq_metadata.q_full_idx,
pcp_prefill_mask=common_long_seq_metadata.pcp_prefill_mask,
pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx)
This code must not be deleted; please restore it.
# Generate appropriate mask based on model type and PCP configuration
if self.model_config.use_mla and get_pcp_group().world_size > 1:
    # MLA with PCP: use PCP-specific MLA mask
    attn_mask = self.attn_mask_builder.get_pcp_mla_mask(
This PR refactors the attention mask system to centralize mask generation and eliminate redundant mask storage in metadata structures.

Changes:
- Implement AttentionMaskBuilder as a proper singleton for mask management
- Remove redundant attn_mask, spec_attn_mask, swa_mask from AscendCommonAttentionMetadata
- Remove pcp_prefill_mask from PCP metadata (use attn_metadata.attn_mask instead)
- Centralize mask generation logic in AttentionMaskBuilder
- Update all attention backends to use the unified mask builder
- Add get_pcp_group mocks in unit tests to fix test failures
- Update comments to use attn_mask terminology (instead of spec_attn_mask)

Impact:
- Reduces memory footprint by eliminating duplicate mask storage
- Simplifies mask management logic across different attention scenarios
- Maintains compatibility with PCP parallel processing requirements
- All existing tests pass with the updated mocking strategy

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
What this PR does / why we need it?
This PR fixes the `AttentionMaskBuilder` singleton initialization issue introduced in PR #4779 and removes the unused `pcp_prefill_mask` field.

Background
After PR #4779 made `AttentionMaskBuilder` a singleton with the `@singleton` decorator, the class constructor now requires a `device` parameter. However, two initialization sites were still using the old parameterless constructor, causing failures.

Changes
Fix singleton initialization
- `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendMLAMetadataBuilder.__init__()`
- `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendAttentionMetadataBuilder.__init__()`

Remove unused field
- Remove the `pcp_prefill_mask` field from `AscendPrefillContextParallelMetadata` (never used in the codebase)

Related
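The breakage described above can be reproduced with a minimal decorator-based singleton. This is a sketch under simplifying assumptions (a dict-keyed `singleton` wrapper and a constructor reduced to the `device` argument), not the actual PR #4779 implementation:

```python
def singleton(cls):
    """Cache at most one instance per decorated class (sketch)."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            # The first call must supply every required constructor
            # argument; a parameterless first call raises TypeError here.
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class AttentionMaskBuilder:
    def __init__(self, device):
        self.device = device


first = AttentionMaskBuilder("npu:0")
second = AttentionMaskBuilder("npu:1")
assert first is second          # later calls return the cached instance
assert first.device == "npu:0"  # the first call's arguments win
```

This is why the two call sites still written as `AttentionMaskBuilder()` fail when they happen to run first: the cached instance does not exist yet, so the wrapper forwards the empty argument list to a constructor that now requires `device`.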
Does this PR introduce any user-facing change?
No. This is an internal refactoring.
How was this patch tested?