[Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask#4870
Conversation
Code Review
This pull request refactors the attention mechanism by removing the redundant pcp_prefill_mask and spec_attn_mask. The changes simplify the attention metadata structures and unify mask management by using attn_mask directly. This is a good cleanup that improves code clarity and should reduce memory usage. The implementation is straightforward and correct across all modified files. I have no concerns with these changes.
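The centralized mask management the review describes could be sketched as a builder that caches one mask per shape, so callers reference a shared object instead of storing copies in metadata structures. The class and method names below are hypothetical illustrations, not the actual vLLM Ascend API:

```python
class MaskBuilderSketch:
    """Hypothetical shape-keyed mask cache; not the vLLM Ascend API."""

    def __init__(self):
        self._cache = {}

    def get_causal_mask(self, seq_len):
        # Build each mask once and hand out the same object afterwards,
        # so metadata structures never need their own copies.
        if seq_len not in self._cache:
            # True marks a masked (future) position: strictly upper triangular.
            self._cache[seq_len] = [
                [col > row for col in range(seq_len)]
                for row in range(seq_len)
            ]
        return self._cache[seq_len]


builder = MaskBuilderSketch()
mask = builder.get_causal_mask(3)
assert builder.get_causal_mask(3) is mask  # shared, not duplicated
```

This is the memory-saving pattern the review alludes to: every attention backend asks the builder for the mask it needs rather than carrying its own copy in metadata.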
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
# SpecDecoding needs int8 mask for NPU operator
if attn_state == AscendAttentionState.SpecDecoding:
    return self.attn_mask_builder.get_splitfuse_attn_mask()
Maybe we should consider another way to build this mask, since we are going to remove AscendAttentionState.SpecDecoding.
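One hedged sketch of the reviewer's suggestion: dispatch on the request shape (here a hypothetical `num_speculative_tokens` count) instead of on `AscendAttentionState.SpecDecoding`. The stub builder and every name other than `get_splitfuse_attn_mask` are assumptions for illustration, not the real API:

```python
class StubMaskBuilder:
    # Stand-in for the real mask builder; return values are placeholders.
    def get_splitfuse_attn_mask(self):
        return "int8-splitfuse-mask"

    def get_default_attn_mask(self):
        return "default-mask"


def select_mask(builder, num_speculative_tokens):
    # Hypothetical dispatch: the NPU operator needs the int8 splitfuse
    # mask whenever speculative tokens are present, with no enum check.
    if num_speculative_tokens > 0:
        return builder.get_splitfuse_attn_mask()
    return builder.get_default_attn_mask()


stub = StubMaskBuilder()
print(select_mask(stub, 2))  # -> int8-splitfuse-mask
```

Keying on the presence of speculative tokens would keep the int8-mask behavior while letting the enum variant be removed.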
tail_attn_nomask_seqlens=tail_attn_nomask_seqlens,
q_full_idx=common_long_seq_metadata.q_full_idx,
pcp_prefill_mask=common_long_seq_metadata.pcp_prefill_mask,
pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx)
This code must not be deleted; please restore it.
# Generate appropriate mask based on model type and PCP configuration
if self.model_config.use_mla and get_pcp_group().world_size > 1:
    # MLA with PCP: use PCP-specific MLA mask
    attn_mask = self.attn_mask_builder.get_pcp_mla_mask(
This PR refactors the attention mask system to centralize mask generation and eliminate redundant mask storage in metadata structures.

Changes:
- Implement AttentionMaskBuilder as a proper singleton for mask management
- Remove redundant attn_mask, spec_attn_mask, swa_mask from AscendCommonAttentionMetadata
- Remove pcp_prefill_mask from PCP metadata (use attn_metadata.attn_mask instead)
- Centralize mask generation logic in AttentionMaskBuilder
- Update all attention backends to use the unified mask builder
- Add get_pcp_group mocks in unit tests to fix test failures
- Update comments to use attn_mask terminology (instead of spec_attn_mask)

Impact:
- Reduces memory footprint by eliminating duplicate mask storage
- Simplifies mask management logic across different attention scenarios
- Maintains compatibility with PCP parallel processing requirements
- All existing tests pass with the updated mocking strategy

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
What this PR does / why we need it?
This PR fixes the `AttentionMaskBuilder` singleton initialization issue introduced in PR #4779 and removes the unused `pcp_prefill_mask` field.

Background
After PR #4779 made `AttentionMaskBuilder` a singleton with the `@singleton` decorator, the class constructor now requires a `device` parameter. However, two initialization sites were still using the old parameterless constructor, causing failures.

Changes
Fix singleton initialization
- `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendMLAMetadataBuilder.__init__()`
- `AttentionMaskBuilder()` → `AttentionMaskBuilder(self.device)` in `AscendAttentionMetadataBuilder.__init__()`

Remove unused field
- Remove the `pcp_prefill_mask` field from `AscendPrefillContextParallelMetadata` (never used in the codebase)

Related
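The breakage described above can be reproduced with a minimal decorator-based singleton. This is a sketch under simplifying assumptions (a dict-keyed `singleton` wrapper and a constructor reduced to the `device` argument), not the actual PR #4779 implementation:

```python
def singleton(cls):
    """Cache at most one instance per decorated class (sketch)."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            # The first call must supply every required constructor
            # argument; a parameterless first call raises TypeError here.
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class AttentionMaskBuilder:
    def __init__(self, device):
        self.device = device


first = AttentionMaskBuilder("npu:0")
second = AttentionMaskBuilder("npu:1")
assert first is second          # later calls return the cached instance
assert first.device == "npu:0"  # the first call's arguments win
```

This is why the two call sites still written as `AttentionMaskBuilder()` fail when they happen to run first: the cached instance does not exist yet, so the wrapper forwards the empty argument list to a constructor that now requires `device`.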
Does this PR introduce any user-facing change?
No. This is an internal refactoring.
How was this patch tested?