
[Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask#4870

Merged
weijinqian0 merged 1 commit into vllm-project:main from LICO1314:remove-redundant-masks
Jan 7, 2026

Conversation

@LICO1314
Contributor
@LICO1314 LICO1314 commented Dec 10, 2025

What this PR does / why we need it?

This PR fixes the AttentionMaskBuilder singleton initialization issue introduced in PR #4779 and removes the unused pcp_prefill_mask field.

Background

After PR #4779 made AttentionMaskBuilder a singleton with @singleton decorator, the class constructor now requires a device parameter. However, two initialization sites were still using the old parameterless constructor, causing failures.
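The failure mode follows from how a decorator-based singleton typically works: constructor arguments only matter on the first call, but every call site must still satisfy the constructor signature. A minimal sketch (hypothetical, not the actual vllm-ascend implementation):

```python
# Sketch of a @singleton decorator: the first call constructs and caches
# the instance; later calls return the cached one. A parameterless first
# call fails once the constructor requires a device.
def singleton(cls):
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance

@singleton
class AttentionMaskBuilder:
    def __init__(self, device):
        self.device = device  # required since the singleton change

builder = AttentionMaskBuilder("npu:0")  # first call constructs the instance
same = AttentionMaskBuilder("npu:1")     # returns the cached instance
assert builder is same
# If the very first call were AttentionMaskBuilder(), it would raise
# TypeError: __init__() missing required argument 'device'.
```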

Changes

  1. Fix singleton initialization

    • Fixed AttentionMaskBuilder() → AttentionMaskBuilder(self.device) in AscendMLAMetadataBuilder.__init__()
    • Fixed AttentionMaskBuilder() → AttentionMaskBuilder(self.device) in AscendAttentionMetadataBuilder.__init__()
  2. Remove unused field

    • Removed pcp_prefill_mask field from AscendPrefillContextParallelMetadata (never used in codebase)
    • Updated related test assertions
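In effect, each fix just forwards the device the metadata builder already holds. A minimal sketch, with AttentionMaskBuilder stubbed out since the real class lives in vllm-ascend:

```python
class AttentionMaskBuilder:
    # Stub standing in for the real vllm-ascend class; since the
    # singleton change, its constructor requires a device.
    def __init__(self, device):
        self.device = device

class AscendAttentionMetadataBuilder:
    def __init__(self, device):
        self.device = device
        # Before the fix: AttentionMaskBuilder() -> TypeError (no device).
        # After the fix: pass the device this builder already holds.
        self.attn_mask_builder = AttentionMaskBuilder(self.device)

mb = AscendAttentionMetadataBuilder("npu:0")
assert mb.attn_mask_builder.device == "npu:0"
```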

Related

Does this PR introduce any user-facing change?

No. This is an internal refactoring.

How was this patch tested?

  • ✅ Local testing: No linter errors
  • ✅ Unit tests for attention modules verified
  • ⏳ CI pipeline

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the attention mechanism by removing the redundant pcp_prefill_mask and spec_attn_mask. The changes simplify the attention metadata structures and unify mask management by using attn_mask directly. This is a good cleanup that improves code clarity and should reduce memory usage. The implementation is straightforward and correct across all modified files. I have no concerns with these changes.

@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch from 4b9de53 to 83a9289 Compare December 10, 2025 06:39
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.


@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch 2 times, most recently from 52bd693 to f1c87a9 Compare December 23, 2025 07:40
@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch 2 times, most recently from be5a206 to 33779fc Compare December 23, 2025 08:54

Comment thread vllm_ascend/worker/model_runner_v1.py Outdated
Comment on lines +484 to +486
# SpecDecoding needs int8 mask for NPU operator
if attn_state == AscendAttentionState.SpecDecoding:
return self.attn_mask_builder.get_splitfuse_attn_mask()
Contributor

Maybe we should consider another way to build this mask, since we are going to remove AscendAttentionState.SpecDecoding.
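The quoted snippet dispatches on the attention state. A hypothetical sketch of that dispatch (AscendAttentionState and the builder methods are stubbed here, and get_attn_mask is an assumed fallback name, not confirmed by the PR):

```python
from enum import Enum, auto

class AscendAttentionState(Enum):
    PrefillNoCache = auto()
    SpecDecoding = auto()

class MaskBuilderStub:
    # Stand-ins for the real AttentionMaskBuilder methods.
    def get_splitfuse_attn_mask(self):
        return "int8_mask"  # SpecDecoding needs an int8 mask for the NPU operator

    def get_attn_mask(self):
        return "bool_mask"  # assumed default mask getter

def select_mask(builder, attn_state):
    if attn_state == AscendAttentionState.SpecDecoding:
        return builder.get_splitfuse_attn_mask()
    return builder.get_attn_mask()

assert select_mask(MaskBuilderStub(), AscendAttentionState.SpecDecoding) == "int8_mask"
```

Removing AscendAttentionState.SpecDecoding, as the reviewer suggests, would eliminate this branch entirely.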

@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch from c565c68 to 9afa65c Compare December 29, 2025 02:42
@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch from 9afa65c to f2ae472 Compare December 29, 2025 03:20
@LICO1314 LICO1314 changed the title from "[Refactor] Remove redundant pcp_prefill_mask and spec_attn_mask" to "[Refactor] Fix AttentionMaskBuilder singleton initialization with device parameter" Dec 29, 2025
@wangxiyuan wangxiyuan added the "ready" (read for review) label Dec 31, 2025
@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch 5 times, most recently from b092e3e to e5cc36c Compare January 3, 2026 07:09
@weijinqian0 weijinqian0 added the "ready-for-test" (start test by label for PR) label Jan 4, 2026

@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch from 9206985 to 493cd32 Compare January 4, 2026 09:03
@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch 2 times, most recently from e7fc070 to 9206985 Compare January 4, 2026 09:37


@LICO1314 LICO1314 force-pushed the remove-redundant-masks branch 3 times, most recently from a482960 to e1a5ac1 Compare January 5, 2026 04:59

tail_attn_nomask_seqlens=tail_attn_nomask_seqlens,
q_full_idx=common_long_seq_metadata.q_full_idx,
pcp_prefill_mask=common_long_seq_metadata.pcp_prefill_mask,
pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx)

Contributor

pcp_allgather_restore_idx=common_long_seq_metadata.
pcp_allgather_restore_idx)
This code must not be deleted; please restore it.

# Generate appropriate mask based on model type and PCP configuration
if self.model_config.use_mla and get_pcp_group().world_size > 1:
# MLA with PCP: use PCP-specific MLA mask
attn_mask = self.attn_mask_builder.get_pcp_mla_mask(
Contributor

Why is there MLA code here?


This PR refactors the attention mask system to centralize mask generation
and eliminate redundant mask storage in metadata structures.

Changes:
- Implement AttentionMaskBuilder as a proper singleton for mask management
- Remove redundant attn_mask, spec_attn_mask, swa_mask from AscendCommonAttentionMetadata
- Remove pcp_prefill_mask from PCP metadata (use attn_metadata.attn_mask instead)
- Centralize mask generation logic in AttentionMaskBuilder
- Update all attention backends to use unified mask builder
- Add get_pcp_group mocks in unit tests to fix test failures
- Update comments to reflect attn_mask terminology (instead of spec_attn_mask)

Impact:
- Reduces memory footprint by eliminating duplicate mask storage
- Simplifies mask management logic across different attention scenarios
- Maintains compatibility with PCP parallel processing requirements
- All existing tests pass with updated mocking strategy

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>

Labels

module:tests, ready (read for review), ready-for-test (start test by label for PR)


6 participants