
[0.13.0][cherry-pick][CP&SP] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 #6039

Merged
wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 from
dsxsteven:releases/v0.13.0_0119_remove_redunant_cp_code
Jan 23, 2026

@dsxsteven dsxsteven commented Jan 20, 2026

What this PR does / why we need it?

PCP/DCP splits the KV cache across different cards. With the `cp-kv-cache-interleave-size` parameter, the first `cp-kv-cache-interleave-size` tokens are cached on card 0, the next block on card 1, and so on.
However, if a sequence has too few tokens, some cards store no key-value pairs at all, which produces zero or corrupted values and causes precision issues. The current code introduces additional operations to avoid this precision problem.
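The empty-card case can be illustrated with a minimal sketch. It assumes that blocks of `cp-kv-cache-interleave-size` consecutive tokens are dealt round-robin over all CP ranks; `per_rank_kv_lengths` and `clamp_empty_ranks` are hypothetical helpers, and the clamp-plus-mask step is only an illustrative stand-in for the "additional operations", not the actual vllm-ascend code.

```python
from __future__ import annotations


def per_rank_kv_lengths(seq_len: int, cp_world_size: int, interleave_size: int) -> list[int]:
    """Count how many tokens of one sequence are cached on each CP rank.

    Assumption: consecutive blocks of `interleave_size` tokens are dealt
    round-robin to ranks 0, 1, ..., cp_world_size - 1, then wrap around.
    """
    lengths = [0] * cp_world_size
    for token_idx in range(seq_len):
        rank = (token_idx // interleave_size) % cp_world_size
        lengths[rank] += 1
    return lengths


def clamp_empty_ranks(lengths: list[int]) -> tuple[list[int], list[int]]:
    """Hypothetical workaround: give empty ranks a dummy length of 1 and
    return a mask (1 = real data, 0 = empty) so their output can be dropped."""
    mask = [1 if n > 0 else 0 for n in lengths]
    clamped = [n if n > 0 else 1 for n in lengths]
    return clamped, mask


if __name__ == "__main__":
    # 3 tokens, 4 ranks, interleave size 4: only rank 0 receives any tokens.
    lengths = per_rank_kv_lengths(seq_len=3, cp_world_size=4, interleave_size=4)
    print(lengths)                     # [3, 0, 0, 0]
    print(clamp_empty_ranks(lengths))  # ([3, 1, 1, 1], [1, 0, 0, 0])
```

With 3 tokens and an interleave size of 4, ranks 1 to 3 hold nothing, which is exactly the short-sequence case described above.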

Now that the FIA operator is integrated into mla_cp._forward_decode and CANN has been updated to 8.5.0, these additional operations can be removed.
pick-from: #6013

Does this PR introduce any user-facing change?

How was this patch tested?

Passed all CI checks with CANN 8.5.0.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request removes redundant code related to batch_seq_mask following an update to CANN 8.5. The changes are consistent across the codebase, simplifying the attention mechanism by relying on the new capabilities of the FIA operator. However, I've identified a critical issue in vllm_ascend/spec_decode/mtp_proposer.py where a numpy array is assigned to a field expecting a torch tensor, which will likely lead to a runtime error. Please see the detailed comment below.

batch_seq_mask = builder.batch_seq_mask_buf[:batch_seq_mask.shape[0]]
cp_seq_len = torch.where(cp_seq_len == 0, 1, cp_seq_len)
attn_metadata_i.decode.cp_seq_len = cp_seq_len
Contributor


critical

The cp_seq_len variable is a numpy array, but it is being assigned to attn_metadata_i.decode.cp_seq_len, which is defined as a torch.Tensor in the AscendMLADecodeMetadata dataclass. This type mismatch will likely cause a runtime error in downstream operations that expect a tensor. You should convert cp_seq_len to a torch.Tensor before the assignment, similar to the implementation in vllm_ascend/attention/context_parallel/mla_cp.py.

Suggested change
- attn_metadata_i.decode.cp_seq_len = cp_seq_len
+ attn_metadata_i.decode.cp_seq_len = torch.tensor(cp_seq_len, dtype=torch.int32)
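The reviewer's suggested fix can be sketched in isolation, assuming `cp_seq_len` arrives as an integer NumPy array. `DecodeMetadataStub` and `as_cp_seq_len_tensor` are hypothetical stand-ins, not the real `AscendMLADecodeMetadata` class or any vllm-ascend helper.

```python
from dataclasses import dataclass

import numpy as np
import torch


@dataclass
class DecodeMetadataStub:
    # Hypothetical stand-in for the decode metadata; only the annotated
    # type of cp_seq_len matters for this illustration.
    cp_seq_len: torch.Tensor


def as_cp_seq_len_tensor(cp_seq_len: np.ndarray) -> torch.Tensor:
    # torch.tensor copies the NumPy data and casts it to int32, so the
    # stored value is a real torch.Tensor rather than an np.ndarray.
    return torch.tensor(cp_seq_len, dtype=torch.int32)


cp_seq_len_np = np.array([3, 0, 0, 0], dtype=np.int64)
decode = DecodeMetadataStub(cp_seq_len=as_cp_seq_len_tensor(cp_seq_len_np))
print(type(decode.cp_seq_len).__name__, decode.cp_seq_len.dtype)  # Tensor torch.int32
```

Dataclasses do not enforce type annotations at runtime, so assigning an `np.ndarray` directly succeeds silently and the mismatch only surfaces in downstream tensor operations. `torch.from_numpy(cp_seq_len).to(torch.int32)` would be an equivalent conversion.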

@dsxsteven dsxsteven changed the title remove redundant code after FIA operator enables for CANN 8.5 [0.13.0][Feat] Remove Redundant Variables after Integrate FIA operator in mla_cp._forward_decode Jan 20, 2026
@dsxsteven dsxsteven force-pushed the releases/v0.13.0_0119_remove_redunant_cp_code branch from b49f7e9 to e900057 January 20, 2026 07:36
@weiguihua2 weiguihua2 added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 20, 2026
@dsxsteven dsxsteven changed the title [0.13.0][Feat] Remove Redundant Variables after Integrate FIA operator in mla_cp._forward_decode [0.13.0][Feat] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 Jan 20, 2026
@dsxsteven
Contributor Author

This PR should be merged after #6046

@dsxsteven dsxsteven force-pushed the releases/v0.13.0_0119_remove_redunant_cp_code branch from e900057 to 1ec5c89 January 22, 2026 06:41
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@dsxsteven dsxsteven changed the title [0.13.0][Feat] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 [0.13.0][cherry-pick][CP&SP] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 Jan 23, 2026
@dsxsteven dsxsteven force-pushed the releases/v0.13.0_0119_remove_redunant_cp_code branch 2 times, most recently from 3466774 to ca172a0 January 23, 2026 03:41
Signed-off-by: dsxsteven <dsxsteven@sina.com>
@wangxiyuan wangxiyuan merged commit 14a2e5d into vllm-project:releases/v0.13.0 Jan 23, 2026
12 checks passed
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…operator enables for CANN 8.5 (vllm-project#6039)

@dsxsteven dsxsteven deleted the releases/v0.13.0_0119_remove_redunant_cp_code branch March 10, 2026 03:34

Labels

ready read for review ready-for-test start test by label for PR


3 participants