[0.13.0][cherry-pick][CP&SP] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 #6039
Conversation
Code Review
This pull request removes redundant code related to batch_seq_mask following an update to CANN 8.5. The changes are consistent across the codebase, simplifying the attention mechanism by relying on the new capabilities of the FIA operator. However, I've identified a critical issue in vllm_ascend/spec_decode/mtp_proposer.py where a numpy array is assigned to a field expecting a torch tensor, which will likely lead to a runtime error. Please see the detailed comment below.
batch_seq_mask = builder.batch_seq_mask_buf[:batch_seq_mask.shape[0]]
cp_seq_len = torch.where(cp_seq_len == 0, 1, cp_seq_len)
attn_metadata_i.decode.cp_seq_len = cp_seq_len
The cp_seq_len variable is a numpy array, but it is being assigned to attn_metadata_i.decode.cp_seq_len, which is defined as a torch.Tensor in the AscendMLADecodeMetadata dataclass. This type mismatch will likely cause a runtime error in downstream operations that expect a tensor. You should convert cp_seq_len to a torch.Tensor before the assignment, similar to the implementation in vllm_ascend/attention/context_parallel/mla_cp.py.
Suggested change:
- attn_metadata_i.decode.cp_seq_len = cp_seq_len
+ attn_metadata_i.decode.cp_seq_len = torch.tensor(cp_seq_len, dtype=torch.int32)
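A minimal sketch of the clamping logic visible in the diff, using numpy in place of torch to illustrate the same element-wise `where`: ranks that cached no KV tokens report a sequence length of 0, which is clamped to 1 so downstream attention does not divide or index by zero. The variable names follow the diff; the surrounding builder context is assumed.

```python
import numpy as np

# cp_seq_len holds the per-rank cached KV lengths; with small inputs some
# ranks may have cached nothing and report 0 (assumed scenario from the PR).
cp_seq_len = np.array([0, 3, 0, 7], dtype=np.int32)

# Clamp zero-length entries to 1, mirroring
# torch.where(cp_seq_len == 0, 1, cp_seq_len) in the diff.
cp_seq_len = np.where(cp_seq_len == 0, 1, cp_seq_len)
print(cp_seq_len.tolist())  # [1, 3, 1, 7]
```

The reviewer's point is orthogonal to this logic: `np.where` returns a numpy array, so it must be converted with `torch.tensor(...)` (or `torch.from_numpy(...)`) before being stored in a field typed as `torch.Tensor`.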
Force-pushed: b49f7e9 to e900057
This PR should be merged after #6046
Force-pushed: e900057 to 1ec5c89
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed: 3466774 to ca172a0
Signed-off-by: dsxsteven <dsxsteven@sina.com>
What this PR does / why we need it?
PCP/DCP splits the kv-cache across different cards. With the parameter cp-kv-cache-interleave-size, the first `size` tokens are cached on card 0, the next `size` tokens on card 1, and so on.
However, when there are too few tokens, some cards store no key-value pairs, producing zero-length entries, corrupted values, and precision issues. Additional operations were previously introduced to avoid this precision problem.
Now that the FIA operator is integrated in mla_cp._forward_decode and CANN has been updated to 8.5.0, these additional operations can be removed.
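The interleaved placement described above can be sketched as a simple round-robin over cards. This is an illustrative model only: the function name, the exact mapping, and the assumption that placement is a pure modulo over `interleave_size` are hypothetical, not taken from the vllm-ascend source.

```python
def card_for_token(token_idx: int, interleave_size: int, num_cards: int) -> int:
    """Hypothetical mapping: the first `interleave_size` tokens go to card 0,
    the next `interleave_size` to card 1, ..., wrapping around."""
    return (token_idx // interleave_size) % num_cards

# With 10 tokens, interleave size 2, and 4 cards, every card receives tokens.
placement = [card_for_token(i, 2, 4) for i in range(10)]
print(placement)  # [0, 0, 1, 1, 2, 2, 3, 3, 0, 0]

# With only 3 tokens, cards 2 and 3 receive nothing -- the zero-length
# situation this PR's removed workaround used to guard against.
short = [card_for_token(i, 2, 4) for i in range(3)]
print(short)  # [0, 0, 1]
```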
pick-from: #6013
Does this PR introduce any user-facing change?
How was this patch tested?
Passed all CI with CANN 8.5.0.