[0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug #6038
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0
Conversation
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Code Review
This pull request aims to fix a bug related to actual_seq_lengths_q in the update_attn_dcp_pcp_params function. While removing the incorrect slicing logic is correct, removing the padding logic as well might introduce a new issue. The query tensor's shape is tied to runtime_shape, and actual_seq_lengths_q needs to have a matching length. I've provided a suggestion to re-introduce the padding logic without the incorrect slicing to prevent potential runtime errors due to shape mismatch. Please review the suggestion.
```diff
-actual_seq_lengths_q = actual_seq_lengths_q + [
-    actual_seq_lengths_q[-1]
-] * (runtime_shape - len(actual_seq_lengths_q))
+actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q
```
This change removes the padding logic for `actual_seq_lengths_q`, which could lead to a shape mismatch at runtime. It implies that `attn_metadata.actual_seq_lengths_q` now always has length equal to `runtime_shape`.
However, the existing test `test_update_attn_dcp_pcp_params` suggests this is not always the case (`runtime_shape=4` vs. `len(actual_seq_lengths_q)=2`). That would cause a mismatch with the query tensor's shape inside the `npu_fused_infer_attention_score` operator, leading to a runtime error. The test only passes because the operator is mocked.
While the original slicing was likely buggy, the padding seems necessary. The suggestion below reintroduces the padding without the incorrect slicing.
```diff
-actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q
+actual_seq_lengths_q = list(attn_metadata.actual_seq_lengths_q)
+pad_len = runtime_shape - len(actual_seq_lengths_q)
+if pad_len > 0:
+    actual_seq_lengths_q.extend([actual_seq_lengths_q[-1]] * pad_len)
```
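The padding behavior suggested above can be sketched in isolation. This is a minimal illustration, not code from vllm-ascend: the helper name `pad_actual_seq_lengths_q` and the sample values are assumptions, chosen to mirror the `runtime_shape=4` vs. `len(actual_seq_lengths_q)=2` case from the test mentioned earlier.

```python
def pad_actual_seq_lengths_q(actual_seq_lengths_q, runtime_shape):
    """Pad the cumulative q-length list to runtime_shape by repeating its last entry.

    This mirrors the review suggestion: copy the metadata list, then extend it
    so its length matches the (padded) query tensor's first dimension.
    """
    padded = list(actual_seq_lengths_q)  # avoid mutating attn_metadata in place
    pad_len = runtime_shape - len(padded)
    if pad_len > 0:
        padded.extend([padded[-1]] * pad_len)
    return padded


# Hypothetical values mirroring the test scenario (runtime_shape=4, list length 2):
print(pad_actual_seq_lengths_q([3, 5], 4))  # -> [3, 5, 5, 5]
# Already at full length: unchanged.
print(pad_actual_seq_lengths_q([3, 5, 7, 9], 4))  # -> [3, 5, 7, 9]
```

Repeating the last cumulative length is what makes the padded entries act as empty (zero-token) sequences for the attention operator, so the extra graph-padding slots contribute nothing.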
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  - [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  - [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  - [0.13.0][Bugfix] Fixed a problem related to embeddings sharing (vllm-project#5972)
  - [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  - [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  - [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  - [0.13.0][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads (vllm-project#6056)
What this PR does / why we need it?
[releases/v0.13.0] In the pcp full-graph Qwen model scenario, this PR fixes an inconsistency between the query (Q) tensor shape and the actual q length passed to the FIA operator.
PR for main branch: #6037
Does this PR introduce any user-facing change?
No
How was this patch tested?
vLLM version: v0.13.0