
[0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug#6038

Merged
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0 from weiguihua2:v0.13.0_2
Jan 20, 2026

Conversation

@weiguihua2
Collaborator

@weiguihua2 weiguihua2 commented Jan 20, 2026

What this PR does / why we need it?

[releases/v0.13.0] In the pcp full-graph Qwen model scenario, this fixes the inconsistency between the Q tensor shape and the actual query length (actual_seq_lengths_q) passed to the FIA operator.
PR for main branch: #6037

Does this PR introduce any user-facing change?

No

How was this patch tested?

vLLM version: v0.13.0

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a bug related to actual_seq_lengths_q in the update_attn_dcp_pcp_params function. While removing the incorrect slicing logic is correct, removing the padding logic as well might introduce a new issue. The query tensor's shape is tied to runtime_shape, and actual_seq_lengths_q needs to have a matching length. I've provided a suggestion to re-introduce the padding logic without the incorrect slicing to prevent potential runtime errors due to shape mismatch. Please review the suggestion.

```python
actual_seq_lengths_q = actual_seq_lengths_q + [
    actual_seq_lengths_q[-1]
] * (runtime_shape - len(actual_seq_lengths_q))
actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q
```
Contributor


critical

This change removes the padding logic for actual_seq_lengths_q, which could lead to a shape mismatch at runtime. It implies that attn_metadata.actual_seq_lengths_q should now always have a length equal to runtime_shape.

However, the existing test test_update_attn_dcp_pcp_params suggests this might not always be the case (runtime_shape=4 vs. len(actual_seq_lengths_q)=2). This would cause a mismatch with the query tensor's shape inside the npu_fused_infer_attention_score operator, leading to a runtime error. The test only passes because the operator is mocked.

While the original slicing was likely buggy, the padding seems necessary. The suggestion below reintroduces the padding without the incorrect slicing.

Suggested change

```diff
-actual_seq_lengths_q = attn_metadata.actual_seq_lengths_q
+actual_seq_lengths_q = list(attn_metadata.actual_seq_lengths_q)
+pad_len = runtime_shape - len(actual_seq_lengths_q)
+if pad_len > 0:
+    actual_seq_lengths_q.extend([actual_seq_lengths_q[-1]] * pad_len)
```
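As a minimal standalone sketch of the padding the review suggests: extend actual_seq_lengths_q to length runtime_shape by repeating its last element, so its length matches the padded query tensor. The function name is hypothetical; the variable names mirror the PR, but this is not the actual vllm-ascend code.

```python
def pad_actual_seq_lengths_q(actual_seq_lengths_q, runtime_shape):
    # Copy so the metadata list is not mutated in place (hypothetical helper).
    padded = list(actual_seq_lengths_q)
    pad_len = runtime_shape - len(padded)
    if pad_len > 0:
        # Repeat the last cumulative length so trailing padded query
        # positions contribute zero extra tokens.
        padded.extend([padded[-1]] * pad_len)
    return padded
```

For example, with the test scenario the bot mentions (runtime_shape=4 but only 2 entries), `pad_actual_seq_lengths_q([3, 7], 4)` would yield `[3, 7, 7, 7]`, matching the padded query shape.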

@weiguihua2 weiguihua2 changed the title [Bugfix] fix pcp aclgraph qwen FIA bug [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug Jan 20, 2026
@weiguihua2 weiguihua2 added the `ready` (read for review) and `ready-for-test` (start test by label for PR) labels Jan 20, 2026
@zzzzwwjj zzzzwwjj merged commit 5664020 into vllm-project:releases/v0.13.0 Jan 20, 2026
16 of 17 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 21, 2026
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  [0.13.0][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5972)
  [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  【0.13.0】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6056)
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request Feb 24, 2026

Labels

`ready` (read for review), `ready-for-test` (start test by label for PR)


2 participants