[Bugfix] fix pcp qwen full graph FIA bug#6037
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request addresses a bug in update_attn_dcp_pcp_params related to handling actual_seq_lengths_q in pcp/dcp attention scenarios. The previous implementation incorrectly sliced and padded this list, which would cause issues in mixed prefill/decode batches and could raise an IndexError with prefill-only batches. The fix removes this logic, aligning the function with others like _update_attn_fia_params, under the assumption that attn_metadata.actual_seq_lengths_q already has the correct length corresponding to runtime_shape. This change is a clear improvement for correctness and robustness.
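The removed slice-and-pad logic can be sketched as follows. This is a minimal illustration of the behavior described above, not the actual vLLM Ascend code: the function signatures and the `num_reqs` parameter are hypothetical, and only `actual_seq_lengths_q` and `runtime_shape` come from the review comment.

```python
from types import SimpleNamespace


def update_attn_dcp_pcp_params_buggy(attn_metadata, num_reqs, runtime_shape):
    # Old (buggy) behavior, sketched: slice to the decode request count,
    # then pad back up to runtime_shape. Mixed prefill/decode batches lose
    # prefill entries, and a prefill-only batch (num_reqs == 0) leaves the
    # slice empty, so seq_lens[-1] raises IndexError.
    seq_lens = list(attn_metadata.actual_seq_lengths_q[:num_reqs])
    pad = runtime_shape - len(seq_lens)
    return seq_lens + [seq_lens[-1]] * pad  # IndexError when seq_lens is empty


def update_attn_dcp_pcp_params_fixed(attn_metadata, runtime_shape):
    # New behavior, sketched: no slicing or padding. Trust that
    # actual_seq_lengths_q already has length runtime_shape, matching
    # what _update_attn_fia_params does.
    assert len(attn_metadata.actual_seq_lengths_q) == runtime_shape
    return list(attn_metadata.actual_seq_lengths_q)
```

Under this assumption, the fixed path simply forwards the metadata unchanged, which is why removing the logic also removes the IndexError.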
Head branch was pushed to by a user without write access
8ad7e76 to 2806b3e
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
### What this PR does / why we need it?
[releases/v0.13.0] In the pcp full graph Qwen model scenario, the inconsistency between the Q shape and actual q len of the FIA operator is fixed. PR for main branch: #6037
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
vLLM version: v0.13.0

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (24 commits)
  - add dispath_ffn_combine_bf16 (vllm-project#5866)
  - [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (vllm-project#5932)
  - [1/N][Feat] Xlite Qwen3 MoE Support (vllm-project#5951)
  - [Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5945)
  - [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (vllm-project#5132)
  - [Bugfix] fix pcp qwen full graph FIA bug (vllm-project#6037)
  - [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6049)
  - 【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6045)
  - [main][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5967)
  - [Feature] refactor the npugraph_ex config, support online-infer with static kernel (vllm-project#5775)
  - [CI][Lint] Show lint diff on failure (vllm-project#5956)
  - [CI] Add wait logic for each individual case (vllm-project#6036)
  - [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (vllm-project#4633)
  - model runner v2 support triton of penalty (vllm-project#5854)
  - [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (vllm-project#6034)
  - [Tests] move qwen3 performance test from nightly to e2e (vllm-project#5980)
  - [Bugfix] fix bug of pcp+mtp+async scheduler (vllm-project#5994)
  - [Main2Main] Upgrade vllm commit to releases/v0.14.0 (vllm-project#5988)
  - [Ops] Add layernorm for qwen3Next (vllm-project#5765)
  - [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (vllm-project#5921)
  - ...
### What this PR does / why we need it?
In the pcp full graph Qwen model scenario, the inconsistency between the Q shape and actual q len of the FIA operator is fixed.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Signed-off-by: huangning1995 <huangning12@huawei.com>
This reverts commit c12791c.
### What this PR does / why we need it?
In the pcp full graph Qwen model scenario, the inconsistency between the Q shape and actual q len of the FIA operator is fixed.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
### What this PR does / why we need it?
In the pcp full graph Qwen model scenario, the inconsistency between the Q shape and actual q len of the FIA operator is fixed.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
In the pcp full graph Qwen model scenario, this PR fixes the inconsistency between the Q tensor shape and the actual q len passed to the FIA operator.
Does this PR introduce any user-facing change?
No
How was this patch tested?