Skip to content

[P/D][PCP]bugfix pcp force free twice caused logger error#6124

Merged
wangxiyuan merged 1 commit intovllm-project:mainfrom
wangxiaoteng888:bugfix_pcp_done
Jan 22, 2026
Merged

[P/D][PCP]bugfix pcp force free twice caused logger error#6124
wangxiyuan merged 1 commit intovllm-project:mainfrom
wangxiaoteng888:bugfix_pcp_done

Conversation

@wangxiaoteng888
Copy link
Copy Markdown
Contributor

@wangxiaoteng888 wangxiaoteng888 commented Jan 22, 2026

What this PR does / why we need it?

The issue of the D node mistakenly sending the pull-end signal twice, leading to the P node printing logger errors abnormally, has been resolved.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By ci

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a race condition in the KVCacheRecvingThread that could lead to resources being freed twice in Prefill Context Parallelism (PCP) scenarios. The bug occurred in multi-task requests where the run-once flag for _send_done_signal_to_free_remote_port was cleared prematurely. The fix involves reordering operations within the finally block of _handle_request to ensure the flag is checked before it's cleared, which correctly prevents the redundant execution. The change is logical and effectively resolves the issue.

Comment on lines +447 to +448
self._send_done_signal_to_free_remote_port(remote_request_id, remote_host,
remote_port_send_num)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Moving this call to the beginning of the finally block correctly resolves a race condition. For multi-task requests, the run-once flag (self.proc_not_transfer_request) was previously deleted on the final task before this function was called, causing its logic to execute twice. This change ensures the flag is checked before it's cleared when all_task_done is true, preventing the double execution.

@wangxiaoteng888 wangxiaoteng888 changed the title [P/D][PCP]bugfix pcp force free twice [P/D][PCP]bugfix pcp force free twice caused logger error Jan 22, 2026
@wangxiyuan wangxiyuan merged commit f2c0ced into vllm-project:main Jan 22, 2026
18 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 22, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  [CI] optimize lint term (vllm-project#5986)
  [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  [bugfix] adapt_remote_request_id (vllm-project#6051)
  [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  Default enable MLAPO (vllm-project#5952)
  ...
wangxiyuan pushed a commit that referenced this pull request Jan 23, 2026
)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
pick-from: #6124

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By ci


Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…lm-project#6132)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
pick-from: vllm-project#6124

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By ci


Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…ct#6124)

### What this PR does / why we need it?
The issue of the D node mistakenly sending the pull-end signal twice,
leading to the P node printing logger errors abnormally, has been
resolved.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@d682094

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants