[P/D] bugfix for p node force free request #5431

Merged
wangxiyuan merged 5 commits into vllm-project:main from liziyu179:fix_delayfree_request
Jan 14, 2026

Conversation

@liziyu179
Collaborator

@liziyu179 liziyu179 commented Dec 27, 2025

What this PR does / why we need it?

Fix a bug where the P-node's scheduler crashes after it force-frees a request due to timeout and then receives the completed KV cache pulled by the D-node again. Fixed by adding a set that records all requests.
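[Editorial note] A minimal sketch of the dedup idea the description refers to: track every in-flight request id, and ignore completion notifications for ids that have already been freed. All names here are illustrative, not the actual vllm-ascend API.

```python
class RequestTracker:
    """Sketch: record all requests so a duplicate 'done' signal is a no-op."""

    def __init__(self) -> None:
        self.reqs_to_process: set[str] = set()  # all known in-flight requests
        self.finished: set[str] = set()

    def add_request(self, request_id: str) -> None:
        self.reqs_to_process.add(request_id)

    def mark_done(self, request_id: str) -> bool:
        # A second "done" notification (e.g. after a forced free) is ignored.
        if request_id not in self.reqs_to_process:
            return False
        self.reqs_to_process.discard(request_id)
        self.finished.add(request_id)
        return True

tracker = RequestTracker()
tracker.add_request("req-1")
print(tracker.mark_done("req-1"))  # True: first completion is processed
print(tracker.mark_done("req-1"))  # False: duplicate completion is ignored
```

Because the second `mark_done` returns `False`, the caller never forwards the request to the scheduler twice, which is the failure mode this PR addresses.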

Does this PR introduce any user-facing change?

How was this patch tested?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue with force-freeing requests. However, the changes introduce a critical race condition in KVCacheTaskTracker that can lead to a thread crash due to a KeyError and a memory leak, which would cause requests to be incorrectly force-freed later. The previous logic for handling the race condition between add_delayed_request and update_done_task_count has been removed, leading to this bug. I've provided a comment with a detailed explanation of the issue.

Comment on lines +118 to +122
```python
if self.is_kv_producer:
    self.finished_requests.add(request_id)
    self._remove_delayed_requests(request_id)
else:
    self.record_finished_requests.add(request_id)
    self.finished_requests.add(request_id)
```
Contributor


critical

The refactoring of KVCacheTaskTracker across __init__, update_done_task_count, and add_delayed_request has introduced a critical race condition with two major issues:

  1. Potential Crash: In update_done_task_count, _remove_delayed_requests is now called unconditionally for a producer. This method uses dict.pop(), which will raise a KeyError if update_done_task_count runs before add_delayed_request for the same request_id. This will crash the KVCacheSendingThread.

  2. Memory Leak: The removal of record_finished_requests breaks the mechanism that handled this race condition. If update_done_task_count runs first (and is modified to not crash), and then add_delayed_request runs, the request is added to delayed_free_requests and will never be removed. This leads to a memory leak and the request being incorrectly force-freed upon timeout.

The previous implementation using record_finished_requests appeared to correctly handle this race condition. This logic should be restored to ensure correctness and prevent these critical issues.
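[Editorial note] The race-tolerant pattern the review describes can be sketched as follows. Names are illustrative (not the actual `KVCacheTaskTracker` code): whichever of the two calls runs first, nothing raises `KeyError` and nothing leaks into the delayed-free map.

```python
import threading

class Tracker:
    """Sketch: both call orders are safe under one lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.record_finished: set[str] = set()
        self.delayed_free: dict[str, float] = {}

    def update_done_task_count(self, request_id: str) -> None:
        with self.lock:
            self.record_finished.add(request_id)
            # pop() with a default never raises KeyError
            self.delayed_free.pop(request_id, None)

    def add_delayed_request(self, request_id: str, start: float) -> None:
        with self.lock:
            if request_id in self.record_finished:
                return  # already finished; don't re-arm the delayed free
            self.delayed_free[request_id] = start

t = Tracker()
t.update_done_task_count("r1")   # "done" arrives first
t.add_delayed_request("r1", 0.0)
print(t.delayed_free)            # {} -- no leak, no forced free later
```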

@liziyu179 liziyu179 force-pushed the fix_delayfree_request branch from 72c52d6 to dff079d on December 27, 2025 08:19
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Collaborator

@jianzs jianzs left a comment


Could you provide more details about the issue you're trying to resolve?

@LCAIZJ LCAIZJ requested a review from jianzs December 29, 2025 01:37

```python
def update_done_task_count(self, request_id: str):
    with self.done_task_lock:
        self.finished_requests.add(request_id)
```
Collaborator


Will placing this line of code back in its original position cause any issues?

Collaborator Author


> Will placing this line of code back in its original position cause any issues?

For a P node, if the request has been force-freed, it will no longer be in delayed_free_requests. This indicates it was already marked as finished and does not need to be marked again.

@liziyu179
Collaborator Author

> Could you provide more details about the issue you're trying to resolve?

When a request is force-freed by a P node and then happens to be completed by a D-node pull, it enters the get_finished interface of MooncakeConnectorWorker twice, and also enters the _update_from_kv_xfer_finished function of the Scheduler twice. The second time causes an assertion failure because req_id is no longer in self.requests.
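[Editorial note] The failure mode can be sketched in a few lines (illustrative names; the real scheduler is more involved): the scheduler expects a finished req_id to still be known, so a duplicate notification trips an assertion. Guarding on membership makes the second notification a no-op.

```python
class MiniScheduler:
    """Sketch: skip already-freed request ids instead of asserting."""

    def __init__(self) -> None:
        self.requests: dict[str, object] = {"req-1": object()}

    def _update_from_kv_xfer_finished(self, finished_ids: set[str]) -> None:
        for req_id in finished_ids:
            if req_id not in self.requests:
                # Already freed (e.g. force-freed on timeout): ignore the
                # duplicate signal rather than failing an assertion.
                continue
            del self.requests[req_id]

s = MiniScheduler()
s._update_from_kv_xfer_finished({"req-1"})  # first notification frees it
s._update_from_kv_xfer_finished({"req-1"})  # duplicate is ignored, no crash
```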

@LCAIZJ
Collaborator

LCAIZJ commented Dec 30, 2025

> Could you provide more details about the issue you're trying to resolve?
>
> When a request is force-freed by a P node and then happens to be completed by a D-node pull, it enters the get_finished interface of MooncakeConnectorWorker twice, and also enters the _update_from_kv_xfer_finished function of the Scheduler twice. The second time causes an assertion failure because req_id is no longer in self.requests.

If the timeout for forced release is longer than the timeout for aborting a request, this issue should not occur. That said, resolving it properly in the code logic would also be a good approach.

@LCAIZJ LCAIZJ changed the title from "[P/D] bugfix for frce free requset" to "[P/D] bugfix for force free requset" on Dec 30, 2025
```python
def add_delayed_request(self, request_id: str, delay_start_time: float):
    """Add a delayed free request."""
    with self.done_task_lock:
        if request_id not in self.record_finished_requests:
```
Contributor


add_delayed_request occurs in the next forward pass, so it's possible that DONE_RECVING_MSG is received before add_delayed_request. In that case, this modification would cause normally released requests to be forcibly released after the timeout.


Collaborator Author


The add_delayed_request operation occurs within the execute_model function following the current scheduling round. Requests are forwarded to the D node only after the P node has finished executing execute_model. Consequently, any request received by the D node is necessarily present in the P node's delayed_requests collection, so the situation you describe cannot occur.

Collaborator


@liziyu179 We've previously considered the scenario described in this PR #2899, so please check whether the current changes are compatible with it.

@liziyu179 liziyu179 force-pushed the fix_delayfree_request branch from dff079d to 1c78ad1 on January 12, 2026 08:47
@liziyu179 liziyu179 changed the title from "[P/D] bugfix for force free requset" to "[P/D] bugfix for p node force free requset" on Jan 12, 2026
```python
    self.record_finished_requests.add(request_id)
else:
    self.record_finished_requests.add(request_id)
    self.forced_free_requests.discard(request_id)
```
Collaborator


Here, you only add forced free requests but never remove them. Line 129 only removes requests that became abnormal after a force free, which will lead to a memory leak over time.
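[Editorial note] The leak pointed out above has a simple fix pattern, sketched here with illustrative names: ids added to a bookkeeping set must be discarded on every terminal path, not only the abnormal one, and `set.discard()` is idempotent so it is safe to call on all of them.

```python
forced_free_requests: set[str] = set()

def on_forced_free(request_id: str) -> None:
    # Request was forcibly released after a timeout.
    forced_free_requests.add(request_id)

def on_request_terminal(request_id: str) -> None:
    # Called on every terminal path (normal finish, abort, forced free).
    # discard() never raises, so unconditional cleanup is safe.
    forced_free_requests.discard(request_id)

on_forced_free("req-9")
on_request_terminal("req-9")
on_request_terminal("req-9")  # second call is a harmless no-op
print(forced_free_requests)   # set() -- nothing accumulates
```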

…equest due to timeout and then receives the completed kv cache pulled by the D-node again.

Signed-off-by: liziyu <liziyu16@huawei.com>
@liziyu179 liziyu179 force-pushed the fix_delayfree_request branch from 1c78ad1 to 21bf2e6 on January 13, 2026 03:11
```python
if request_id in self.reqs_to_process:
    self.finished_requests.add(request_id)
    self.reqs_to_process.discard(request_id)
    self.delayed_free_requests.pop(request_id, None)
```
Contributor


You can add an else branch to log an error. If a req_id received by update_done_task_count is not in reqs_to_process, an abnormal condition has occurred and should be reported.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> You can add an else branch to report the log error. An exception occurs when the req_id received by update_done_task_count is not in the process, indicating a precision issue.

Yes, we need to remind users about this here.
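[Editorial note] The suggested else branch could look like this sketch (illustrative, not the merged code): an unknown req_id reaching update_done_task_count signals a tracking problem upstream, so it is logged instead of being silently dropped.

```python
import logging

logger = logging.getLogger(__name__)

reqs_to_process: set[str] = set()
finished_requests: set[str] = set()
delayed_free_requests: dict[str, float] = {}

def update_done_task_count(request_id: str) -> None:
    if request_id in reqs_to_process:
        finished_requests.add(request_id)
        reqs_to_process.discard(request_id)
        delayed_free_requests.pop(request_id, None)
    else:
        # Surface the anomaly instead of ignoring it: this id was never
        # scheduled, or was already cleaned up (e.g. force-freed).
        logger.error(
            "Received done signal for unknown request %s; "
            "it may have been force-freed already.", request_id)
```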

```python
            self._prefill_pp_size - 1))

        if self.kv_send_thread is not None:
            for req_id, meta in metadata.requests.items():
```
Contributor


The prefill node might lack meta information for some requests. If you need the req_id, you need to pass it down from the scheduler side; I didn't see that operation in this PR.

Collaborator Author


> The prefill node might lack meta information? If you need the req_id, you need to bring it down from the scheduler side. I didn't see this operation in this PR.

You're right. We added a req_in_batch variable to pass the request IDs from the scheduler down to the worker.
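[Editorial note] The plumbing described above can be sketched as follows. `req_in_batch` is the name mentioned in this thread; the surrounding metadata structure and function are illustrative, not the actual connector classes.

```python
from dataclasses import dataclass, field

@dataclass
class ConnectorMetadata:
    """Sketch of scheduler-to-worker connector metadata."""
    requests: dict[str, object] = field(default_factory=dict)
    # Filled by the scheduler with every request id in the current batch,
    # even requests that carry no per-request transfer meta.
    req_in_batch: set[str] = field(default_factory=set)

def worker_step(metadata: ConnectorMetadata, tracked_reqs: set[str]) -> None:
    # The worker-side tracker learns about the whole batch, so later
    # completion signals for any of these ids can be deduplicated.
    tracked_reqs.update(metadata.req_in_batch)

md = ConnectorMetadata(req_in_batch={"r1", "r2"})
seen: set[str] = set()
worker_step(md, seen)
print(sorted(seen))  # ['r1', 'r2']
```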

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
@wangxiyuan
Collaborator

@LCAIZJ Please merge if this change is fine

@wangxiyuan wangxiyuan added ready read for review ready-for-test start test by label for PR labels Jan 13, 2026
@wangxiyuan
Collaborator

@jianzs cc

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
@LCAIZJ
Collaborator

LCAIZJ commented Jan 13, 2026

> @LCAIZJ Please merge if this change is fine
LGTM! Merging now.

@LCAIZJ LCAIZJ self-requested a review January 13, 2026 15:10
@wangxiyuan
Collaborator

Force merge. The CI failure is not related to this PR.

@wangxiyuan wangxiyuan merged commit e1bed43 into vllm-project:main Jan 14, 2026
15 of 16 checks passed
liziyu179 added a commit to liziyu179/vllm-ascend that referenced this pull request Jan 14, 2026
### What this PR does / why we need it?
Fix the bug where the P-node's schedule dead after it force-frees a
request due to timeout and then receives the completed kv cache pulled
by the D-node again. By add list to recode all requests.


- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@81786c8

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
LCAIZJ pushed a commit that referenced this pull request Jan 14, 2026
… (#5871)

845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 14, 2026
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [CI] Fix lint CI (vllm-project#5880)
  [Feature] implement eagle spec decoding for model runner v2 (vllm-project#5840)
  [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (vllm-project#5718)
  [EPLB][Bugfix] Get expert map from layers (vllm-project#5817)
  [Bugfix] Fixed an accuracy problem of sp with eagle3 (vllm-project#5816)
  [P/D] bugfix for p node force free requset (vllm-project#5431)
  [Lint]Style: Convert `example` to `ruff format` (vllm-project#5863)
  [Main2Main] Upgrade vllm commit to 0109 (vllm-project#5752)
  [Bugfix][P/D] fix layerwise connector for decoder tp size > num kv heads (vllm-project#5846)
  [Test][e2e][LoRA] Add more e2e tests to cover scenarios of LoRA (vllm-project#4075)
  [CustomOp][Perf] Merge Q/K split to simplify AscendApplyRotaryEmb for better performance (vllm-project#5799)
  [Lint]Style: Convert `root`, `benchmarks`, `tools` and `docs` to `ruff format` (vllm-project#5843)
  enable ep32 for dispatch_ffn_combine (vllm-project#5787)
@liziyu179 liziyu179 deleted the fix_delayfree_request branch January 14, 2026 08:43
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

ready (read for review), ready-for-test (start test by label for PR)


6 participants