[P/D] bugfix for p node force free request #5431
wangxiyuan merged 5 commits into vllm-project:main
Conversation
Code Review
This pull request aims to fix an issue with force-freeing requests. However, the changes introduce a critical race condition in `KVCacheTaskTracker` that can lead to a thread crash (a `KeyError`) and a memory leak, which would cause requests to be incorrectly force-freed later. The previous logic for handling the race between `add_delayed_request` and `update_done_task_count` has been removed, leading to this bug. I've provided a comment with a detailed explanation of the issue.
```python
if self.is_kv_producer:
    self.finished_requests.add(request_id)
    self._remove_delayed_requests(request_id)
else:
    self.record_finished_requests.add(request_id)
    self.finished_requests.add(request_id)
```
The refactoring of `KVCacheTaskTracker` across `__init__`, `update_done_task_count`, and `add_delayed_request` has introduced a critical race condition with two major issues:

1. Potential crash: in `update_done_task_count`, `_remove_delayed_requests` is now called unconditionally for a producer. This method uses `dict.pop()`, which will raise a `KeyError` if `update_done_task_count` runs before `add_delayed_request` for the same `request_id`. This will crash the `KVCacheSendingThread`.
2. Memory leak: removing `record_finished_requests` breaks the mechanism that handled this race. If `update_done_task_count` runs first (and is modified not to crash), and `add_delayed_request` runs afterwards, the request is added to `delayed_free_requests` and is never removed. This leads to a memory leak, and the request is incorrectly force-freed on timeout.
The previous implementation using `record_finished_requests` appeared to correctly handle this race condition. This logic should be restored to ensure correctness and prevent these critical issues.
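The race-safe pattern described above can be sketched as follows. This is a minimal illustration, not the real vLLM Ascend class: the field names (`finished_requests`, `record_finished_requests`, `delayed_free_requests`, `done_task_lock`) mirror the discussion, but the actual tracker carries more state.

```python
import threading

class KVCacheTaskTracker:
    """Minimal sketch of the race-safe bookkeeping discussed above."""

    def __init__(self):
        self.done_task_lock = threading.Lock()
        self.finished_requests = set()
        # Every request that ever finished, kept so a late
        # add_delayed_request can detect the completed request.
        self.record_finished_requests = set()
        self.delayed_free_requests = {}  # request_id -> delay_start_time

    def update_done_task_count(self, request_id: str):
        with self.done_task_lock:
            self.finished_requests.add(request_id)
            self.record_finished_requests.add(request_id)
            # pop with a default so a missing key cannot raise KeyError
            self.delayed_free_requests.pop(request_id, None)

    def add_delayed_request(self, request_id: str, delay_start_time: float):
        with self.done_task_lock:
            # If the request already finished, do not re-add it; otherwise
            # it would linger in delayed_free_requests and be force-freed.
            if request_id not in self.record_finished_requests:
                self.delayed_free_requests[request_id] = delay_start_time
```

With this shape, both orderings of the two calls leave `delayed_free_requests` empty for a finished request, which is exactly the property the reviewer asks to restore.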
Force-pushed 72c52d6 to dff079d
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
jianzs left a comment:
Could you provide more details about the issue you're trying to resolve?
```python
def update_done_task_count(self, request_id: str):
    with self.done_task_lock:
        self.finished_requests.add(request_id)
```
Will placing this line of code back in its original position cause any issues?

For a P node, if the request has been force-freed, it will not be in `delayed_free_requests`. This indicates it was previously marked as finished and does not need to be marked again.

When a request is force-freed by a P node and then happens to complete because a D node pulls it, the request enters the `get_finished` interface of `MooncakeConnectorWorker` twice, and also enters the Scheduler's `_update_from_kv_xfer_finished` function twice. The second time causes an assertion failure because `req_id` is no longer in `self.requests`.
If the timeout for forced release is longer than the timeout for aborting a request, this issue should not occur. That said, resolving it properly in the code logic would also be a good approach.
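One way to resolve it in code logic, sketched here as a hypothetical standalone function rather than the real vLLM Scheduler method, is to make the finish handler idempotent: a second DONE notification for an already-freed request is logged and skipped instead of tripping an assertion.

```python
import logging

logger = logging.getLogger(__name__)

def update_from_kv_xfer_finished(finished_ids, requests):
    """Hedged sketch (not the real vLLM signature): free each request
    whose KV transfer completed, tolerating duplicate notifications."""
    freed = []
    for req_id in finished_ids:
        if req_id not in requests:
            # second notification for an already force-freed request
            logger.warning("Duplicate KV-transfer finish for %s; ignoring",
                           req_id)
            continue
        requests.pop(req_id)
        freed.append(req_id)
    return freed
```

Calling it twice with the same id frees the request once and ignores the repeat, which is the failure path described in the comment above.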
```python
def add_delayed_request(self, request_id: str, delay_start_time: float):
    """Add a delayed free request."""
    with self.done_task_lock:
        if request_id not in self.record_finished_requests:
```
`add_delayed_request` occurs in the next forward pass, so it is possible that `DONE_RECVING_MSG` is received before `add_delayed_request`. In that case, this modification would cause normally released requests to be forcibly released after the timeout.
The `add_delayed_request` operation occurs within the `execute_model` function of the current scheduling round. Requests are forwarded to the D node only after the P node has completed its execution of `execute_model`. Consequently, any request received by the D node is necessarily present in the P node's `delayed_requests` collection, so the situation you describe should not arise.
@liziyu179 We've previously considered the scenario described in this PR #2899, so please check whether the current changes are compatible with it.
Force-pushed dff079d to 1c78ad1
```python
    self.record_finished_requests.add(request_id)
else:
    self.record_finished_requests.add(request_id)
    self.forced_free_requests.discard(request_id)
```
Here, you only add forced free requests but never remove them. Line 129 only removes requests that became abnormal after force free, which will lead to memory leaks over time.
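One way to avoid that leak, sketched here under the assumption that force-free and completion both go through the tracker's lock (the class and method names are hypothetical), is to clear the forced-free record on every terminal path:

```python
import threading

class ForcedFreeBook:
    """Hypothetical minimal bookkeeping: every terminal event clears the
    forced-free record so the set cannot grow without bound."""

    def __init__(self):
        self.lock = threading.Lock()
        self.forced_free_requests = set()

    def mark_forced_free(self, request_id: str):
        with self.lock:
            self.forced_free_requests.add(request_id)

    def on_request_done(self, request_id: str) -> bool:
        """Return True if this completion follows a force-free, i.e. it is
        the duplicate notification that must be suppressed downstream."""
        with self.lock:
            was_forced = request_id in self.forced_free_requests
            self.forced_free_requests.discard(request_id)
            return was_forced
```

Because `on_request_done` always discards the id, the set only ever holds requests that were force-freed and have not yet produced a late completion.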
…equest due to timeout and then receives the completed kv cache pulled by the D-node again. Signed-off-by: liziyu <liziyu16@huawei.com>
Force-pushed 1c78ad1 to 21bf2e6
```python
if request_id in self.reqs_to_process:
    self.finished_requests.add(request_id)
    self.reqs_to_process.discard(request_id)
    self.delayed_free_requests.pop(request_id, None)
```
You can add an `else` branch to log an error. If the `req_id` received by `update_done_task_count` is not in `reqs_to_process`, that indicates a bookkeeping problem and should be surfaced.

Yes, we need to remind users about this here.
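The suggested `else` branch could look like the following reduction (the class is a hypothetical stand-in for the real tracker; only the branch structure matters):

```python
import logging
import threading

logger = logging.getLogger(__name__)

class TaskTrackerSketch:
    """Hypothetical reduction of update_done_task_count with the suggested
    else branch: unknown ids are logged instead of silently ignored."""

    def __init__(self):
        self.done_task_lock = threading.Lock()
        self.reqs_to_process = set()
        self.finished_requests = set()
        self.delayed_free_requests = {}

    def update_done_task_count(self, request_id: str) -> bool:
        with self.done_task_lock:
            if request_id in self.reqs_to_process:
                self.finished_requests.add(request_id)
                self.reqs_to_process.discard(request_id)
                self.delayed_free_requests.pop(request_id, None)
                return True
            # an id we never scheduled points at a bookkeeping bug upstream
            logger.error("update_done_task_count: unknown request %s",
                         request_id)
            return False
```

Returning a boolean is an optional extra so callers (or tests) can observe the anomaly as well as the log line.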
```python
                         self._prefill_pp_size - 1))

if self.kv_send_thread is not None:
    for req_id, meta in metadata.requests.items():
```
The prefill node might lack meta information. If you need the `req_id`, you need to bring it down from the scheduler side; I didn't see that operation in this PR.

You're right, we added a `req_in_batch` variable to pass the request from the scheduler to the worker.
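The scheduler-to-worker handoff could be sketched like this. All names here (`SchedulerOutputSketch`, `scheduler_build`, `worker_consume`) are hypothetical illustrations of the `req_in_batch` idea, not the actual vLLM Ascend API:

```python
from dataclasses import dataclass, field

@dataclass
class SchedulerOutputSketch:
    """Hypothetical carrier for req_in_batch: the scheduler records every
    request id in the current batch so the worker-side connector can
    register them before any KV transfer can finish."""
    req_in_batch: set = field(default_factory=set)

def scheduler_build(request_ids):
    # scheduler side: stamp the batch's request ids into the metadata
    out = SchedulerOutputSketch()
    out.req_in_batch.update(request_ids)
    return out

def worker_consume(out, reqs_to_process):
    # worker side: register the batch before starting transfers, so a
    # DONE notification can never arrive for an unregistered request
    reqs_to_process.update(out.req_in_batch)
```

Registering the ids before any transfer starts is what guarantees the `request_id in self.reqs_to_process` check in the tracker can succeed.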
Signed-off-by: liziyu <liziyu16@huawei.com>
@LCAIZJ Please merge if this change is fine

@jianzs cc

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>

Force merge. The CI failure isn't related to this PR.
### What this PR does / why we need it?
Fix the bug where the P-node's scheduler dies after it force-frees a request due to timeout and then receives the completed KV cache pulled by the D-node again, by adding a list to record all requests.

- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@81786c8

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>