[BugFix] scheduler: Delay freeing blocks of aborted async loads#32255
markmc merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly fixes an issue where KV cache blocks for aborted requests waiting for remote KVs were not freed properly. The changes introduce a delay in freeing blocks until the remote KV transfer completes, which is the right approach. The new tests adequately cover this bug fix scenario. However, I've identified a potential critical issue in _update_from_kv_xfer_finished where a request might be freed twice if its ID appears in both finished_recving and finished_sending lists, potentially causing a crash. I've provided a suggestion to refactor this part to make it more robust.
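The double-free concern above can be sketched as follows. This is a hypothetical illustration only — the function and list names mirror the review comment, not the actual vLLM code — showing one way to deduplicate request IDs so a request appearing in both `finished_recving` and `finished_sending` is freed at most once:

```python
def free_finished_requests(finished_recving, finished_sending, free_fn):
    """Hypothetical sketch: free each request at most once, even if its
    ID appears in both the finished-receiving and finished-sending lists."""
    freed = set()
    for req_id in list(finished_recving) + list(finished_sending):
        if req_id in freed:
            continue  # already freed; skipping avoids a double-free crash
        freed.add(req_id)
        free_fn(req_id)
    return freed
```

Whether this guard is actually needed depends on whether the same request can finish sending and receiving in the same step, which is questioned below.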
I don't think there's a use case where the same request will be concurrently finished sending and recving.
This pull request has merge conflicts that must be resolved before it can be merged.
markmc left a comment:
Looks reasonable to me, but I think all of this logic is becoming extremely brittle and it is very difficult now to reassure yourself that you're not missing some corner case
The interplay of async vs sync scheduling, async vs sync loads, delayed KV block frees, recompute vs abort on KV load failure, streaming input requests, etc. etc. is getting a bit much
```python
scheduler.update_from_output(scheduler_output, model_runner_output)

# assert request is deleted
assert request.request_id not in scheduler.requests
```
I wondered whether it would be useful to check the blocks were freed, as the NIXL tests do:
```python
req_to_blocks = scheduler.kv_cache_manager.coordinator.single_type_managers[0].req_to_blocks
assert req0_id not in req_to_blocks
```
but in either case we're verifying that `_free_blocks()` was called, so I guess not
(just noting for reference)
It would be nice to hear a bit more about how exactly this bug manifested ... some sort of crash or memory corruption when the offloading connector attempted to write to freed blocks? Adding logs showing this to the PR description would be helpful ...
There was another bug (or more accurately bugs) I was trying to catch here: #29781
…ests

This commit fixes the scheduler to delay the freeing of KV cache blocks of requests that are waiting for remote KVs to be loaded.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Ok, I guess even when it happens, you need the blocks to be reallocated to another request, and even then the only symptom might be garbage output? (Thanks for the explanation)
Right, you will get corrupted KV data.
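To make that failure mode concrete, here is a toy sketch (hypothetical names, not vLLM code) of how a block freed before the async transfer completes can be reallocated to another request and then clobbered by the late remote write:

```python
# Toy illustration of premature-free corruption (hypothetical, not vLLM code).
block_pool = {0: None}        # block_id -> owning req_id (None = free)
block_data = {0: "free"}      # block_id -> contents

# Request A's blocks were freed by an abort while its remote load was in
# flight; block 0 is then reallocated to request B, which writes its KV data.
block_pool[0] = "req_b"
block_data[0] = "req_b_kv"

def late_remote_write(block_id, payload):
    # The async load for the aborted request finishes after reallocation
    # and writes into a block it no longer owns.
    block_data[block_id] = payload

late_remote_write(0, "req_a_kv")
# req_b now silently reads req_a's KV data from its own block.
```

Nothing crashes here, which matches the observation above that the only symptom may be garbage output.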
…nc loads (vllm-project#32255)

Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
This PR fixes the scheduler to delay freeing the KV cache blocks of aborted requests that are still waiting for remote KVs to be loaded.
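The shape of the fix can be sketched roughly as follows. This is a minimal illustrative model with hypothetical class and attribute names, not the actual scheduler code: an abort on a request with an in-flight async KV load defers the free, and the blocks are actually released when the transfer-finished notification arrives.

```python
# Hypothetical sketch of delayed block freeing (not vLLM internals).
class ToyScheduler:
    def __init__(self):
        self.waiting_on_remote_kv = set()  # req_ids with in-flight async loads
        self.pending_free = set()          # aborted, blocks not yet freed
        self.allocated_blocks = {}         # req_id -> list of block ids

    def abort(self, req_id):
        if req_id in self.waiting_on_remote_kv:
            # Delay: the connector may still write into these blocks.
            self.pending_free.add(req_id)
        else:
            self._free_blocks(req_id)

    def on_kv_load_finished(self, req_id):
        self.waiting_on_remote_kv.discard(req_id)
        if req_id in self.pending_free:
            self.pending_free.discard(req_id)
            self._free_blocks(req_id)

    def _free_blocks(self, req_id):
        self.allocated_blocks.pop(req_id, None)
```

In this sketch an abort with no transfer in flight frees immediately, so the common path is unchanged; only the abort-during-async-load corner is deferred.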