fix(kvbm): prevent scheduler crash from race between abort and request completion #6681
Draft
lg-epic wants to merge 1 commit into ai-dynamo:main
…t completion

When a request's abort and stop condition coincide in the same EngineCore.step(), _process_aborts_queue() removes the KVBM slot before update_from_output() calls request_finished() again. The missing slot caused request_finished() to return false, deleting the request from self.requests before _update_from_kv_xfer_finished() could process the worker's finished_sending, which hit assert req_id in self.requests.

leader.rs: return true when slot is missing, keeping the request alive so finished_sending can be processed normally.

worker.rs: replace panic!() with warn + signal completion when a slot disappears from the maybe_finished_offloading/onboarding sets between iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes a race condition in the KVBM connector that crashes the vLLM EngineCore scheduler with `assert req_id in self.requests` in `_update_from_kv_xfer_finished`. The crash is deterministic under streaming load in aggregated inference mode with KVBM.
Root Cause
The race is between `_process_aborts_queue()` and `update_from_output()` within a single `EngineCore.step()`. When a request's abort and stop condition coincide in the same step:
1. `execute_model()`: the worker reports `finished_sending` (KV offload complete).
2. `_process_aborts_queue()`: the abort arrives and calls `request_finished()`; the slot is removed and the call returns `true` (the request stays in `self.requests`).
3. `update_from_output()`: the same request hit a stop condition, so `request_finished()` is called again; the slot is gone and the call returns `false` → vLLM deletes the request from `self.requests`.
4. `update_from_output()` continues: `_update_from_kv_xfer_finished()` processes `finished_sending` → `assert req_id in self.requests` → crash.
This is specific to KVBM because KV blocks are offloaded incrementally during generation via `save_kv_layer()`. The final offload can complete in the same step the request finishes, so `finished_sending` contains the request at the same time it's being freed.
Changes
- leader.rs: return `true` when the slot is missing in `request_finished()`, keeping the request alive in `self.requests` so `_update_from_kv_xfer_finished()` can process `finished_sending` and free blocks properly. This aligns with the existing TODO comment (L577-587) stating that the return value should always be `true`.
- worker.rs: replace `panic!()` calls with `tracing::warn!()` + signal completion when slots disappear from the `maybe_finished_offloading`/`maybe_finished_onboarding` sets between iterations.
Error observed
```
AssertionError at scheduler.py:1950
assert req_id in self.requests
in _update_from_kv_xfer_finished
```
Preceded by:
```
WARN: request_finished called for request_id: ... but slot is not found
WARN: finished request received for unknown request_id; assuming never started
```
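The race described under Root Cause can be illustrated with a small toy model. This is only a sketch: `ToyLeader`, `ToyScheduler`, and `run` are hypothetical names invented for illustration, not the real vLLM or KVBM classes, and the three loops in `step()` merely stand in for `_process_aborts_queue()`, `update_from_output()`, and `_update_from_kv_xfer_finished()`. The `true_on_missing_slot` flag switches between the old behavior and the leader.rs change.

```python
import logging

log = logging.getLogger("kvbm_sketch")


class ToyLeader:
    """Hypothetical stand-in for the KVBM leader: one slot per request."""

    def __init__(self, true_on_missing_slot):
        self.slots = {}
        self.true_on_missing_slot = true_on_missing_slot

    def request_finished(self, req_id):
        # True means "keep the request alive until the KV transfer is done".
        if self.slots.pop(req_id, None) is None:
            log.warning("request_finished called for %s but slot is not found", req_id)
            return self.true_on_missing_slot
        return True


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's bookkeeping of live requests."""

    def __init__(self, leader):
        self.leader = leader
        self.requests = {}

    def add_request(self, req_id):
        self.requests[req_id] = "running"
        self.leader.slots[req_id] = object()

    def _finish(self, req_id):
        # The request is deleted right away unless the leader asks to delay.
        if not self.leader.request_finished(req_id):
            del self.requests[req_id]

    def step(self, aborted, stopped, finished_sending):
        for req_id in aborted:  # models _process_aborts_queue()
            self._finish(req_id)
        for req_id in stopped:  # models update_from_output()
            self._finish(req_id)
        for req_id in finished_sending:  # models _update_from_kv_xfer_finished()
            if req_id not in self.requests:
                raise AssertionError("assert req_id in self.requests")
            del self.requests[req_id]


def run(true_on_missing_slot):
    """Abort, stop, and finished_sending all land in the same step."""
    sched = ToyScheduler(ToyLeader(true_on_missing_slot))
    sched.add_request("req-1")
    try:
        sched.step(aborted=["req-1"], stopped=["req-1"], finished_sending=["req-1"])
    except AssertionError:
        return "crash"
    return "ok"
```

In this toy, `run(False)` returns `"crash"` because the second `request_finished()` call deletes the request before the transfer result is processed, while `run(True)` returns `"ok"` because the request stays alive until `finished_sending` is handled.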
Environment
Dynamo v0.9.0, vLLM 0.15.1+cu130, KVBM connector in aggregated inference on L40S GPUs.
Related
Test plan