
fix(kvbm): prevent scheduler crash from race between abort and request completion #6681

Draft
lg-epic wants to merge 1 commit into ai-dynamo:main from lg-epic:fix/kvbm-request-finished-race-condition


@lg-epic lg-epic commented Feb 27, 2026

Summary

Fixes a race condition in the KVBM connector that crashes the vLLM EngineCore scheduler when the assertion assert req_id in self.requests fails in _update_from_kv_xfer_finished(). The crash is deterministic under streaming load in aggregated inference mode with KVBM.

Root Cause

The race is between _process_aborts_queue() and update_from_output() within a single EngineCore.step(). When a request's abort and stop condition coincide in the same step:

  1. execute_model() — worker reports finished_sending (KV offload complete)
  2. _process_aborts_queue() — abort arrives, calls request_finished(), slot removed, returns true (request stays in self.requests)
  3. update_from_output() — same request hit a stop condition, calls request_finished() again, slot is gone, returns false → vLLM deletes from self.requests
  4. update_from_output() continues → _update_from_kv_xfer_finished() processes finished_sending → assert req_id in self.requests → crash
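
The four steps above can be replayed in a small, self-contained Python model. This is only a sketch of the control flow as described in this PR, not vLLM's scheduler or the KVBM leader: `ToyConnector`, `ToyScheduler`, `simulate`, and the `missing_slot_result` flag are hypothetical names introduced for illustration. The flag stands for what `request_finished()` returns when the slot is already gone.

```python
class ToyConnector:
    """Hypothetical stand-in for the KVBM leader: one slot per in-flight request."""

    def __init__(self, missing_slot_result):
        self.slots = set()
        # What request_finished() returns when the slot is already gone.
        self.missing_slot_result = missing_slot_result

    def request_finished(self, req_id):
        # True means "delay freeing, a KV transfer is still pending".
        if req_id in self.slots:
            self.slots.discard(req_id)
            return True
        return self.missing_slot_result


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's bookkeeping in one step."""

    def __init__(self, connector):
        self.connector = connector
        self.requests = {}

    def add_request(self, req_id):
        self.requests[req_id] = {"id": req_id}
        self.connector.slots.add(req_id)

    def _finish(self, req_id):
        delay_free = self.connector.request_finished(req_id)
        if not delay_free:
            # Not delayed: the request is dropped immediately.
            self.requests.pop(req_id, None)

    def step(self, req_id, finished_sending):
        self._finish(req_id)  # step 2: abort path, slot removed
        self._finish(req_id)  # step 3: stop condition, slot already gone
        for rid in finished_sending:  # step 4: worker-reported transfers
            assert rid in self.requests, f"{rid} missing from self.requests"
            del self.requests[rid]


def simulate(missing_slot_result):
    sched = ToyScheduler(ToyConnector(missing_slot_result))
    sched.add_request("req-1")
    try:
        sched.step("req-1", finished_sending={"req-1"})
    except AssertionError:
        return "crash"
    return "ok" if not sched.requests else "leak"


print(simulate(False))  # missing slot returns False
print(simulate(True))   # missing slot returns True
```

In this model, returning False for a missing slot drops the request before the finished_sending pass and trips the assertion, while returning True leaves the request in place so that pass can free it.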

This is specific to KVBM because KV blocks are offloaded incrementally during generation via save_kv_layer(). The final offload can complete in the same step the request finishes, so finished_sending contains the request at the same time it's being freed.

Changes

  • leader.rs: Return true when slot is missing in request_finished(), keeping the request alive in self.requests so _update_from_kv_xfer_finished() can process finished_sending and free blocks properly. Aligns with the existing TODO comment (L577-587) that the return value should always be true.
  • worker.rs: Replace two panic!() calls with tracing::warn!() + signal completion when slots disappear from maybe_finished_offloading/maybe_finished_onboarding sets between iterations.
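
The worker.rs change can be pictured with a Python model of the loop over a maybe-finished set. This is an illustrative sketch, not the Rust code in this PR: `drain_maybe_finished`, the `slots` dictionary, and its `pending_ops` field are assumptions made up for the example. The point it shows is that a missing slot is logged and reported as complete instead of aborting the process.

```python
import logging

logger = logging.getLogger("kvbm_worker_sketch")


def drain_maybe_finished(maybe_finished, slots):
    """Split candidate request IDs into finished and still-pending.

    `slots` maps request ID -> {"pending_ops": int} (hypothetical layout).
    A request whose slot has disappeared is treated as finished, with a
    warning, rather than raising.
    """
    finished = []
    still_pending = set()
    for req_id in sorted(maybe_finished):
        slot = slots.get(req_id)
        if slot is None:
            # Previously a hard failure (panic!() in the Rust code).
            logger.warning("slot for %s disappeared; signalling completion", req_id)
            finished.append(req_id)
        elif slot["pending_ops"] == 0:
            slots.pop(req_id)
            finished.append(req_id)
        else:
            still_pending.add(req_id)
    return finished, still_pending


slots = {"a": {"pending_ops": 0}, "b": {"pending_ops": 2}}
finished, pending = drain_maybe_finished({"a", "b", "gone"}, slots)
print(finished, pending)
```

Signalling completion for the missing slot matters because the scheduler side is waiting on that signal to free the request; dropping the ID silently would risk leaving the request in place indefinitely.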

Error observed

```
AssertionError at scheduler.py:1950
assert req_id in self.requests
in _update_from_kv_xfer_finished
```

Preceded by:
```
WARN: request_finished called for request_id: ... but slot is not found
WARN: finished request received for unknown request_id; assuming never started
```

Environment

Dynamo v0.9.0, vLLM 0.15.1+cu130, KVBM connector in aggregated inference on L40S GPUs.

Related

Test plan

  • Existing CI tests pass (Rust compile + unit tests)
  • Verify with aggregated inference workload under load that the assert req_id in self.requests crash no longer occurs
  • Verify no request leaks (requests stuck in self.requests forever) by monitoring scheduler queue depth under sustained traffic
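
One possible way to automate the leak check is sketched below. It assumes periodic snapshots of the request IDs held by the scheduler are available from some monitoring hook; how those snapshots are collected is not specified in this PR, and `find_stuck_requests` and `max_samples` are hypothetical names.

```python
def find_stuck_requests(snapshots, max_samples):
    """Return IDs present in more than `max_samples` consecutive snapshots.

    `snapshots` is an iterable of sets of request IDs, sampled at a fixed
    interval (assumed to come from an external monitoring hook).
    """
    consecutive = {}
    stuck = set()
    for snapshot in snapshots:
        # IDs absent from this snapshot are dropped, resetting their streak.
        consecutive = {rid: consecutive.get(rid, 0) + 1 for rid in snapshot}
        stuck |= {rid for rid, n in consecutive.items() if n > max_samples}
    return stuck


samples = [{"a", "b"}, {"a", "c"}, {"a"}, {"a", "d"}]
print(find_stuck_requests(samples, max_samples=3))
```

The threshold should be set well above the longest expected request lifetime, since a long-running generation is not a leak.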

@lg-epic lg-epic requested a review from a team as a code owner February 27, 2026 18:23

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the fix label Feb 27, 2026
@github-actions
Contributor

👋 Hi lg-epic! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Feb 27, 2026
@lg-epic lg-epic marked this pull request as draft February 27, 2026 18:23
@lg-epic lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch 2 times, most recently from ce72dd7 to fd24328 March 5, 2026 00:17
@lg-epic lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch from 9a3fc3e to b36312e March 23, 2026 14:13
…t completion

When a request's abort and stop condition coincide in the same
EngineCore.step(), _process_aborts_queue() removes the KVBM slot before
update_from_output() calls request_finished() again. The missing slot
caused request_finished() to return false, deleting the request from
self.requests before _update_from_kv_xfer_finished() could process
the worker's finished_sending — hitting assert req_id in self.requests.

leader.rs: return true when slot is missing, keeping the request alive
so finished_sending can be processed normally.

worker.rs: replace panic!() with warn + signal completion when a slot
disappears from the maybe_finished_offloading/onboarding sets between
iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lg-epic lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch from 0162715 to 5a5f08d March 23, 2026 14:19
@lg-epic lg-epic changed the title fix(kvbm): prevent scheduler crash from stale request in finished_sending fix(kvbm): prevent scheduler crash from race between abort and request completion Mar 23, 2026

Labels

external-contribution, fix, size/M
