fix(kvbm): prevent scheduler crash from race between abort and request completion by lg-epic · Pull Request #6681 · ai-dynamo/dynamo

lg-epic · 2026-02-27T18:23:36Z

Summary

Fixes a race condition in the KVBM connector that crashes the vLLM EngineCore scheduler on `assert req_id in self.requests` inside `_update_from_kv_xfer_finished`. The crash is deterministic under streaming load in aggregated inference mode with KVBM.

Root Cause

The race is between _process_aborts_queue() and update_from_output() within a single EngineCore.step(). When a request's abort and stop condition coincide in the same step:

1. `execute_model()`: the worker reports `finished_sending` (KV offload complete).
2. `_process_aborts_queue()`: the abort arrives and calls `request_finished()`; the slot is removed and the call returns true, so the request stays in `self.requests`.
3. `update_from_output()`: the same request hit a stop condition, so `request_finished()` is called again; the slot is gone and the call returns false, so vLLM deletes the request from `self.requests`.
4. `update_from_output()` continues: `_update_from_kv_xfer_finished()` processes `finished_sending`, reaches `assert req_id in self.requests`, and crashes.

This is specific to KVBM because KV blocks are offloaded incrementally during generation via save_kv_layer(). The final offload can complete in the same step the request finishes, so finished_sending contains the request at the same time it's being freed.
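The ordering described above can be reproduced with a small toy model. This is only a sketch: `ToyConnector`, `ToyScheduler`, `run_step`, and the `missing_slot_returns` parameter are hypothetical names invented for illustration, not vLLM or KVBM code. The model assumes, as described in this PR, that a false return from `request_finished()` makes the scheduler drop the request immediately.

```python
class ToyConnector:
    """Stand-in for the KVBM leader: one slot per in-flight request."""

    def __init__(self, missing_slot_returns):
        self.slots = set()
        # Hypothetical knob: the value returned when the slot is already gone.
        self.missing_slot_returns = missing_slot_returns

    def request_finished(self, req_id):
        # True means "keep the request alive until the KV transfer is reported".
        if req_id in self.slots:
            self.slots.remove(req_id)
            return True
        return self.missing_slot_returns


class ToyScheduler:
    """Stand-in for the scheduler's bookkeeping of live requests."""

    def __init__(self, connector):
        self.connector = connector
        self.requests = {}

    def add_request(self, req_id):
        self.requests[req_id] = "running"
        self.connector.slots.add(req_id)

    def finish(self, req_id):
        # A false return means the request is dropped right away.
        if not self.connector.request_finished(req_id):
            del self.requests[req_id]

    def update_from_kv_xfer_finished(self, finished_sending):
        for req_id in finished_sending:
            # Mirrors `assert req_id in self.requests`.
            if req_id not in self.requests:
                raise AssertionError(f"{req_id} not in self.requests")
            del self.requests[req_id]  # blocks are freed here


def run_step(missing_slot_returns):
    """Replay one step in which abort and stop condition coincide."""
    sched = ToyScheduler(ToyConnector(missing_slot_returns))
    sched.add_request("req-0")
    finished_sending = {"req-0"}  # 1. worker reports the offload as complete
    sched.finish("req-0")         # 2. abort: slot removed, request kept
    sched.finish("req-0")         # 3. stop condition: slot already gone
    sched.update_from_kv_xfer_finished(finished_sending)  # 4. transfer handling
    return set(sched.requests)
```

In this model, `run_step(False)` raises `AssertionError` at step 4, while `run_step(True)` finishes the step and returns an empty set.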

Changes

- `leader.rs`: Return true when the slot is missing in `request_finished()`, keeping the request alive in `self.requests` so `_update_from_kv_xfer_finished()` can process `finished_sending` and free blocks properly. Aligns with the existing TODO comment (L577-587) that the return value should always be true.
- `worker.rs`: Replace two `panic!()` calls with `tracing::warn!()` plus a completion signal when slots disappear from the `maybe_finished_offloading`/`maybe_finished_onboarding` sets between iterations.
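The worker-side behavior (warn and signal completion instead of panicking) can be illustrated with a rough Python analogue. The function name, the `slots` mapping, and the logger name below are assumptions made for this sketch; the actual change lives in Rust in `worker.rs` and may be structured differently.

```python
import logging

logger = logging.getLogger("toy_kvbm_worker")


def sweep_maybe_finished(maybe_finished, slots):
    """Return the request IDs to report as finished in this iteration.

    maybe_finished: set of request IDs whose transfer might be complete
    slots: dict mapping request ID -> True once its transfer is complete
    """
    finished = set()
    for req_id in sorted(maybe_finished):
        if req_id not in slots:
            # Previously a panic; now warn and still signal completion so the
            # scheduler is not left waiting on a slot that no longer exists.
            logger.warning("slot missing for %s; signalling completion", req_id)
            finished.add(req_id)
        elif slots[req_id]:
            finished.add(req_id)
            del slots[req_id]
    # Requests still in flight stay in the set for the next iteration.
    maybe_finished -= finished
    return finished
```

For example, with `maybe_finished = {"a", "b", "c"}` and `slots = {"a": True, "b": False}`, the call reports `"a"` and `"c"` as finished and leaves `"b"` pending.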

Error observed

```
AssertionError at scheduler.py:1950
assert req_id in self.requests
in _update_from_kv_xfer_finished
```

Preceded by:
```
WARN: request_finished called for request_id: ... but slot is not found
WARN: finished request received for unknown request_id; assuming never started
```

Environment

Dynamo v0.9.0, vLLM 0.15.1+cu130, KVBM connector in aggregated inference on L40S GPUs.

Test plan

- Existing CI tests pass (Rust compile + unit tests)
- Verify with an aggregated inference workload under load that the `assert req_id in self.requests` crash no longer occurs
- Verify no request leaks (requests stuck in `self.requests` forever) by monitoring scheduler queue depth under sustained traffic
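One way the leak check could be automated is sketched below. It assumes that the set of request IDs held by the scheduler can be sampled periodically; how those snapshots are collected is left open, and `find_stuck_requests` and `max_samples` are hypothetical names for this illustration.

```python
def find_stuck_requests(snapshots, max_samples):
    """Return IDs seen in more than max_samples consecutive snapshots.

    snapshots: iterable of sets of request IDs, oldest first
    max_samples: longest streak still considered healthy
    """
    streak = {}
    for snapshot in snapshots:
        # IDs missing from a snapshot lose their streak; present IDs extend it.
        streak = {req_id: streak.get(req_id, 0) + 1 for req_id in snapshot}
    return {req_id for req_id, count in streak.items() if count > max_samples}
```

A request that completes normally drops out of the snapshots and never accumulates a long streak, so a non-empty result under sustained traffic would point at a request that was kept alive but never freed.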

copy-pr-bot · 2026-02-27T18:23:41Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-02-27T18:23:47Z

👋 Hi lg-epic! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

…t completion

When a request's abort and stop condition coincide in the same EngineCore.step(), _process_aborts_queue() removes the KVBM slot before update_from_output() calls request_finished() again. The missing slot caused request_finished() to return false, deleting the request from self.requests before _update_from_kv_xfer_finished() could process the worker's finished_sending, hitting assert req_id in self.requests.

leader.rs: return true when slot is missing, keeping the request alive so finished_sending can be processed normally.

worker.rs: replace panic!() with warn + signal completion when a slot disappears from the maybe_finished_offloading/onboarding sets between iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lg-epic requested a review from a team as a code owner February 27, 2026 18:23

pull-request-size bot added the size/M label Feb 27, 2026

github-actions bot added the fix label Feb 27, 2026

github-actions bot added the external-contribution Pull request is from an external contributor label Feb 27, 2026

lg-epic marked this pull request as draft February 27, 2026 18:23

lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch 2 times, most recently from ce72dd7 to fd24328 Compare March 5, 2026 00:17

lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch from 9a3fc3e to b36312e Compare March 23, 2026 14:13

lg-epic force-pushed the fix/kvbm-request-finished-race-condition branch from 0162715 to 5a5f08d Compare March 23, 2026 14:19

lg-epic changed the title ~~fix(kvbm): prevent scheduler crash from stale request in finished_sending~~ fix(kvbm): prevent scheduler crash from race between abort and request completion Mar 23, 2026

lg-epic wants to merge 1 commit into ai-dynamo:main from lg-epic:fix/kvbm-request-finished-race-condition
