
server: router fix model unload reload deadlock #22284

Draft
0cc4m wants to merge 15 commits into master from 0cc4m/server-router-fix-reload-deadlock


0cc4m commented Apr 23, 2026

Overview

When a server runs in router mode and receives simultaneous requests for two or more models that cannot be loaded at the same time (e.g. because only one model is allowed at a time, or because they don't fit into memory together), the server gets stuck in an unload/reload loop.

Assuming we have model A and model B:

  • A is loaded, A and B get requests
  • A is unloaded to make space for B
  • B is unloaded to make space for A
  • etc

This happens because the router doesn't track whether a model is in use. It aggressively terminates models to load a new one, then terminates the new one before it has handled even one request. This branch is one way to fix that. It's built on top of #21231 because I ran into this problem after solving the memory unloading case. Draft until that is merged.
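The unload/reload loop described above can be reproduced in a toy model. The sketch below is in Python and is not the server's code: `simulate_naive_router`, the single slot, and the `load_ticks` parameter are all hypothetical, and it assumes one request arrives per tick and that a model can only serve once loading has finished.

```python
def simulate_naive_router(requests, load_ticks=2):
    """Toy model of a router with one slot and no in-use tracking.

    `requests` is the model name wanted at each tick (hypothetical input).
    Returns (served, reloads).
    """
    current = None   # model occupying the single slot
    ready_in = 0     # ticks left until `current` has finished loading
    served = 0
    reloads = 0
    for wanted in requests:
        if wanted != current:
            # Naive policy: evict whatever is there, even if it is
            # still loading or has not served a request yet.
            current, ready_in = wanted, load_ticks
            reloads += 1
        if ready_in > 0:
            ready_in -= 1   # still loading, this request is not served
        else:
            served += 1
    return served, reloads


# Interleaved requests: every tick evicts the other model mid-load.
print(simulate_naive_router(["A", "B"] * 3))
# Grouped requests: each model gets to finish loading and serve.
print(simulate_naive_router(["A"] * 3 + ["B"] * 3))
```

Under these assumptions the interleaved pattern reloads on every tick and never serves a request, while the grouped pattern does make progress.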

Only a5355a0 is specific to this PR. Basically, I added a counter that tracks open requests against a model and prevents unloading until the count reaches 0 or a timeout (DEFAULT_STOP_TIMEOUT) expires. I plan to look at more complex solutions later that load/unload more intelligently.
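As a rough illustration of the counter idea, here is a minimal Python sketch. It is not the code in a5355a0: the `ModelSlot` class and its method names are hypothetical, and the value given to `DEFAULT_STOP_TIMEOUT` is an assumption, since the description only names the constant.

```python
import threading

# Assumed value for illustration; the PR description does not state it.
DEFAULT_STOP_TIMEOUT = 10.0


class ModelSlot:
    """Tracks open requests against one loaded model (hypothetical type)."""

    def __init__(self, name):
        self.name = name
        self.loaded = True
        self._open_requests = 0
        self._cond = threading.Condition()

    def begin_request(self):
        with self._cond:
            if not self.loaded:
                raise RuntimeError(f"model {self.name} is not loaded")
            self._open_requests += 1

    def end_request(self):
        with self._cond:
            self._open_requests -= 1
            if self._open_requests == 0:
                # Wake any unload() call waiting for the model to drain.
                self._cond.notify_all()

    def unload(self, timeout=DEFAULT_STOP_TIMEOUT):
        """Unload once no requests are open, or once the timeout expires.

        Returns True if the model drained, False if the timeout forced
        the unload while requests were still open.
        """
        with self._cond:
            drained = self._cond.wait_for(
                lambda: self._open_requests == 0, timeout=timeout
            )
            self.loaded = False
            return drained
```

A caller would wrap each proxied request in `begin_request()` / `end_request()` and call `unload()` before loading a competing model. This sketch does not stop new requests from arriving while `unload()` is waiting; the timeout is what bounds the wait in that case.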

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, Claude was used for assistance and to write tests.

github-actions bot added the examples, python (python script changes), and server labels on Apr 23, 2026