feat: add timeout feature by overcuriousity · Pull Request #493 · mostlygeek/llama-swap

overcuriousity · 2026-01-30T14:24:52Z

I ran into the problem that some models (especially thinking Qwen3 models) never stop inferencing on certain tasks. This is meant as a safeguard that kills an active request if it takes too long, preventing GPU overheating and blocked resources.

Summary by CodeRabbit

New Features
- Added per-model requestTimeout to limit single-request runtime and prevent runaway inferences (default 0 = disabled).
Documentation
- Clarified wording to "automatic unloading of models after idle timeout" and documented the new request timeout setting, defaults, and behavior.
Bug Fixes
- Improved stop/terminate behavior and timeout-driven termination to avoid hangs and races during startup/stop.
Tests
- Added tests for request timeout behavior and stop-during-startup scenarios.


coderabbitai · 2026-01-30T14:25:01Z

Walkthrough

Adds a per-model requestTimeout setting and enforces request-level timeouts in the proxy by cancelling timed-out requests and forcefully stopping the model process; introduces cmd lifecycle guards (cmdStarted, stop/start race fixes) and tests for timeout and start/stop race conditions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Docs & examples**: `README.md`, `docs/configuration.md`, `config.example.yaml` | Documented new per-model `requestTimeout` (integer, default 0 = disabled) and guidance on behavior when a request exceeds the timeout. |
| **Schema**: `config-schema.json` | Added `requestTimeout` property under per-model config (`root.properties.models.additionalProperties.properties.requestTimeout`) with minimum 0 and default 0. |
| **Config model**: `proxy/config/model_config.go` | Added `RequestTimeout int` field to `ModelConfig` and set default 0 in YAML defaults. |
| **Proxy implementation**: `proxy/process.go` | Introduced `cmdStarted` flag; tightened start/stop synchronization and deadlock avoidance; `ProxyRequest` gains request-level timeout context and goroutine to call `StopImmediately()` on deadline; updated stop/wait logic and logging. |
| **Tests**: `proxy/process_timeout_test.go`, `proxy/process_test.go` | Added tests covering request timeout enforcement (mock streaming server) and start/stop race conditions (stop not hanging when start fails / during startup). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Fix race conditions in proxy.Process #349: Overlaps on proxy/process.go lifecycle, start/stop synchronization and cmd wait handling.
Change /unload to not wait for inflight requests (#125) #126: Modifies process shutdown/control flow in proxy/process.go, overlapping stop/StopImmediately behavior.
Add a concurrency limit to Process.ProxyRequest #123: Changes ProxyRequest and request lifecycle semantics; related to request timeout integration.

Suggested labels

enhancement

Suggested reviewers

mostlygeek

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'feat: add timeout feature' is vague and generic: it uses non-descriptive language that doesn't convey the specific nature of the change (request timeout protection for inference). | Consider a more specific title like 'feat: add request timeout protection to prevent runaway inference' that clearly indicates the feature purpose. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |


mostlygeek · 2026-01-30T16:37:08Z

The conventional way to stop inference is for the client to drop the connection. This propagates through llama-swap to the upstream which is expected to interrupt inference.
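The client-side approach described above can be sketched in Go. This is an illustration only: the streaming handler is a hypothetical stand-in for an upstream inference server, there is no llama-swap in between, and the 100 ms limit is an arbitrary value chosen for the demo. It shows that when the client's context deadline expires, the connection is dropped and the server-side handler sees its request context cancelled.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// endlessStream is a hypothetical upstream that emits tokens until the
// client goes away. It signals sawCancel once its request context ends.
func endlessStream(sawCancel chan<- struct{}) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
		flusher.Flush() // send headers right away
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-r.Context().Done():
				// net/http cancels r.Context() when the client connection closes.
				sawCancel <- struct{}{}
				return
			case <-ticker.C:
				fmt.Fprint(w, "token ")
				flusher.Flush()
			}
		}
	}
}

// clientSideLimit reads the endless stream with a 100 ms client deadline and
// reports whether the upstream handler observed the cancellation.
func clientSideLimit() (upstreamCancelled bool, err error) {
	sawCancel := make(chan struct{}, 1)
	srv := httptest.NewServer(endlessStream(sawCancel))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err == nil {
		// The copy ends with an error once the deadline passes.
		_, err = io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}

	select {
	case <-sawCancel:
		upstreamCancelled = true
	case <-time.After(2 * time.Second):
	}
	return upstreamCancelled, err
}

func main() {
	cancelled, err := clientSideLimit()
	fmt.Println("client error:", err)
	fmt.Println("upstream saw cancellation:", cancelled)
}
```

Whether a real upstream actually interrupts inference when its request context ends depends on that upstream; the sketch only demonstrates the propagation of the disconnect.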

I don't think killing the process is the right way. I prefer llama-swap to not be a workaround for upstreams or clients that don't set limits on inference time.

overcuriousity · 2026-01-30T17:02:13Z

I see your point - out of scope.
Anyway, since some clients just don't have this option, and this is completely optional, it won't hurt either. It's just a safeguard for configurations that are affected by the described problem.

I will definitely keep it in my fork as it solves a problem I face.

Feel free to close PR, and thanks for the feedback!

The requestTimeout feature was not working because the timeout context was not connected to the HTTP request. When the timeout fired, it attempted to kill the process but the reverse proxy continued waiting for a response indefinitely.

- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
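A minimal sketch of the technique this commit message names: derive a context with `context.WithTimeout()`, clone the request with it, and hand the clone to the reverse proxy. This is not the PR's `ProxyRequest` code. The `withRequestTimeout` wrapper, the endless upstream, and the 100 ms `demoTimeout` are assumptions made for the example, and the sketch only cancels the HTTP request (it does not stop any process).

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"time"
)

const (
	demoTimeout = 100 * time.Millisecond // hypothetical per-request limit
	demoCeiling = 2 * time.Second        // generous upper bound for the demo
)

// withRequestTimeout clones each request with a deadline-bound context so
// the proxied round trip is cancelled when the limit is reached.
func withRequestTimeout(next http.Handler, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if timeout <= 0 { // 0 = disabled
			next.ServeHTTP(w, r)
			return
		}
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		next.ServeHTTP(w, r.Clone(ctx))
	})
}

// endlessUpstream streams until its request context is cancelled.
func endlessUpstream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-r.Context().Done():
			return
		case <-ticker.C:
			fmt.Fprint(w, "token ")
			flusher.Flush()
		}
	}
}

// runProxyDemo measures how long a client is held by the timeout-wrapped proxy.
func runProxyDemo() (time.Duration, error) {
	upstream := httptest.NewServer(http.HandlerFunc(endlessUpstream))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		return 0, err
	}
	rp := httputil.NewSingleHostReverseProxy(target)
	proxy := httptest.NewServer(withRequestTimeout(rp, demoTimeout))
	defer proxy.Close()

	start := time.Now()
	resp, err := http.Get(proxy.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// The stream is cut off when the proxy's context expires, so the copy
	// error is expected and ignored here.
	_, _ = io.Copy(io.Discard, resp.Body)
	return time.Since(start), nil
}

func main() {
	elapsed, err := runProxyDemo()
	fmt.Println("elapsed:", elapsed, "err:", err)
}
```

Because the upstream never finishes on its own, the client can only return after the deadline fires, which is why the elapsed time lands at or slightly above the configured limit. The proxy may log a body-copy error to stderr when the stream is cut.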

Add brief mention of requestTimeout feature in the customizable features section of README. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@proxy/process_timeout_test.go`:
- Around line 15-98: The mock server handler in TestProcess_RequestTimeout
currently calls t.Fatal from its goroutine; instead create an error channel
(e.g. srvErrCh := make(chan error, 1) in the test), replace the t.Fatal/t.Fatalf
inside the handler (the http.Flusher type assertion and any other fatal
conditions) with sending an error to srvErrCh, and in the main test goroutine
after starting the server and before asserting timeouts, select or read from
srvErrCh and call t.Fatal/t.Fatalf there if an error was received; reference
TestProcess_RequestTimeout and the mock server handler/Flusher type check when
making the changes.
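The error-channel pattern suggested here can be sketched outside of a test file. This is an illustration under assumptions, not the PR's test: `streamingHandler`, `noFlush`, and `handlerError` are hypothetical names. The handler reports a failed `http.Flusher` assertion on a buffered channel, and the calling goroutine (the test body, in the real case) decides what to do with it.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// streamingHandler reports setup problems on errCh instead of failing the
// test from the server's goroutine.
func streamingHandler(errCh chan<- error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			select {
			case errCh <- errors.New("ResponseWriter does not implement http.Flusher"):
			default: // an error is already pending; do not block the handler
			}
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, "data: hello\n\n")
		flusher.Flush()
	}
}

// noFlush hides the Flusher implementation of the wrapped writer, which lets
// the demo trigger the failure path on purpose.
type noFlush struct{ http.ResponseWriter }

// handlerError runs one request and returns whatever the handler reported.
// In a test, the caller would pass a non-nil result to t.Fatal.
func handlerError(stripFlusher bool) error {
	errCh := make(chan error, 1)
	h := streamingHandler(errCh)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if stripFlusher {
			w = noFlush{w}
		}
		h(w, r)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return err
	}
	resp.Body.Close()

	select {
	case err := <-errCh:
		return err
	default:
		return nil
	}
}

func main() {
	fmt.Println("with Flusher:   ", handlerError(false))
	fmt.Println("without Flusher:", handlerError(true))
}
```

The buffer of one keeps the handler from blocking if nobody is reading yet, and the non-blocking send keeps a second failure from stalling the server.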

In `@proxy/process.go`:
- Around line 384-395: The stop path can hang if a timeout transitions from
StateStarting to StateStopping before cmd.Start completes because stopCommand
waits on cmdWaitChan which is only closed by waitForCmd after a successful
Start; update stopCommand (and/or swapState) to avoid waiting when the process
never actually started: add a guard that if CurrentState() == StateStarting and
the command hasn't been started (check p.cmd == nil or a started flag), then
skip waiting on cmdWaitChan (or immediately return) OR alternatively ensure
cmdWaitChan is closed when cmd.Start returns an error; reference swapState,
stopCommand, cmdWaitChan, waitForCmd, cmd.Start and StateStarting/StateStopping
when implementing the fix.
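Both options in this comment (a started flag and closing the wait channel when `Start` fails) can be combined in a small sketch. The `process` type below is a simplified, hypothetical stand-in and not the PR's `Process`. The demo starts a binary name that is assumed not to exist, so `Start` fails and `stop` must return without waiting.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
	"time"
)

// process is a hypothetical, stripped-down stand-in for a managed command.
type process struct {
	mu         sync.Mutex
	cmd        *exec.Cmd
	cmdStarted bool
	waitCh     chan struct{}
}

func (p *process) start(name string, args ...string) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	p.cmd = exec.Command(name, args...)
	p.waitCh = make(chan struct{})
	if err := p.cmd.Start(); err != nil {
		// No waiter goroutine will ever run, so close the channel here;
		// otherwise anything blocked on it would hang forever.
		close(p.waitCh)
		return err
	}
	p.cmdStarted = true
	cmd, waitCh := p.cmd, p.waitCh
	go func() {
		_ = cmd.Wait()
		close(waitCh)
	}()
	return nil
}

func (p *process) stop() {
	p.mu.Lock()
	started, cmd, waitCh := p.cmdStarted, p.cmd, p.waitCh
	p.mu.Unlock()

	if !started {
		return // the command never ran, so there is nothing to wait for
	}
	_ = cmd.Process.Kill()
	<-waitCh
}

// stopAfterFailedStart reports the start error and whether stop returned
// within two seconds.
func stopAfterFailedStart() (stopReturned bool, startErr error) {
	p := &process{}
	// Assumption: no binary with this name exists on the PATH.
	startErr = p.start("example-binary-that-does-not-exist")

	done := make(chan struct{})
	go func() {
		p.stop()
		<-p.waitCh // also safe: the channel was closed on Start failure
		close(done)
	}()
	select {
	case <-done:
		return true, startErr
	case <-time.After(2 * time.Second):
		return false, startErr
	}
}

func main() {
	returned, err := stopAfterFailedStart()
	fmt.Println("start error:", err)
	fmt.Println("stop returned:", returned)
}
```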

proxy/process_timeout_test.go

Prevent stopCommand() from hanging when a timeout transitions StateStarting to StateStopping before cmd.Start() completes.

- Close cmdWaitChan when cmd.Start() fails
- Add guard in stopCommand() to skip wait if process never started
- Add tests for hang scenarios
- Fix TestProcess_RequestTimeout to avoid calling t.Fatal from handler goroutine

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@proxy/process_test.go`:
- Around line 627-667: The assert.Error(t, err) must not run inside the
goroutine; change the goroutine that calls process.start() to send its returned
error into a buffered channel (e.g., errCh := make(chan error, 1)) and close
startDone there, then in the main test goroutine receive from errCh and call
assert.Error(t, err) there; keep the existing calls to
process.StopImmediately(), startDone/stopDone synchronization, and use
process.CurrentState(), StateStarting, StateStopped, and StateStopping as
before.

In `@proxy/process.go`:
- Around line 390-401: The Stop logic can fail if the process state changes
between p.CurrentState() and p.swapState(), causing ErrExpectedStateMismatch to
return and abort the stop; change the block around p.swapState(currentState,
StateStopping) to retry on ErrExpectedStateMismatch: loop by re-reading
currentState via p.CurrentState(), verify isValidTransition(currentState,
StateStopping) each iteration, attempt swapState, break on success or on any
non-ErrExpectedStateMismatch error (logging via p.proxyLogger.Infof as currently
done); follow the same retry pattern used in Start() so transient state flips
(e.g., StateStarting → StateReady) will be handled and the Stop will proceed.
- Around line 432-449: stopCommand reads p.cmd, p.cancelUpstream and
p.cmdWaitChan under p.cmdMutex.RLock(), but start assigns p.cmd and the related
fields without holding p.cmdMutex, causing a data race; fix by updating start to
acquire the write lock (p.cmdMutex.Lock()) while setting p.cmd, p.cancelUpstream
and p.cmdWaitChan (and release after assignment) so readers in stopCommand (and
any other readers using p.cmdMutex.RLock()) see a consistent, synchronized
state; ensure start uses the same mutex (p.cmdMutex) and covers all related
field assignments in the critical section.

proxy/process_test.go

proxy/process.go

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@proxy/process.go`:
- Around line 444-458: The current stopCommand flow can return without invoking
cancelUpstream() when cmdStarted is false, dropping the cancel signal; modify
stopCommand so that after checking cancelUpstream != nil you call
cancelUpstream() immediately (use p.cancelUpstream or the cancelUpstream
variable) and then only skip waiting on cmdWaitChan if cmdStarted is false; keep
the existing p.cmdMutex.RUnlock() usage and the debug log for the skipped wait,
but ensure cancellation is always invoked before any early return.
- Around line 254-286: The race occurs because p.cancelUpstream and
p.cmdWaitChan are only set after cmd.Start(), so stopCommand() can miss
cancellation; move the publication of p.cancelUpstream and p.cmdWaitChan (and
assignment of p.cmd) into p.cmdMutex-protected section before calling
p.cmd.Start(), then call p.cmd.Start(), and only under the same mutex set
p.cmdStarted = true on success (and if Start returns error still close
p.cmdWaitChan while holding the lock). Update references to cmdStarted,
cancelUpstream, cmdWaitChan and stopCommand() accordingly so stopCommand() can
observe cancelUpstream even if it races with Start().

proxy/process.go

Fix data race between Process.start() and Process.stopCommand() by using a cmdStarted flag instead of checking cmd.Process directly. Also fix hang when StopImmediately() is called during startup.

Changes:
- Add cmdStarted bool flag to track if Start() completed successfully
- Initialize cancelUpstream and cmdWaitChan before Start() so stopCommand() can access them during startup
- Always call cancelUpstream() in stopCommand() to cancel context, even if Start() hasn't completed
- Only wait on cmdWaitChan if cmdStarted is true to avoid hanging
- Reset cmdStarted=false when process exits
- Change cancelUpstream==nil error to debug message (expected in tests using forceState())
- Use platform-appropriate sleep command in tests (timeout on Windows, sleep on Unix)

Fixes race condition and Windows test failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
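The ordering described in this commit (publish the cancel function and wait channel under the mutex before `Start()`, always cancel on stop, wait only if the command started) can be sketched as follows. The `process` type and the `prepare`/`launch` split are hypothetical and exist only to make the race window visible; the field names mirror those in the commit message. The demo stops before launching, so `Start` is expected to fail on the already-cancelled context and no child process is spawned.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"sync"
	"time"
)

type process struct {
	mu             sync.Mutex
	cmd            *exec.Cmd
	cancelUpstream context.CancelFunc
	cmdWaitChan    chan struct{}
	cmdStarted     bool
}

// prepare publishes everything stopCommand needs before Start is called.
func (p *process) prepare(path string, args ...string) {
	ctx, cancel := context.WithCancel(context.Background())
	p.mu.Lock()
	defer p.mu.Unlock()
	p.cmd = exec.CommandContext(ctx, path, args...)
	p.cancelUpstream = cancel
	p.cmdWaitChan = make(chan struct{})
}

// launch starts the command and flips cmdStarted under the same mutex.
func (p *process) launch() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if err := p.cmd.Start(); err != nil {
		close(p.cmdWaitChan) // nobody else will close it
		return err
	}
	p.cmdStarted = true
	cmd, waitCh := p.cmd, p.cmdWaitChan
	go func() {
		_ = cmd.Wait()
		p.mu.Lock()
		p.cmdStarted = false // reset when the process exits
		p.mu.Unlock()
		close(waitCh)
	}()
	return nil
}

func (p *process) stopCommand() {
	p.mu.Lock()
	cancel, waitCh, started := p.cancelUpstream, p.cmdWaitChan, p.cmdStarted
	p.mu.Unlock()

	if cancel != nil {
		cancel() // always deliver the cancel signal, even before Start
	}
	if !started {
		return // skip the wait: there is no process to exit
	}
	<-waitCh
}

// stopBeforeStart calls stopCommand in the window between prepare and launch.
func stopBeforeStart() (stopReturned bool, startErr error) {
	exe, err := os.Executable() // any existing binary works; it is never run
	if err != nil {
		return false, err
	}
	p := &process{}
	p.prepare(exe)

	done := make(chan struct{})
	go func() { p.stopCommand(); close(done) }()
	select {
	case <-done:
		stopReturned = true
	case <-time.After(2 * time.Second):
		return false, nil // do not launch if the stop is stuck
	}
	// The context is already cancelled, so Start refuses to spawn.
	return stopReturned, p.launch()
}

func main() {
	returned, err := stopBeforeStart()
	fmt.Println("stop returned:", returned, "start error:", err)
}
```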

Fix race condition in StopImmediately where state changes between CurrentState() and swapState() could cause stop to abort. Add retry loop to handle ErrExpectedStateMismatch. Fix test race condition where assert.Error was called inside goroutine. Move error assertion to main test goroutine using buffered channel. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@proxy/process.go`:
- Around line 549-575: The current code reuses requestCtx (from requestCtx,
cancel := context.WithTimeout(...)) for both cloning the incoming request and
for timeout monitoring, but WithTimeout inherits parent deadlines and can
misattribute parent-imposed DeadlineExceeded to your configured timeout; fix by
creating a separate timeout-only context (e.g., timeoutCtx, timeoutCancel :=
context.WithTimeout(r.Context(), timeoutDuration)) and use that timeoutCtx
exclusively in the goroutine to detect your configured timeout and call
p.StopImmediately(), while leaving requestCtx as the original r.Context() (or
the cloned requestCtx derived from r.Context() only when needed via
r.Clone(requestCtx)); ensure you defer timeoutCancel() and update uses of
timeoutCancel/requestCtx accordingly so p.config.RequestTimeout,
timeoutDuration, timeoutCtx, timeoutCancel, requestCtx, r.Clone(...) and
p.StopImmediately() are the right symbols to change.

proxy/process.go

Fix getSleepCommand to use full path to timeout.exe on Windows. This avoids conflict with GNU coreutils timeout command in Git Bash environments on CI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Separate request context from timeout monitoring context to avoid misattributing parent-imposed deadlines. Create monitoring context from context.Background() to reliably detect configured timeout, while maintaining request timeout for proper cancellation.

- requestCtx: with timeout for request cancellation
- timeoutCtx: from Background() for monitoring only
- prevents false positives from parent context deadlines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
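The misattribution this commit guards against can be reproduced with the standard `context` package alone. In the sketch below, `classifyDeadline` and the durations are assumptions for illustration. A parent context with a 20 ms deadline stands in for a caller-imposed limit, and the configured timeout is 5 s. The request context reports `DeadlineExceeded` after about 20 ms, while the monitoring context derived from `context.Background()` has not fired, so the expiry can be attributed to the parent rather than to the configured timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// classifyDeadline waits for the request context to end and reports whether
// the independently tracked, configured timeout was the one that fired.
func classifyDeadline(parentDeadline, configured time.Duration) (configuredFired bool, requestErr error) {
	// parent stands in for an incoming request context that already
	// carries its own, shorter deadline.
	parent, cancelParent := context.WithTimeout(context.Background(), parentDeadline)
	defer cancelParent()

	// requestCtx inherits the parent's deadline: WithTimeout keeps
	// whichever deadline is earlier.
	requestCtx, cancelRequest := context.WithTimeout(parent, configured)
	defer cancelRequest()

	// timeoutCtx is rooted at Background(), so only the configured
	// timeout can make it expire.
	timeoutCtx, cancelTimeout := context.WithTimeout(context.Background(), configured)
	defer cancelTimeout()

	<-requestCtx.Done()
	requestErr = requestCtx.Err()
	configuredFired = errors.Is(timeoutCtx.Err(), context.DeadlineExceeded)
	return configuredFired, requestErr
}

// parentDeadlineDemo uses a 20 ms parent deadline and a 5 s configured timeout.
func parentDeadlineDemo() (bool, error) {
	return classifyDeadline(20*time.Millisecond, 5*time.Second)
}

func main() {
	fired, err := parentDeadlineDemo()
	fmt.Println("request context error:", err)
	fmt.Println("configured timeout fired:", fired)
}
```

Checking only `requestCtx.Err()` would have looked like a configured timeout here, which is the false positive the commit message refers to.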

add timeout feature

3989c17

overcuriousity and others added 2 commits January 31, 2026 00:27

docs: add requestTimeout to README features list

0e86bbc


overcuriousity marked this pull request as ready for review January 31, 2026 00:38

coderabbitai bot reviewed Jan 31, 2026

proxy/process_timeout_test.go (resolved)

coderabbitai bot reviewed Jan 31, 2026

proxy/process_test.go (resolved)

proxy/process.go (outdated, resolved)

proxy/process.go (resolved)

overcuriousity force-pushed the feat--timeout branch from 38e97f6 to c807c6a Compare January 31, 2026 18:25

coderabbitai bot reviewed Jan 31, 2026

proxy/process.go (resolved)

proxy/process.go (resolved)

overcuriousity force-pushed the feat--timeout branch from c807c6a to 4c42bf6 Compare January 31, 2026 18:35

overcuriousity force-pushed the feat--timeout branch from d1ab05e to 6c14013 Compare January 31, 2026 18:45

coderabbitai bot reviewed Jan 31, 2026

proxy/process.go (resolved)

Overcuriousity and others added 4 commits January 31, 2026 20:07

proxy: fix Windows timeout command conflict

a502ebd


Merge branch 'mostlygeek:main' into feat--timeout

960e78d

Merge branch 'mostlygeek:main' into feat--timeout

7d68a64

overcuriousity closed this Mar 5, 2026

overcuriousity deleted the feat--timeout branch March 5, 2026 12:42

overcuriousity wants to merge 10 commits into mostlygeek:main from overcuriousity:feat--timeout


2 participants
