Skip to content

feat: add timeout feature#493

Closed
overcuriousity wants to merge 10 commits intomostlygeek:mainfrom
overcuriousity:feat--timeout
Closed

feat: add timeout feature#493
overcuriousity wants to merge 10 commits intomostlygeek:mainfrom
overcuriousity:feat--timeout

Conversation

@overcuriousity
Copy link
Copy Markdown

@overcuriousity overcuriousity commented Jan 30, 2026

I had the problem that some (especially thinking qwen3 models) never stop inferencing on certain tasks. this shall be a safeguard which kills an active request if it takes too long, preventing GPU overheat and resource blockades.

Summary by CodeRabbit

  • New Features

    • Added per-model requestTimeout to limit single-request runtime and prevent runaway inferences (default 0 = disabled).
  • Documentation

    • Clarified wording to "automatic unloading of models after idle timeout" and documented the new request timeout setting, defaults, and behavior.
  • Bug Fixes

    • Improved stop/terminate behavior and timeout-driven termination to avoid hangs and races during startup/stop.
  • Tests

    • Added tests for request timeout behavior and stop-during-startup scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 30, 2026

Walkthrough

Adds a per-model requestTimeout setting and enforces request-level timeouts in the proxy by cancelling timed-out requests and forcefully stopping the model process; introduces cmd lifecycle guards (cmdStarted, stop/start race fixes) and tests for timeout and start/stop race conditions.

Changes

Cohort / File(s) Summary
Docs & examples
README.md, docs/configuration.md, config.example.yaml
Documented new per-model requestTimeout (integer, default 0 = disabled) and guidance on behavior when a request exceeds the timeout.
Schema
config-schema.json
Added requestTimeout property under per-model config (root.properties.models.additionalProperties.properties.requestTimeout) with minimum 0 and default 0.
Config model
proxy/config/model_config.go
Added RequestTimeout int field to ModelConfig and set default 0 in YAML defaults.
Proxy implementation
proxy/process.go
Introduced cmdStarted flag; tightened start/stop synchronization and deadlock avoidance; ProxyRequest gains request-level timeout context and goroutine to call StopImmediately() on deadline; updated stop/wait logic and logging.
Tests
proxy/process_timeout_test.go, proxy/process_test.go
Added tests covering request timeout enforcement (mock streaming server) and start/stop race conditions (stop not hanging when start fails / during startup).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • mostlygeek
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'feat: add timeout feature' is vague and generic—it uses non-descriptive language that doesn't convey the specific nature of the change (request timeout protection for inference). Consider a more specific title like 'feat: add request timeout protection to prevent runaway inference' that clearly indicates the feature purpose.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mostlygeek
Copy link
Copy Markdown
Owner

The conventional way to stop inference is for the client to drop the connection. This propagates through llama-swap to the upstream which is expected to interrupt inference.

I don't think killing the process is the right way. I prefer llama-swap to not be a work around for upstreams or clients that don't set limits on inference time.

@overcuriousity
Copy link
Copy Markdown
Author

I see your point - out of scope.
Anyways, as some clients just dont have this option, and this is completely optional, it won´t hurt either. Its just a safe-guard for configurations which are affected by the described problem.

I will definitely keep it in my fork as it solves a problem I face.

Feel free to close PR, and thanks for the feedback!

overcuriousity and others added 2 commits January 31, 2026 00:27
The requestTimeout feature was not working because the timeout
context was not connected to the HTTP request. When the timeout
fired, it attempted to kill the process but the reverse proxy
continued waiting for a response indefinitely.

- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add brief mention of requestTimeout feature in the customizable
features section of README.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@overcuriousity overcuriousity marked this pull request as ready for review January 31, 2026 00:38
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@proxy/process_timeout_test.go`:
- Around line 15-98: The mock server handler in TestProcess_RequestTimeout
currently calls t.Fatal from its goroutine; instead create an error channel
(e.g. srvErrCh := make(chan error, 1) in the test), replace the t.Fatal/t.Fatalf
inside the handler (the http.Flusher type assertion and any other fatal
conditions) with sending an error to srvErrCh, and in the main test goroutine
after starting the server and before asserting timeouts, select or read from
srvErrCh and call t.Fatal/t.Fatalf there if an error was received; reference
TestProcess_RequestTimeout and the mock server handler/Flusher type check when
making the changes.

In `@proxy/process.go`:
- Around line 384-395: The stop path can hang if a timeout transitions from
StateStarting to StateStopping before cmd.Start completes because stopCommand
waits on cmdWaitChan which is only closed by waitForCmd after a successful
Start; update stopCommand (and/or swapState) to avoid waiting when the process
never actually started: add a guard that if CurrentState() == StateStarting and
the command hasn't been started (check p.cmd == nil or a started flag), then
skip waiting on cmdWaitChan (or immediately return) OR alternatively ensure
cmdWaitChan is closed when cmd.Start returns an error; reference swapState,
stopCommand, cmdWaitChan, waitForCmd, cmd.Start and StateStarting/StateStopping
when implementing the fix.

Prevent stopCommand() from hanging when a timeout transitions
StateStarting to StateStopping before cmd.Start() completes.

- Close cmdWaitChan when cmd.Start() fails
- Add guard in stopCommand() to skip wait if process never started
- Add tests for hang scenarios
- Fix TestProcess_RequestTimeout to avoid calling t.Fatal from handler goroutine

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@proxy/process_test.go`:
- Around line 627-667: The assert.Error(t, err) must not run inside the
goroutine; change the goroutine that calls process.start() to send its returned
error into a buffered channel (e.g., errCh := make(chan error, 1)) and close
startDone there, then in the main test goroutine receive from errCh and call
assert.Error(t, err) there; keep the existing calls to
process.StopImmediately(), startDone/stopDone synchronization, and use
process.CurrentState(), StateStarting, StateStopped, and StateStopping as
before.

In `@proxy/process.go`:
- Around line 390-401: The Stop logic can fail if the process state changes
between p.CurrentState() and p.swapState(), causing ErrExpectedStateMismatch to
return and abort the stop; change the block around p.swapState(currentState,
StateStopping) to retry on ErrExpectedStateMismatch: loop by re-reading
currentState via p.CurrentState(), verify isValidTransition(currentState,
StateStopping) each iteration, attempt swapState, break on success or on any
non-ErrExpectedStateMismatch error (logging via p.proxyLogger.Infof as currently
done); follow the same retry pattern used in Start() so transient state flips
(e.g., StateStarting → StateReady) will be handled and the Stop will proceed.
- Around line 432-449: stopCommand reads p.cmd, p.cancelUpstream and
p.cmdWaitChan under p.cmdMutex.RLock(), but start assigns p.cmd and the related
fields without holding p.cmdMutex, causing a data race; fix by updating start to
acquire the write lock (p.cmdMutex.Lock()) while setting p.cmd, p.cancelUpstream
and p.cmdWaitChan (and release after assignment) so readers in stopCommand (and
any other readers using p.cmdMutex.RLock()) see a consistent, synchronized
state; ensure start uses the same mutex (p.cmdMutex) and covers all related
field assignments in the critical section.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@proxy/process.go`:
- Around line 444-458: The current stopCommand flow can return without invoking
cancelUpstream() when cmdStarted is false, dropping the cancel signal; modify
stopCommand so that after checking cancelUpstream != nil you call
cancelUpstream() immediately (use p.cancelUpstream or the cancelUpstream
variable) and then only skip waiting on cmdWaitChan if cmdStarted is false; keep
the existing p.cmdMutex.RUnlock() usage and the debug log for the skipped wait,
but ensure cancellation is always invoked before any early return.
- Around line 254-286: The race occurs because p.cancelUpstream and
p.cmdWaitChan are only set after cmd.Start(), so stopCommand() can miss
cancellation; move the publication of p.cancelUpstream and p.cmdWaitChan (and
assignment of p.cmd) into p.cmdMutex-protected section before calling
p.cmd.Start(), then call p.cmd.Start(), and only under the same mutex set
p.cmdStarted = true on success (and if Start returns error still close
p.cmdWaitChan while holding the lock). Update references to cmdStarted,
cancelUpstream, cmdWaitChan and stopCommand() accordingly so stopCommand() can
observe cancelUpstream even if it races with Start().

Fix data race between Process.start() and Process.stopCommand() by using a cmdStarted flag instead of checking cmd.Process directly. Also fix hang when StopImmediately() is called during startup.

Changes:
- Add cmdStarted bool flag to track if Start() completed successfully
- Initialize cancelUpstream and cmdWaitChan before Start() so stopCommand() can access them during startup
- Always call cancelUpstream() in stopCommand() to cancel context, even if Start() hasn't completed
- Only wait on cmdWaitChan if cmdStarted is true to avoid hanging
- Reset cmdStarted=false when process exits
- Change cancelUpstream==nil error to debug message (expected in tests using forceState())
- Use platform-appropriate sleep command in tests (timeout on Windows, sleep on Unix)

Fixes race condition and Windows test failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix race condition in StopImmediately where state changes between
CurrentState() and swapState() could cause stop to abort. Add retry
loop to handle ErrExpectedStateMismatch.

Fix test race condition where assert.Error was called inside goroutine.
Move error assertion to main test goroutine using buffered channel.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@proxy/process.go`:
- Around line 549-575: The current code reuses requestCtx (from requestCtx,
cancel := context.WithTimeout(...)) for both cloning the incoming request and
for timeout monitoring, but WithTimeout inherits parent deadlines and can
misattribute parent-imposed DeadlineExceeded to your configured timeout; fix by
creating a separate timeout-only context (e.g., timeoutCtx, timeoutCancel :=
context.WithTimeout(r.Context(), timeoutDuration)) and use that timeoutCtx
exclusively in the goroutine to detect your configured timeout and call
p.StopImmediately(), while leaving requestCtx as the original r.Context() (or
the cloned requestCtx derived from r.Context() only when needed via
r.Clone(requestCtx)); ensure you defer timeoutCancel() and update uses of
timeoutCancel/requestCtx accordingly so p.config.RequestTimeout,
timeoutDuration, timeoutCtx, timeoutCancel, requestCtx, r.Clone(...) and
p.StopImmediately() are the right symbols to change.

Overcuriousity and others added 4 commits January 31, 2026 20:07
Fix getSleepCommand to use full path to timeout.exe on Windows. This avoids conflict with GNU coreutils timeout command in Git Bash environments on CI.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separate request context from timeout monitoring context to avoid misattributing parent-imposed deadlines. Create monitoring context from context.Background() to reliably detect configured timeout, while maintaining request timeout for proper cancellation.

- requestCtx: with timeout for request cancellation
- timeoutCtx: from Background() for monitoring only
- prevents false positives from parent context deadlines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@overcuriousity overcuriousity deleted the feat--timeout branch March 5, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants