The conventional way to stop inference is for the client to drop the connection. This propagates through llama-swap to the upstream, which is expected to interrupt inference. I don't think killing the process is the right way. I prefer llama-swap not to be a workaround for upstreams or clients that don't set limits on inference time.

I see your point - out of scope. I will definitely keep it in my fork, as it solves a problem I face. Feel free to close the PR, and thanks for the feedback!
The requestTimeout feature was not working because the timeout context was not connected to the HTTP request. When the timeout fired, it attempted to kill the process, but the reverse proxy continued waiting for a response indefinitely.

- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When the timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add a unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add brief mention of requestTimeout feature in the customizable features section of README. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
In `@proxy/process_timeout_test.go`:
- Around line 15-98: The mock server handler in TestProcess_RequestTimeout
currently calls t.Fatal from its goroutine; instead create an error channel
(e.g. srvErrCh := make(chan error, 1) in the test), replace the t.Fatal/t.Fatalf
inside the handler (the http.Flusher type assertion and any other fatal
conditions) with sending an error to srvErrCh, and in the main test goroutine
after starting the server and before asserting timeouts, select or read from
srvErrCh and call t.Fatal/t.Fatalf there if an error was received; reference
TestProcess_RequestTimeout and the mock server handler/Flusher type check when
making the changes.
In `@proxy/process.go`:
- Around line 384-395: The stop path can hang if a timeout transitions from
StateStarting to StateStopping before cmd.Start completes because stopCommand
waits on cmdWaitChan which is only closed by waitForCmd after a successful
Start; update stopCommand (and/or swapState) to avoid waiting when the process
never actually started: add a guard that if CurrentState() == StateStarting and
the command hasn't been started (check p.cmd == nil or a started flag), then
skip waiting on cmdWaitChan (or immediately return) OR alternatively ensure
cmdWaitChan is closed when cmd.Start returns an error; reference swapState,
stopCommand, cmdWaitChan, waitForCmd, cmd.Start and StateStarting/StateStopping
when implementing the fix.
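Both remedies the comment proposes can be shown in a small stand-alone sketch. The `runner` type and its fields are illustrative stand-ins for `Process`, `cmdWaitChan`, and `stopCommand`; the real code has more state.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runner mimics the fields involved in the hang: a wait channel that is
// only closed once the child process has been waited on.
type runner struct {
	cmdStarted  bool
	cmdWaitChan chan struct{}
}

func (r *runner) start(name string, args ...string) error {
	r.cmdWaitChan = make(chan struct{})
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		// Remedy 1: close the wait channel when Start() fails,
		// so no stop path can block on it forever.
		close(r.cmdWaitChan)
		return err
	}
	r.cmdStarted = true
	go func() {
		cmd.Wait()
		close(r.cmdWaitChan)
	}()
	return nil
}

func (r *runner) stop() {
	// Remedy 2: guard against waiting when the process never started.
	if !r.cmdStarted {
		return
	}
	<-r.cmdWaitChan
}

func main() {
	r := &runner{}
	// A nonexistent binary makes Start() fail; stop() must still return promptly.
	if err := r.start("definitely-not-a-real-binary-xyz"); err != nil {
		fmt.Println("start failed as expected")
	}
	r.stop()
	fmt.Println("stop returned without hanging")
}
```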
Prevent stopCommand() from hanging when a timeout transitions StateStarting to StateStopping before cmd.Start() completes.

- Close cmdWaitChan when cmd.Start() fails
- Add a guard in stopCommand() to skip the wait if the process never started
- Add tests for the hang scenarios
- Fix TestProcess_RequestTimeout to avoid calling t.Fatal from the handler goroutine

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Actionable comments posted: 3
In `@proxy/process_test.go`:
- Around line 627-667: The assert.Error(t, err) must not run inside the
goroutine; change the goroutine that calls process.start() to send its returned
error into a buffered channel (e.g., errCh := make(chan error, 1)) and close
startDone there, then in the main test goroutine receive from errCh and call
assert.Error(t, err) there; keep the existing calls to
process.StopImmediately(), startDone/stopDone synchronization, and use
process.CurrentState(), StateStarting, StateStopped, and StateStopping as
before.
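A minimal sketch of the requested change, with `start()` standing in for `process.start()` (everything here is illustrative, not the real test): the goroutine only sends the error; the assertion happens on the main (test) goroutine, where `t.Fatal`/`assert.Error` are safe to call.

```go
package main

import (
	"errors"
	"fmt"
)

// start stands in for process.start(); its error used to be asserted
// inside the goroutine, which is unsafe with testing.T.
func start() error {
	return errors.New("process stopped during startup")
}

func main() {
	errCh := make(chan error, 1) // buffered so the goroutine never blocks
	startDone := make(chan struct{})

	go func() {
		errCh <- start() // send the error instead of asserting here
		close(startDone)
	}()

	<-startDone
	// Assert on the main (test) goroutine, where assert.Error is allowed.
	if err := <-errCh; err == nil {
		panic("expected an error from start()")
	} else {
		fmt.Println("start returned error:", err)
	}
}
```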
In `@proxy/process.go`:
- Around line 390-401: The Stop logic can fail if the process state changes
between p.CurrentState() and p.swapState(), causing ErrExpectedStateMismatch to
return and abort the stop; change the block around p.swapState(currentState,
StateStopping) to retry on ErrExpectedStateMismatch: loop by re-reading
currentState via p.CurrentState(), verify isValidTransition(currentState,
StateStopping) each iteration, attempt swapState, break on success or on any
non-ErrExpectedStateMismatch error (logging via p.proxyLogger.Infof as currently
done); follow the same retry pattern used in Start() so transient state flips
(e.g., StateStarting → StateReady) will be handled and the Stop will proceed.
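The retry loop the comment describes can be sketched as below. The state machine here is a deliberately simplified stand-in (an `atomic.Int32` compare-and-swap), not llama-swap's actual `Process`; `isValidTransition` is likewise illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrExpectedStateMismatch = errors.New("expected state mismatch")

type state int32

const (
	StateStarting state = iota
	StateReady
	StateStopping
	StateStopped
)

type proc struct{ st atomic.Int32 }

func (p *proc) CurrentState() state { return state(p.st.Load()) }

// swapState succeeds only if the state is still `expected` (CAS semantics).
func (p *proc) swapState(expected, next state) error {
	if !p.st.CompareAndSwap(int32(expected), int32(next)) {
		return ErrExpectedStateMismatch
	}
	return nil
}

// Illustrative: both Starting and Ready may transition to Stopping.
func isValidTransition(from, to state) bool {
	return to == StateStopping && (from == StateStarting || from == StateReady)
}

// stopWithRetry re-reads the state and retries the swap when another
// goroutine flipped it between CurrentState() and swapState().
func (p *proc) stopWithRetry() error {
	for {
		cur := p.CurrentState()
		if !isValidTransition(cur, StateStopping) {
			return fmt.Errorf("invalid transition from %v to StateStopping", cur)
		}
		err := p.swapState(cur, StateStopping)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrExpectedStateMismatch) {
			return err
		}
		// Transient flip (e.g. StateStarting -> StateReady): loop and retry.
	}
}

func main() {
	p := &proc{}
	p.st.Store(int32(StateStarting))
	if err := p.stopWithRetry(); err != nil {
		panic(err)
	}
	fmt.Println(p.CurrentState() == StateStopping) // true
}
```

Because each iteration re-validates the transition before attempting the swap, a concurrent StateStarting → StateReady flip no longer aborts the stop.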
- Around line 432-449: stopCommand reads p.cmd, p.cancelUpstream and
p.cmdWaitChan under p.cmdMutex.RLock(), but start assigns p.cmd and the related
fields without holding p.cmdMutex, causing a data race; fix by updating start to
acquire the write lock (p.cmdMutex.Lock()) while setting p.cmd, p.cancelUpstream
and p.cmdWaitChan (and release after assignment) so readers in stopCommand (and
any other readers using p.cmdMutex.RLock()) see a consistent, synchronized
state; ensure start uses the same mutex (p.cmdMutex) and covers all related
field assignments in the critical section.
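The locking discipline the comment asks for looks like this in miniature. The struct is a stand-in (the `cmd` field is a string here rather than an `*exec.Cmd`); what matters is that the writer takes the write lock over all related fields while readers use `RLock`, so `go test -race` sees no conflict.

```go
package main

import (
	"fmt"
	"sync"
)

// proc sketches the fields the review flags: stopCommand reads them under
// RLock, so start must publish them under the same mutex's write lock.
type proc struct {
	cmdMutex       sync.RWMutex
	cmd            string // stand-in for *exec.Cmd
	cancelUpstream func()
	cmdWaitChan    chan struct{}
}

func (p *proc) start() {
	// The write lock covers every related field in one critical section,
	// so readers always observe a consistent view.
	p.cmdMutex.Lock()
	p.cmd = "sleep 60"
	p.cancelUpstream = func() {}
	p.cmdWaitChan = make(chan struct{})
	p.cmdMutex.Unlock()
}

func (p *proc) stopCommand() bool {
	p.cmdMutex.RLock()
	defer p.cmdMutex.RUnlock()
	return p.cmd != "" && p.cancelUpstream != nil && p.cmdWaitChan != nil
}

func main() {
	p := &proc{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); p.start() }()
	go func() { defer wg.Done(); p.stopCommand() }() // race-free under -race
	wg.Wait()
	fmt.Println(p.stopCommand()) // true once start has published all fields
}
```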
Actionable comments posted: 2
In `@proxy/process.go`:
- Around line 444-458: The current stopCommand flow can return without invoking
cancelUpstream() when cmdStarted is false, dropping the cancel signal; modify
stopCommand so that after checking cancelUpstream != nil you call
cancelUpstream() immediately (use p.cancelUpstream or the cancelUpstream
variable) and then only skip waiting on cmdWaitChan if cmdStarted is false; keep
the existing p.cmdMutex.RUnlock() usage and the debug log for the skipped wait,
but ensure cancellation is always invoked before any early return.
- Around line 254-286: The race occurs because p.cancelUpstream and
p.cmdWaitChan are only set after cmd.Start(), so stopCommand() can miss
cancellation; move the publication of p.cancelUpstream and p.cmdWaitChan (and
assignment of p.cmd) into p.cmdMutex-protected section before calling
p.cmd.Start(), then call p.cmd.Start(), and only under the same mutex set
p.cmdStarted = true on success (and if Start returns error still close
p.cmdWaitChan while holding the lock). Update references to cmdStarted,
cancelUpstream, cmdWaitChan and stopCommand() accordingly so stopCommand() can
observe cancelUpstream even if it races with Start().
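Putting the pieces together, a start path with the publication order the comment describes might look like the sketch below. Field and method names mirror the review comment, but the struct is a simplified stand-in for the real `Process`: fields are published under the mutex *before* `Start()`, `cmdStarted` is set under the same lock only on success, and `cmdWaitChan` is closed on failure so no stop path blocks.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
)

type proc struct {
	cmdMutex       sync.Mutex
	cmd            *exec.Cmd
	cmdStarted     bool
	cancelUpstream context.CancelFunc
	cmdWaitChan    chan struct{}
}

func (p *proc) start(name string, args ...string) error {
	ctx, cancel := context.WithCancel(context.Background())

	// Publish everything a racing stopCommand() might read BEFORE Start().
	p.cmdMutex.Lock()
	p.cmd = exec.CommandContext(ctx, name, args...)
	p.cancelUpstream = cancel
	p.cmdWaitChan = make(chan struct{})
	p.cmdMutex.Unlock()

	err := p.cmd.Start()

	p.cmdMutex.Lock()
	defer p.cmdMutex.Unlock()
	if err != nil {
		close(p.cmdWaitChan) // nobody will Wait(); unblock any stopCommand
		return err
	}
	p.cmdStarted = true
	go func() {
		p.cmd.Wait()
		close(p.cmdWaitChan)
	}()
	return nil
}

func main() {
	p := &proc{}
	if err := p.start("definitely-not-a-real-binary-xyz"); err != nil {
		p.cancelUpstream() // release the context; stopCommand would do this
		fmt.Println("start failed, wait channel already closed")
	}
	<-p.cmdWaitChan // returns immediately: start closed it on failure
}
```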
Fix data race between Process.start() and Process.stopCommand() by using a cmdStarted flag instead of checking cmd.Process directly. Also fix a hang when StopImmediately() is called during startup.

Changes:
- Add a cmdStarted bool flag to track whether Start() completed successfully
- Initialize cancelUpstream and cmdWaitChan before Start() so stopCommand() can access them during startup
- Always call cancelUpstream() in stopCommand() to cancel the context, even if Start() hasn't completed
- Only wait on cmdWaitChan if cmdStarted is true, to avoid hanging
- Reset cmdStarted=false when the process exits
- Change the cancelUpstream==nil error to a debug message (expected in tests using forceState())
- Use a platform-appropriate sleep command in tests (timeout on Windows, sleep on Unix)

Fixes the race condition and the Windows test failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix race condition in StopImmediately where state changes between CurrentState() and swapState() could cause the stop to abort. Add a retry loop to handle ErrExpectedStateMismatch.

Fix a test race condition where assert.Error was called inside a goroutine: move the error assertion to the main test goroutine using a buffered channel.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
In `@proxy/process.go`:
- Around line 549-575: The current code reuses requestCtx (from requestCtx,
cancel := context.WithTimeout(...)) for both cloning the incoming request and
for timeout monitoring, but WithTimeout inherits parent deadlines and can
misattribute parent-imposed DeadlineExceeded to your configured timeout; fix by
creating a separate timeout-only context (e.g., timeoutCtx, timeoutCancel :=
context.WithTimeout(r.Context(), timeoutDuration)) and use that timeoutCtx
exclusively in the goroutine to detect your configured timeout and call
p.StopImmediately(), while leaving requestCtx as the original r.Context() (or
the cloned requestCtx derived from r.Context() only when needed via
r.Clone(requestCtx)); ensure you defer timeoutCancel() and update uses of
timeoutCancel/requestCtx accordingly so p.config.RequestTimeout,
timeoutDuration, timeoutCtx, timeoutCancel, requestCtx, r.Clone(...) and
p.StopImmediately() are the right symbols to change.
Fix getSleepCommand to use the full path to timeout.exe on Windows. This avoids a conflict with the GNU coreutils timeout command in Git Bash environments on CI. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Separate the request context from the timeout-monitoring context to avoid misattributing parent-imposed deadlines. Create the monitoring context from context.Background() to reliably detect the configured timeout, while keeping the request timeout for proper cancellation.

- requestCtx: with timeout, for request cancellation
- timeoutCtx: derived from Background(), for monitoring only
- Prevents false positives from parent context deadlines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I had the problem that some models (especially thinking Qwen3 models) never stop inferencing on certain tasks. This shall be a safeguard which kills an active request if it takes too long, preventing GPU overheating and resource blockades.