Stream loading state when swapping models#371
Walkthrough

Adds a global `sendLoadingState` option (with per-model overrides) and an SSE status writer that streams loading-status updates while a model is swapped in.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant ProxyHandler as Proxy Handler
    participant Proc as Process Manager
    participant SSE as statusResponseWriter
    participant Upstream as Upstream LLM
    Client->>ProxyHandler: POST ... (body includes "stream": true)
    ProxyHandler->>Proc: Check model Ready?
    alt Not Ready && config.SendLoadingState == true
        ProxyHandler->>SSE: create SSE writer & send headers
        activate SSE
        loop while Not Ready & not cancelled
            SSE->>Client: SSE message (loading state / remark)
            SSE-->>SSE: wait / tick
        end
        deactivate SSE
    end
    ProxyHandler->>Upstream: forward request to upstream (reverse proxy)
    Upstream->>Client: final streamed response
```
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
proxy/proxymanager.go (1)
504-508: Typo in error message

“could not ready request body” → “could not read request body”.

```diff
- pm.sendErrorResponse(c, http.StatusBadRequest, "could not ready request body")
+ pm.sendErrorResponse(c, http.StatusBadRequest, "could not read request body")
```
🧹 Nitpick comments (5)
proxy/config/model_config.go (1)
39-41: YAML: consider omitting nulls for cleaner round-trips

Since SendLoadingState is a pointer, add omitempty to avoid writing `sendLoadingState: null` on marshal when unset.

Apply:

```diff
- SendLoadingState *bool `yaml:"sendLoadingState"`
+ SendLoadingState *bool `yaml:"sendLoadingState,omitempty"`
```

config.example.yaml (1)

38-45: Clarify field name to match API (“reasoning_content”)

Docs say “reasoning (thinking) field”. If streaming goes into reasoning_content, name it explicitly to avoid confusion.

```diff
-# sendLoadingState: inject loading status updates into the reasoning (thinking)
-# field
+# sendLoadingState: inject loading status updates into the reasoning_content field
```

Confirm the exact response field your stream writer uses and align terminology here.
proxy/process.go (3)
496: Rename variable for clarity.

The variable name `swapCtx` is misleading. This context controls the loading status streaming goroutine, not model swapping.

Apply this diff:

```diff
- swapCtx, cancelLoadCtx := context.WithCancel(r.Context())
+ loadingCtx, cancelLoadCtx := context.WithCancel(r.Context())
```

And update line 505:

```diff
- go srw.statusUpdates(swapCtx)
+ go srw.statusUpdates(loadingCtx)
```
541-551: Consider simplifying the nested conditional logic.

The proxy handling logic could be flattened for better readability.

Apply this diff:

```diff
-	if srw != nil && p.reverseProxy != nil {
-		// wait for it to complete
-		<-srw.complete.Done()
-		p.reverseProxy.ServeHTTP(srw, r)
-	} else {
-		if p.reverseProxy != nil {
-			p.reverseProxy.ServeHTTP(w, r)
-		} else {
-			http.Error(w, fmt.Sprintf("No reverse proxy available for %s", p.ID), http.StatusInternalServerError)
-		}
-	}
+	if p.reverseProxy == nil {
+		http.Error(w, fmt.Sprintf("No reverse proxy available for %s", p.ID), http.StatusInternalServerError)
+		return
+	}
+
+	if srw != nil {
+		// wait for it to complete
+		<-srw.complete.Done()
+		p.reverseProxy.ServeHTTP(srw, r)
+	} else {
+		p.reverseProxy.ServeHTTP(w, r)
+	}
```
781-789: Consider defining SSE message structures at package level.

The inline struct definitions for the SSE message format could be moved to package level for better reusability and discoverability.

Define these at package level (after the loadingRemarks variable):

```go
type sseMessageDelta struct {
	ReasoningContent string `json:"reasoning_content"`
}

type sseMessageChoice struct {
	Delta sseMessageDelta `json:"delta"`
}

type sseMessage struct {
	Choices []sseMessageChoice `json:"choices"`
}
```

Then simplify the sendData method:

```diff
 func (s *statusResponseWriter) sendData(data string) {
-	// Create the proper SSE JSON structure
-	type Delta struct {
-		ReasoningContent string `json:"reasoning_content"`
-	}
-	type Choice struct {
-		Delta Delta `json:"delta"`
-	}
-	type SSEMessage struct {
-		Choices []Choice `json:"choices"`
-	}
-
-	msg := SSEMessage{
-		Choices: []Choice{
+	msg := sseMessage{
+		Choices: []sseMessageChoice{
 			{
-				Delta: Delta{
+				Delta: sseMessageDelta{
 					ReasoningContent: data,
 				},
 			},
 		},
 	}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- config.example.yaml (2 hunks)
- proxy/config/config.go (2 hunks)
- proxy/config/config_posix_test.go (2 hunks)
- proxy/config/config_windows_test.go (1 hunks)
- proxy/config/model_config.go (1 hunks)
- proxy/config/model_config_test.go (1 hunks)
- proxy/process.go (4 hunks)
- proxy/proxymanager.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Fix all staticcheck-reported issues in Go code
Files:
- proxy/config/model_config.go
- proxy/config/config_windows_test.go
- proxy/config/config_posix_test.go
- proxy/proxymanager.go
- proxy/process.go
- proxy/config/config.go
- proxy/config/model_config_test.go
🧬 Code graph analysis (3)

proxy/config/config_windows_test.go (2)
- proxy/config/config.go (2): Config (112-135), MacroList (24-24)
- proxy/config/model_config.go (1): ModelConfig (10-41)

proxy/config/config_posix_test.go (1)
- proxy/config/model_config.go (1): ModelConfig (10-41)

proxy/config/model_config_test.go (1)
- proxy/config/config.go (1): LoadConfigFromReader (164-402)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-tests
- GitHub Check: run-tests
🔇 Additional comments (11)
proxy/config/config.go (1)
133-135: Config flag addition looks good

Field placement and YAML tag are consistent with existing config style.
proxy/config/model_config_test.go (1)
55-74: Good coverage of global vs per-model defaulting

Test asserts both override (false) and inherited (true) cases; aligns with config loader behavior.
config.example.yaml (1)
195-198: Per-model override example looks good

Example override is clear and consistent with loader behavior.
proxy/config/config_windows_test.go (2)
155-156: Non-nil per-model pointer setup in expectations is appropriate

Matches loader’s behavior of setting per-model pointers.

163-201: Windows expectations updated correctly

Top-level and per-model SendLoadingState expectations align with new config behavior.
proxy/config/config_posix_test.go (2)
163-164: Non-nil per-model pointer setup in expectations is appropriate

Consistent with loader defaulting approach.

176-211: POSIX expectations updated correctly

Top-level and per-model SendLoadingState expectations align with new behavior.
proxy/proxymanager.go (1)
560-565: The suggested fix is incomplete and cannot be applied as written.

The review references `ctxKeyStreaming` and `ctxKeyModel`, which do not exist in the codebase. Additionally, the diff only shows updating the `context.WithValue()` calls in proxy/proxymanager.go (lines 562-563), but misses the corresponding `context.Value()` retrieval in proxy/process.go:502 that reads the streaming value.

If this diff is applied without:

- first defining the `ctxKeyStreaming` and `ctxKeyModel` struct-type constants, and
- updating the retrieval call at proxy/process.go:502 to use the new keys,

the code will fail at runtime because the keys won’t match between where values are stored and where they’re retrieved.
proxy/process.go (3)
5-5: LGTM!

The new imports are appropriate: `encoding/json` for SSE message serialization and `math/rand` for non-cryptographic randomness in loading message selection and timing.

Also applies to: 8-8
637-698: LGTM!

The loading remarks provide good user experience during model loading with appropriate humor and variety.
814-836: LGTM!

The helper methods properly delegate to the underlying writer with appropriate safeguards:

- `WriteHeader` prevents duplicate writes with the `hasWritten` flag
- `Flush` safely checks if the underlying writer supports flushing
Actionable comments posted: 1
♻️ Duplicate comments (4)
proxy/process.go (4)
715-732: Pass realModelName from request context to display correct model name.

Line 730 displays `p.ID` but should show the user-facing model name stored in the request context at `proxyCtxKey("model")` (consistent with the streaming flag extraction pattern on line 502).

The past review comment suggested modifying `newStatusResponseWriter` to accept `realModelName` or the request context. Apply a similar fix here:

```diff
-func newStatusResponseWriter(p *Process, w http.ResponseWriter) *statusResponseWriter {
+func newStatusResponseWriter(p *Process, w http.ResponseWriter, modelName string) *statusResponseWriter {
 	c, cx := context.WithCancel(context.Background())
 	s := &statusResponseWriter{
 		writer:   w,
 		process:  p,
 		complete: c,
 		cancel:   cx,
 	}
 	s.start = time.Now()
 	s.Header().Set("Content-Type", "text/event-stream") // SSE
 	s.Header().Set("Cache-Control", "no-cache")         // no-cache
 	s.Header().Set("Connection", "keep-alive")          // keep-alive
 	s.WriteHeader(http.StatusOK)                        // send status code 200
 	s.sendLine("━━━━━")
-	s.sendLine(fmt.Sprintf("llama-swap loading model: %s", p.ID))
+	s.sendLine(fmt.Sprintf("llama-swap loading model: %s", modelName))
 	return s
 }
```

Then update the caller on line 504:

```diff
+	realModelName, _ := r.Context().Value(proxyCtxKey("model")).(string)
 	isStreaming, _ := r.Context().Value(proxyCtxKey("streaming")).(bool)
 	if p.config.SendLoadingState != nil && *p.config.SendLoadingState && isStreaming {
-		srw = newStatusResponseWriter(p, w)
+		srw = newStatusResponseWriter(p, w, realModelName)
 		go srw.statusUpdates(swapCtx)
```
752-754: Correct the random duration range.

The comment states “5-10 seconds” but `2+rand.Intn(4)` produces 2-5 seconds.

Apply this diff:

```diff
 	// Pick a random duration between 5-10 seconds for when to send a remark
-	nextRemarkIn := time.Duration(2+rand.Intn(4)) * time.Second
+	nextRemarkIn := time.Duration(5+rand.Intn(6)) * time.Second
```
773-774: Correct the random duration range.

The comment says “5-10 seconds” but `5+rand.Intn(10)` produces 5-14 seconds.

Apply this diff:

```diff
 	// Pick a new random duration for the next remark
-	nextRemarkIn = time.Duration(5+rand.Intn(10)) * time.Second
+	nextRemarkIn = time.Duration(5+rand.Intn(6)) * time.Second
```
815-818: Check errors from Write calls.

The three consecutive `Write` calls don’t check for errors, which could mask write failures and result in incomplete SSE messages being sent to clients.

Apply this diff:

```diff
 	// Write SSE formatted data
-	s.writer.Write([]byte("data: "))
-	s.writer.Write(jsonData)
-	s.writer.Write([]byte("\n\n"))
+	if _, err := s.writer.Write([]byte("data: ")); err != nil {
+		s.process.proxyLogger.Errorf("<%s> Failed to write SSE prefix: %v", s.process.ID, err)
+		return
+	}
+	if _, err := s.writer.Write(jsonData); err != nil {
+		s.process.proxyLogger.Errorf("<%s> Failed to write SSE data: %v", s.process.ID, err)
+		return
+	}
+	if _, err := s.writer.Write([]byte("\n\n")); err != nil {
+		s.process.proxyLogger.Errorf("<%s> Failed to write SSE suffix: %v", s.process.ID, err)
+		return
+	}
 	s.Flush()
```
📒 Files selected for processing (1)

- proxy/process.go (4 hunks)
🔇 Additional comments (6)
proxy/process.go (6)
5-5: LGTM: Appropriate imports for streaming functionality.

The `encoding/json` and `math/rand` imports are correctly added to support SSE message formatting and randomized loading remarks.

Also applies to: 8-8
493-509: LGTM: Context lifecycle properly managed.

The context handling correctly ensures `cancelLoadCtx` is called in all paths (error at line 513, or after process start at line 526). The goroutine receives cancellation signals via `swapCtx`.
511-521: LGTM: Streaming-aware error handling.

The error handling correctly distinguishes between streaming (line 516, sends error via SSE) and non-streaming (line 518, HTTP error) responses. Context is properly cancelled on line 513.
643-704: LGTM: Entertaining loading messages enhance UX.

The loading remarks provide user feedback during potentially long model load times, addressing the “dead air” concern from issue #366.
756-757: Excellent: Ticker resource leak fixed!

The `defer ticker.Stop()` on line 757 correctly addresses the resource leak flagged in the previous review.
821-843: LGTM: Interface methods correctly delegate to underlying writer.

The `WriteHeader` guard (line 830) prevents double writes, and the `Flush` method properly checks for the `http.Flusher` interface. These methods correctly implement the `http.ResponseWriter` interface pattern.
Swapping models can take a long time and leave a lot of silence while the model is loading. Rather than silently load the model in the background, this PR allows llama-swap to send status updates in the `reasoning_content` of a streaming chat response.

Fixes mostlygeek#366
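Based on the review discussion of the new setting, a config sketch enabling the feature globally and disabling it for one model might look like this. The model name and command are illustrative; only `sendLoadingState` is the field introduced by this PR.

```yaml
sendLoadingState: true        # global default: stream loading updates

models:
  "qwen-large":
    cmd: llama-server --port ${PORT} -m qwen-large.gguf
    sendLoadingState: false   # per-model override wins over the global
```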
Summary by CodeRabbit

New Features

- Optional loading-state updates streamed while a model loads, enabled via a global `sendLoadingState` setting with per-model overrides.

Tests

- Config loading tests updated for the new setting on POSIX and Windows, plus new model config tests.