Walkthrough
Adds a sleep/wake model lifecycle: config schema and defaults for global and per-endpoint timeouts; HTTPEndpoint and SleepMode in the model config; process states and transitions (sleepPending, asleep, waking) with sleep/wake request sequencing and HTTP helpers; ProcessGroup, API, and UI endpoints and actions; tests; docs; an example responder; and a change to a CI step conditional.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant PM as ProxyManager
participant PG as ProcessGroup
participant P as Process
participant Svc as ModelService
Client->>PM: POST /api/models/sleep/{model}
PM->>PG: SleepProcess(modelID)
activate PG
PG->>P: isSleepEnabled()
activate P
P-->>PG: supported
deactivate P
PG->>P: Sleep()
activate P
P->>P: state = sleepPending
loop each sleepEndpoint (in order)
P->>Svc: HTTP <method> <endpoint> (body, timeout)
Svc-->>P: 200 / error
end
P->>P: state = asleep
deactivate P
PG-->>PM: result
deactivate PG
PM-->>Client: 200 OK / error
note over PM,P: Later, request arrives for model
Client->>PM: request
PM->>PG: ensure model ready (makeReady)
PG->>P: MakeIdle()/wake()
activate P
P->>P: state = waking
loop each wakeEndpoint (in order)
P->>Svc: HTTP <method> <endpoint> (body, timeout)
Svc-->>P: 200 / error
end
P->>P: state = ready
deactivate P
PM-->>Client: proceed with request
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45–60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 1
♻️ Duplicate comments (1)
proxy/process.go (1)
718-721: Proxy gating for sleep states is mostly correct; good that `SleepPending` is rejected
In `ProxyRequest`, blocking new traffic when the state is `StateSleepPending` (while allowing requests to trigger `makeReady()` from `asleep`/`waking`) matches the intended semantics and avoids trying to wake a process that is mid-sleep. Once the `SleepPending` handling in `makeReady`/`wake` is tightened as suggested earlier, the overall proxy behavior around sleep/wake transitions should be solid.
Also applies to: 756-772
🧹 Nitpick comments (15)
cmd/simple-responder/simple-responder.go (1)
11-11: Sleep/wake test endpoints look good; consider minimal error handling in `/collective_rpc`.
The added handlers match the vLLM sleep/wake examples and give nice coverage hooks. One small improvement: in `/collective_rpc` you ignore the `io.ReadAll` error, which will treat any read failure as “no reload_weights” and return 400. Even in a test helper, a quick error check would make behavior clearer:
```diff
-	r.POST("/collective_rpc", func(c *gin.Context) {
-		body, _ := io.ReadAll(c.Request.Body)
-		if strings.Contains(string(body), "reload_weights") {
-			c.Status(http.StatusOK)
-		} else {
-			c.Status(http.StatusBadRequest)
-		}
-	})
+	r.POST("/collective_rpc", func(c *gin.Context) {
+		body, err := io.ReadAll(c.Request.Body)
+		if err != nil {
+			c.Status(http.StatusInternalServerError)
+			return
+		}
+		if strings.Contains(string(body), "reload_weights") {
+			c.Status(http.StatusOK)
+		} else {
+			c.Status(http.StatusBadRequest)
+		}
+	})
```
Also applies to: 268-292
config-schema.json (1)
51-62: Schema additions align with implementation and examples; maybe call out macro support.
The new global `sleepRequestTimeout`/`wakeRequestTimeout` fields and the `sleepEndpoints`/`wakeEndpoints` item schemas look consistent with the Go config defaults and `config.example.yaml` semantics (min ≥1 globally, per-endpoint timeout 0 → inherit global). `additionalProperties: false` on endpoint objects is a nice guardrail.
If you want to make the schema self-documenting with respect to macros, you could extend the `endpoint`/`body` descriptions to mention that macro substitution (e.g. `${PORT}`, `${MODEL_ID}`) is allowed here, matching the comments in `config.example.yaml`. Otherwise this looks ready to go.
Also applies to: 230-297
proxy/config/config_posix_test.go (1)
215-218: POSIX config expectations updated correctly for sleep/wake defaults.
The added `SleepRequestTimeout: 10` and `WakeRequestTimeout: 10` fields in the expected `Config` mirror the new defaults and keep the snapshot test honest alongside `HealthCheckTimeout` and `MetricsMaxInMemory`.
If you want even tighter coverage, you could also assert these two fields in `TestConfig_DefaultValuesPosix`, but the current expectations here are already a solid safeguard.
config.example.yaml (1)
24-35: Example config for sleep/wake and vLLM modes is clear and matches the code.
The new `sleepRequestTimeout`/`wakeRequestTimeout` comments and values, plus the `vllm-sleep-level1` and `vllm-sleep-level2` model examples, do a nice job of documenting:
- Global vs per-endpoint timeout behavior.
- Level 1 vs Level 2 semantics.
- Multi-step wake flows (`/wake_up` → `/collective_rpc` with `reload_weights` → `/reset_prefix_cache`).
Everything lines up with the schema and the simple-responder endpoints. If you want to avoid any ambiguity, you might tweak the comments to say “requests are sent to the model’s `proxy` base URL + endpoint”, but that’s purely a clarity nit.
Also applies to: 258-335
proxy/config/config_test.go (1)
764-808: Sleep/wake config tests are comprehensive and well-aligned with config semantics
These tests nicely cover global vs per-endpoint timeouts, macro substitution (including MODEL_ID/PORT), multi-step wake sequences, unknown-macro errors, defaults, and model-level macro precedence for sleep/wake endpoints. As an optional enhancement, you could add a couple of negative tests that exercise the new validation paths in `ModelConfig.UnmarshalYAML` (e.g., missing wakeEndpoints when sleepEndpoints are set, invalid HTTP method, or negative timeout) to pin those error messages too.
Also applies to: 805-836, 838-871, 873-923, 925-945, 947-970, 972-1008
ui/src/contexts/APIProvider.tsx (1)
4-14: UI sleep model wiring matches backend API and state model
Extending `ModelStatus`, adding `sleepEnabled`, and wiring `sleepModel` into the context all look consistent with the new `/api/models/sleep/*model` endpoint and the process state strings. Error handling and memoization follow existing patterns. If you ever introduce model IDs with URL-reserved characters (beyond `/`), you may want to consider `encodeURIComponent` here, but for the current `*model` wildcard usage this is fine.
Also applies to: 16-29, 256-269, 270-286
proxy/process_test.go (2)
241-275: Sleep/wake state transitions in swapState table look coherent
The added cases for `StateSleepPending`, `StateAsleep`, and `StateWaking` (including a couple of invalid transitions) align with a sensible lifecycle: Ready → SleepPending → Asleep → Waking → Ready, with exits to Stopping/Stopped where appropriate. As an optional follow-up, you could add explicit negative cases like SleepPending→Ready or Asleep→SleepPending if those are intentionally forbidden, to lock the state machine down further.
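As a rough illustration of the lifecycle described in this comment, a transition table can be written as a map from each state to its permitted successors. This is only a sketch under stated assumptions: the `state` type, the `allowed` table, and `canTransition` are hypothetical names rather than the PR's `isValidTransition`, and allowing every sleep-related state to exit to `stopping` is an assumption (the review only says exits exist "where appropriate").

```go
package main

import "fmt"

// state is a hypothetical stand-in for the PR's process state type.
type state string

const (
	ready        state = "ready"
	sleepPending state = "sleepPending"
	asleep       state = "asleep"
	waking       state = "waking"
	stopping     state = "stopping"
)

// allowed lists the permitted next states for each state.
// The stopping exits are assumed for illustration.
var allowed = map[state][]state{
	ready:        {sleepPending, stopping},
	sleepPending: {asleep, stopping},
	asleep:       {waking, stopping},
	waking:       {ready, stopping},
}

// canTransition reports whether moving from one state to another is permitted.
func canTransition(from, to state) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(ready, sleepPending)) // a step along the lifecycle
	fmt.Println(canTransition(ready, asleep))       // skips sleepPending, so not permitted
}
```

Negative test cases such as SleepPending→Ready then reduce to asserting that `canTransition` returns false for the pair.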
581-703: New sleep/wake process tests provide good coverage; just confirm Sleep semantics
The new tests cover the main scenarios well: a basic sleep/wake cycle, multi-step wake sequences, preferring Sleep over Stop in `MakeIdle`, and falling back to `start()` when wake fails. One thing to double-check is that `Process.Sleep()` is synchronous up to `StateAsleep`; if it ever becomes asynchronous (e.g., setting `StateSleepPending` and completing in a goroutine), these tests may become flaky and would benefit from a small helper that waits/polls until the target state (or timeout) instead of asserting immediately.
proxy/proxymanager_api.go (2)
15-22: Model API additions line up with UI expectations; consider a config-based fallback for SleepEnabled
Extending `Model` with `SleepEnabled` and mapping the new sleep-related states (`StateSleepPending`, `StateAsleep`, `StateWaking`) to `"sleepPending" | "asleep" | "waking"` keeps the API nicely in sync with the UI’s `ModelStatus` union and `sleepEnabled` flag. Right now `sleepEnabled` is only set when a non-nil process exists; if there’s any path where a model’s process isn’t instantiated yet, you could optionally fall back to the config’s sleep/wake endpoints to still advertise the capability:
```diff
-	if processGroup != nil {
-		process := processGroup.processes[modelID]
-		if process != nil {
-			var stateStr string
+	if processGroup != nil {
+		process := processGroup.processes[modelID]
+		if process != nil {
+			var stateStr string
 			switch process.CurrentState() {
@@
-			}
-			state = stateStr
-			sleepEnabled = process.isSleepEnabled()
-		}
+			}
+			state = stateStr
+			sleepEnabled = process.isSleepEnabled()
+		} else {
+			cfg := pm.config.Models[modelID]
+			sleepEnabled = len(cfg.SleepEndpoints) > 0 && len(cfg.WakeEndpoints) > 0
+		}
+	} else {
+		cfg := pm.config.Models[modelID]
+		sleepEnabled = len(cfg.SleepEndpoints) > 0 && len(cfg.WakeEndpoints) > 0
 	}
```
Purely optional, but it would make `sleepEnabled` reflect configuration even before a process is created.
Also applies to: 24-34, 52-84, 86-93
243-263: Sleep handler mirrors unload handler and cleanly delegates to SleepProcess
`apiSleepSingleModelHandler` follows the same pattern as `apiUnloadSingleModelHandler`: resolve aliases, find the process group, delegate to `SleepProcess`, and surface errors via `sendErrorResponse`. That keeps the API consistent and centralizes the actual sleep logic in the process group. If you later distinguish between “sleep not configured” vs. internal failures in `SleepProcess`, you might consider returning a 400/409 for the former instead of 500, but the current behavior is reasonable and consistent with the unload path.
proxy/config/model_config.go (1)
11-17: Sleep/wake endpoint modeling and validation look solid
Defining `HTTPEndpoint` and adding `SleepEndpoints`/`WakeEndpoints` to `ModelConfig`, with `UnmarshalYAML` enforcing “if one is set, both must be set” and `validateEndpoint` handling required endpoint, method normalization/whitelisting, and non-negative timeouts, gives you a clean, early-validation story for sleep/wake config. This lines up well with the new tests around timeouts and macro substitution. As an optional future tweak, you could extend `validMethods` if you ever need verbs like `DELETE` or `HEAD`, or add JSON validation for `Body` if you want to catch malformed payloads at config load time, but for the current use cases this is more than adequate.
Also applies to: 11-17, 19-33, 56-110, 111-135
docs/configuration.md (2)
75-85: Minor wording nit: hyphenate “event-driven”
In the features table row for `hooks`, consider changing “event driven functionality” to “event-driven functionality” for grammatical correctness and to silence the linter.
124-135: Sleep/wake configuration docs align with implementation; maybe call out per-endpoint timeout inheritance more explicitly
The descriptions for `sleepRequestTimeout`/`wakeRequestTimeout` and the vLLM `sleepEndpoints`/`wakeEndpoints` examples correctly match the implementation: timeouts are in seconds, default to 10 at the config level, and per-endpoint `timeout: 0` inherits the respective global timeout. The macro usage (`${PORT}`, `${MODEL_ID}`) and sequential execution notes are also accurate. If you want to reduce surprises, you could add a short note that negative per-endpoint timeouts are not validated and effectively behave as “no timeout”, so users should stick to `0` or positive values.
Also applies to: 321-398
proxy/process.go (1)
463-505: HTTP sleep/wake helpers are sound; you may want stricter timeout validation
The `sendSleepRequests`/`sendWakeRequests` sequencing, `buildFullURL` path resolution against `config.Proxy`, and the shared `sendHTTPRequest` helper all look correct and match the config semantics (per-endpoint `Timeout` in seconds, dial timeout via `httpDialTimeout`, and a hard requirement on HTTP 200). Because `LoadConfigFromReader` only normalizes the global sleep/wake timeouts and later fills in per-endpoint `Timeout` when it is exactly `0`, an explicitly negative per-endpoint timeout will pass through and behave as “no client timeout”. If you want to keep the configuration surface strictly in “seconds with default inheritance”, you could clamp endpoint timeouts to `>= 1` as well and/or reject negative values during config load.
Also applies to: 507-511, 628-691, 693-704
proxy/config/config.go (1)
256-313: Macro handling for sleep/wake endpoints is well-integrated; consider tightening validation for Body content
The flow for endpoints looks good:
- User and model macros are applied to `SleepEndpoints`/`WakeEndpoints` (Endpoint + Body) alongside other string fields.
- `${PORT}` is substituted only when `cmd` or `proxy` requires it, and then also propagated into the endpoint arrays.
- `validateEndpointMacros` ensures no unresolved macros (including reserved `PORT`/`MODEL_ID`) remain in Endpoint/Body.
- Finally, per-endpoint `Timeout == 0` is filled from the global sleep/wake timeout values.
Two small considerations:
- `validateEndpointMacros` will treat any `${...}` sequence in the JSON `body` as an error, even if the intention is to send that literal string downstream. That’s consistent with how macros work elsewhere, but may be worth calling out in docs to avoid surprises.
- As with process.go, endpoint `Timeout` values that are explicitly negative will skip the `== 0` defaulting and end up as “no timeout” in the HTTP client. If you’d like stricter config guarantees, you could clamp or reject negative endpoint timeouts during load.
Also applies to: 314-355, 381-387, 410-421, 625-640
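The timeout rules discussed in these comments (global values clamped to at least 1 second, a per-endpoint `0` inheriting the global value, and the suggested rejection of negative values) could be sketched as below. The helper names `clampGlobal` and `resolveTimeout` are hypothetical and are not taken from the PR; rejecting negatives implements the reviewer's suggestion rather than confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// clampGlobal enforces the minimum of 1 second on a global timeout.
func clampGlobal(seconds int) int {
	if seconds < 1 {
		return 1
	}
	return seconds
}

// resolveTimeout returns the effective timeout for one endpoint:
// 0 inherits the global value, positive values override it, and
// negative values are rejected (the stricter option suggested above).
func resolveTimeout(endpointSeconds, globalSeconds int) (int, error) {
	switch {
	case endpointSeconds < 0:
		return 0, errors.New("endpoint timeout must be >= 0")
	case endpointSeconds == 0:
		return clampGlobal(globalSeconds), nil
	default:
		return endpointSeconds, nil
	}
}

func main() {
	for _, t := range []int{0, 30, -1} {
		got, err := resolveTimeout(t, 10)
		fmt.Println(t, got, err)
	}
}
```

Doing this once at config load means the HTTP client never sees a zero or negative duration, which `net/http` treats as "no timeout".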
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- .github/workflows/go-ci-windows.yml (0 hunks)
- README.md (1 hunks)
- cmd/simple-responder/simple-responder.go (2 hunks)
- config-schema.json (2 hunks)
- config.example.yaml (2 hunks)
- docs/configuration.md (3 hunks)
- proxy/config/config.go (8 hunks)
- proxy/config/config_posix_test.go (1 hunks)
- proxy/config/config_test.go (1 hunks)
- proxy/config/config_windows_test.go (1 hunks)
- proxy/config/model_config.go (3 hunks)
- proxy/process.go (15 hunks)
- proxy/process_test.go (3 hunks)
- proxy/processgroup.go (2 hunks)
- proxy/proxymanager_api.go (5 hunks)
- ui/src/contexts/APIProvider.tsx (4 hunks)
- ui/src/index.css (3 hunks)
- ui/src/pages/Models.tsx (3 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/go-ci-windows.yml
🧰 Additional context used
🪛 LanguageTool
docs/configuration.md
[uncategorized] ~81-~81: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...me | | hooks | event driven functionality | | `...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🔇 Additional comments (10)
README.md (1)
32-39: Docs update clearly exposes the new sleep API and behavior.
The `/models/sleep/:model_id` bullet and the “Fast model switching with sleep/wake support” feature line accurately reflect the new lifecycle and configuration; no changes needed.
proxy/config/config_windows_test.go (1)
204-207: Windows config snapshot now includes sleep/wake timeouts, matching POSIX.
The expected `Config` for Windows correctly adds `SleepRequestTimeout: 10` and `WakeRequestTimeout: 10` next to `HealthCheckTimeout` and `MetricsMaxInMemory`, keeping platform-specific tests in sync with the new defaults.
ui/src/index.css (1)
136-138: New status and button utility classes are consistent with existing design system.
The added `.status-badge` utility, sleep/wake/shutdown status variants, and `.btn--action` min-width all follow the existing naming and token usage; they should drop into the models UI cleanly. Just make sure consumers combine `.status` with `.status-badge` where you expect the min width.
Also applies to: 153-165, 179-181
proxy/processgroup.go (2)
68-70: Using `MakeIdle()` on the previous process fits the new sleep-friendly swap behavior.
Switching from hard-stopping the prior model to calling `lastProcess.MakeIdle()` when `swap` is enabled lets the underlying `Process` decide whether to sleep, stop, or no-op based on its config/state, which aligns with the new lifecycle. Given that `processes` is populated once in `NewProcessGroup` and entries aren’t removed, indexing `pg.processes[pg.lastUsedProcess]` under the lock is safe.
115-133: `SleepProcess` is simple and correctly synchronized; behavior matches the public API.
The new `SleepProcess` method:
- Guards access to `pg.processes` with the group lock.
- Cleanly errors for unknown models and for models without sleep support (`isSleepEnabled()`).
- Calls `process.Sleep()` outside the lock, avoiding holding the group mutex over a potentially long network operation.
This matches the `/models/sleep/:model_id` API semantics and integrates without introducing obvious races with `ProxyRequest`/`StopProcesses`. No changes needed here.
proxy/process_test.go (1)
91-112: Updated error expectation correctly reflects makeReady path
Switching the expectation from `"unable to start process"` to `"unable to makeReady process"` keeps this test in sync with the new startup/makeReady semantics while still verifying the error is surfaced as a 502.
ui/src/pages/Models.tsx (1)
40-40: Sleep/wake actions wiring looks consistent; confirm wake semantics for `loadModel`
The conditional actions per `model.state` and `model.sleepEnabled` are coherent, and the disabled button fallback for other states is a good guard. One thing to double-check is that `loadModel(model.id)` is indeed the intended wake path from `asleep` (i.e., that the backend “load” endpoint now calls the unified make-ready logic and correctly handles both stopped → ready and asleep → ready). If a distinct wake endpoint exists, it may be clearer to route the “Wake” button through that instead.
Also applies to: 149-151, 167-202
proxy/process.go (2)
27-43: Lifecycle and state machine extensions look coherent
The introduction of `StateSleepPending`, `StateAsleep`, and `StateWaking` plus the per-transition waitgroups (`waitStarting`, `waitSleeping`, `waitWaking`) is wired correctly: `swapState` only bumps the waitgroup when the CAS succeeds, and each of `start`, `Sleep`, and `wake` pairs that with a single `defer ...Done()`, so there’s no underflow or leak. The expanded `isValidTransition` rules are consistent with the intended graph, and `MakeIdle()` sensibly prefers `Sleep()` when endpoints are configured, falling back to `Stop()` otherwise. The TTL monitor also correctly treats sleeping/waking states as “active” for the purposes of `UnloadAfter`, which matches the idea that a sleeping model is still eligible for eventual full unload.
Also applies to: 80-88, 170-203, 205-223, 252-260, 388-418, 420-462
344-347: Health check refactor via `buildFullURL`/`checkHealthEndpoint` is an improvement
Switching the health-check path to go through `buildFullURL` and `checkHealthEndpoint` (which reuses the same HTTP helper as sleep/wake) removes ad-hoc URL concatenation and centralizes timeout behavior. The logging still uses the precomputed `healthURL`, and early errors on invalid proxy URL or endpoint parsing are surfaced cleanly.
Also applies to: 364-377
proxy/config/config.go (1)
112-141: Global sleep/wake timeout defaults and validation are consistent
Adding `SleepRequestTimeout` and `WakeRequestTimeout` to `Config`, defaulting both to 10 and clamping them to a minimum of 1 second in `LoadConfigFromReader`, is a clean extension of the existing health check timeout pattern. This matches the documentation and ensures that, even if YAML omits or sets these fields to 0, the HTTP clients for sleep/wake calls won’t run with an accidental zero timeout.
Also applies to: 170-185, 191-205
Actionable comments posted: 1
🧹 Nitpick comments (4)
ui/src/pages/Models.tsx (1)
200-202: Status badge ties neatly into extended state set
Rendering state via `status status--${model.state}` gives a unified hook to style the new sleep-related states (`sleepPending`, `asleep`, `waking`) alongside the existing ones. Just ensure corresponding CSS exists for the new modifiers; otherwise they’ll still degrade gracefully as plain text.
proxy/process.go (1)
647-691: Consider validating JSON body before sending HTTP requests.
When `endpoint.Body` is provided, `sendHTTPRequest` sets the `Content-Type` to `application/json` but doesn't validate that the body is well-formed JSON. If configuration contains invalid JSON or unexpanded macros, the request will fail at the inference server with potentially cryptic errors.
Consider adding basic JSON validation:
```diff
 	var bodyReader io.Reader
 	if endpoint.Body != "" {
+		// Validate it's valid JSON
+		if !json.Valid([]byte(endpoint.Body)) {
+			return fmt.Errorf("invalid JSON in request body: %s", endpoint.Body)
+		}
 		bodyReader = strings.NewReader(endpoint.Body)
 	}
```
This provides earlier, clearer error messages during sleep/wake operations.
config-schema.json (2)
270-303: Schema description states "required" but doesn't enforce it programmatically.
Line 302 describes `wakeEndpoints` as "Required when sleepMode is 'enable'", but the JSON schema doesn't enforce this constraint. The field is not in the `required` array and defaults to an empty array `[]`.
This means:
- Users could enable sleep mode without providing wake endpoints
- Validation must happen elsewhere (likely in Go code)
- Users won't get immediate schema validation feedback
Consider either:
- Option A: Add conditional schema validation using `if`/`then` to enforce the requirement:
```json
"if": {
  "properties": { "sleepMode": { "const": "enable" } }
},
"then": {
  "required": ["wakeEndpoints", "sleepEndpoints"],
  "properties": {
    "wakeEndpoints": { "minItems": 1 },
    "sleepEndpoints": { "minItems": 1 }
  }
}
```
- Option B: Clarify the description to state: "Required when sleepMode is 'enable' (validated at runtime, not by schema)."
This applies to both `wakeEndpoints` and `sleepEndpoints` (lines 236-269, 270-303).
242-245: Clarify "URL path" terminology in endpoint descriptions.
The descriptions use "URL path" but the examples include query parameters (e.g., `/sleep?level=1`). Technically, query parameters are part of the URL but not the path component.
Consider updating the descriptions to be more precise:
- Line 245: Change "URL path for the sleep endpoint" → "URL endpoint for sleep"
- Line 279: Change "URL path for the wake endpoint" → "URL endpoint for wake"
Or explicitly state: "URL path for the endpoint (may include query parameters)."
This helps users understand that query strings, fragments, and other URL components are supported.
Also applies to: 276-279
Also applies to: 276-279
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- config-schema.json (2 hunks)
- config.example.yaml (2 hunks)
- docs/configuration.md (3 hunks)
- proxy/config/config_posix_test.go (4 hunks)
- proxy/config/model_config.go (4 hunks)
- proxy/process.go (15 hunks)
- proxy/process_test.go (3 hunks)
- proxy/proxymanager_api.go (5 hunks)
- ui/src/contexts/APIProvider.tsx (4 hunks)
- ui/src/pages/Models.tsx (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- proxy/config/config_posix_test.go
🧰 Additional context used
🧬 Code graph analysis (4)
proxy/proxymanager_api.go (2)
- proxy/config/model_config.go (1): `SleepMode` (20-20)
- proxy/process.go (3): `StateSleepPending` (37-37), `StateAsleep` (38-38), `StateWaking` (39-39)
ui/src/pages/Models.tsx (1)
- ui/src/contexts/APIProvider.tsx (1): `useAPI` (291-297)
proxy/process.go (1)
- proxy/config/model_config.go (3): `SleepMode` (20-20), `SleepModeEnable` (23-23), `HTTPEndpoint` (12-17)
proxy/process_test.go (2)
- proxy/process.go (8): `StateReady` (30-30), `StateSleepPending` (37-37), `StateAsleep` (38-38), `StateStopping` (31-31), `StateStopped` (28-28), `StateWaking` (39-39), `ErrInvalidStateTransition` (167-167), `NewProcess` (99-143)
- proxy/config/model_config.go (3): `SleepMode` (20-20), `SleepModeEnable` (23-23), `HTTPEndpoint` (12-17)
🔇 Additional comments (15)
config.example.yaml (2)
24-35: Global sleep/wake timeout docs look consistent with implementation
The global `sleepRequestTimeout`/`wakeRequestTimeout` options and their described defaults/override semantics line up with the HTTPEndpoint `Timeout` field and the intended “0 = use global” behavior. Nothing to change here.
258-344: vLLM sleep-mode examples are well-scoped and match config schema
The level 1 and level 2 `vllm-sleep-*` examples correctly use `sleepMode: enable` and the `sleepEndpoints`/`wakeEndpoints` structure (endpoint, method, optional body/timeout). They also reflect the intended multi-step wake flow for level 2. Good, self-contained examples.
proxy/proxymanager_api.go (2)
15-22: Model status + SleepMode exposure align with UI expectations
Surfacing `SleepMode` on the API `Model` type and mapping the new process states (`sleepPending`, `asleep`, `waking`) to string values is consistent with the TS `ModelStatus` union and UI logic. The cast `string(pm.config.Models[modelID].SleepMode)` will yield `"enable"`/`"disable"` as expected for the React checks.
Also applies to: 52-81, 84-91
24-35: New /api/models/sleep endpoint mirrors unload semantics cleanly
The `apiSleepSingleModelHandler` correctly resolves aliases via `RealModelName`, looks up the process group, calls `SleepProcess`, and returns appropriate HTTP status codes. Routing `POST /models/sleep/*model` through this handler matches the UI’s `/api/models/sleep/${model}` usage and keeps behavior parallel to the unload handler.
Also applies to: 241-261
docs/configuration.md (2)
124-135: Global sleep/wake timeout documentation matches config behavior
The descriptions for `sleepRequestTimeout` and `wakeRequestTimeout` (defaults, global scope, and per-endpoint overrides) are consistent with the configuration fields and HTTPEndpoint `Timeout` semantics. No changes needed.
321-407: vLLM sleep-mode docs aligned with example config and process behavior
The level 1 and level 2 vLLM sleep-mode examples here mirror `config.example.yaml` and accurately describe the single-step vs multi-step wake flows. The notes about required `sleepMode: enable`, endpoint arrays, and optional per-endpoint `timeout` are precise and helpful.
ui/src/pages/Models.tsx (2)
40-46: sleepModel wiring from API context is straightforward
Including `sleepModel` alongside other model actions from `useAPI()` keeps the component interface consistent with existing load/unload operations. No issues here.
149-151: Action matrix for stopped/ready/asleep states is coherent and guarded
The conditional rendering in the Actions column (Load for `stopped`, Wake+Unload for `asleep` via `loadModel`/`unloadSingleModel`, Sleep+Unload for `ready` when `sleepMode === "enable"`, and a disabled state button otherwise) matches the backend state machine and avoids issuing conflicting commands while in transitional states. The layout (`w-40` + `min-w-36` flex container) should keep multiple buttons readable.
Also applies to: 167-202
ui/src/contexts/APIProvider.tsx (2)
4-14: Types extended correctly for new lifecycle and SleepMode
The `ModelStatus` union now covers all server states, including `sleepPending`, `asleep`, and `waking`, plus `unknown`. Adding `sleepMode: string` to `Model` and the corresponding `sleepModel` entry to `APIProviderType` keeps the public context surface in sync with the backend `Model` JSON.
Also applies to: 16-29
256-269: sleepModel implementation follows existing fetch/error pattern
`sleepModel` reuses the established fetch pattern (POST, status check, throwing on error) and is correctly included in the memoized context value and dependency list. SSE model-status events will refresh UI state after a sleep request, so no additional local state juggling is necessary.
Also applies to: 270-286
proxy/process_test.go (3)
100-112: Updated error expectation matches new startup path
Changing the assertion to look for `"unable to makeReady process"` keeps `TestProcess_BrokenModelConfig` aligned with the new error wording from the startup/makeReady path, while still verifying that a broken command surfaces a clear 502 with details on the second attempt.
241-275: swapState table now properly exercises sleep/wake transitions
The additional cases for `Ready ↔ SleepPending ↔ Asleep ↔ Waking` (including invalid direct Ready↔Asleep transitions) give good coverage of the extended state machine and ensure `ErrInvalidStateTransition` and `ErrExpectedStateMismatch` are enforced consistently for the new states.
581-707: New sleep/wake tests cover core flows and fallback behavior
The added tests (basic sleep/wake, multi-step wake sequence, using Sleep from `MakeIdle`, and wake failure falling back to `start()`) collectively exercise the main control paths around SleepMode and HTTPEndpoint sequences. They mirror how configuration is mutated in other tests and should catch regressions in the state transitions or endpoint sequencing logic.
proxy/config/model_config.go (2)
5-9: SleepMode and HTTPEndpoint integration into ModelConfig is coherent
Introducing `HTTPEndpoint` and `SleepMode` and wiring them into `ModelConfig` with `SleepModeDisable` as the default gives a clear, explicit toggle for sleep/wake behavior. The defaults in `UnmarshalYAML` avoid surprising implicit activation, and struct tags (`sleepMode`, `sleepEndpoints`, `wakeEndpoints`) match the YAML examples and docs.
Also applies to: 11-25, 27-46, 69-85
98-130: Endpoint and SleepMode validation are strict and user-friendly
The `UnmarshalYAML` validation:
- Ensures `sleepMode` is only `"enable"` or `"disable"`.
- Requires both `sleepEndpoints` and `wakeEndpoints` when enabled.
- Applies `validateEndpoint` to every endpoint, which:
  - Requires a non-empty path.
  - Defaults method to POST and normalizes to uppercase, allowing only GET/POST/PUT/PATCH.
  - Enforces non-negative timeouts (allowing `0` to mean “use global timeout”).
This should catch most misconfigurations early with precise error messages (including array indices). Nicely done.
Also applies to: 132-155
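The per-endpoint rules listed in this comment can be sketched as a small validator. This is an illustrative reconstruction from the bullet points only: the `httpEndpoint` struct and its field names are assumptions standing in for the PR's `HTTPEndpoint`, and the error messages are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// httpEndpoint is a hypothetical stand-in for the PR's HTTPEndpoint type.
type httpEndpoint struct {
	Endpoint string
	Method   string
	Body     string
	Timeout  int // seconds; 0 means "use global timeout"
}

// validateEndpoint applies the rules from the review: a non-empty path,
// method defaulting to POST and normalized to uppercase, a whitelist of
// GET/POST/PUT/PATCH, and a non-negative timeout.
func validateEndpoint(ep *httpEndpoint) error {
	if strings.TrimSpace(ep.Endpoint) == "" {
		return errors.New("endpoint path is required")
	}
	if ep.Method == "" {
		ep.Method = "POST"
	}
	ep.Method = strings.ToUpper(ep.Method)
	switch ep.Method {
	case "GET", "POST", "PUT", "PATCH":
	default:
		return fmt.Errorf("unsupported method %q", ep.Method)
	}
	if ep.Timeout < 0 {
		return fmt.Errorf("timeout must be >= 0, got %d", ep.Timeout)
	}
	return nil
}

func main() {
	eps := []httpEndpoint{
		{Endpoint: "/sleep?level=1", Method: "post"},
		{Endpoint: "/wake_up", Method: "DELETE"},
	}
	for i := range eps {
		// Including the index mirrors the "array indices" in the error messages.
		if err := validateEndpoint(&eps[i]); err != nil {
			fmt.Printf("endpoint[%d]: %v\n", i, err)
			continue
		}
		fmt.Printf("endpoint[%d]: ok, method %s\n", i, eps[i].Method)
	}
}
```

Taking a pointer lets the validator write the normalized method back into the config, so later code can compare against uppercase verbs only.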
Add Sleep/Wake Support for Fast Model Switching
Summary
Adds sleep/wake functionality for fast model switching. Instead of killing and restarting processes, models can be put to sleep and woken up on demand, dramatically reducing swap times.
Designed for vLLM's sleep mode (blog post), but will work with any future inference servers that implement HTTP-based sleep/wake endpoints.
What Changed
State Machine:
`StateSleepPending`, `StateAsleep`, `StateWaking`
Configuration:
Endpoints are called sequentially. Per-endpoint timeouts override global defaults.
Features:
`sleepEndpoints`/`wakeEndpoints` configured)
`POST /models/sleep/:model_id`
Summary by CodeRabbit
New Features
Documentation
Tests
Chores