Sleep mode by napmany · Pull Request #4 · napmany/llmsnap

napmany · 2025-11-19T12:37:18Z

Add Sleep/Wake Support for Fast Model Switching

Summary

Adds sleep/wake functionality for fast model switching. Instead of killing and restarting processes, models can be put to sleep and woken up on demand, dramatically reducing swap times.

Designed for vLLM's sleep mode (blog post), but will work with any future inference servers that implement HTTP-based sleep/wake endpoints.

What Changed

State Machine:

Added 3 new states: StateSleepPending, StateAsleep, StateWaking
Automatic sleep during model swapping when configured
Falls back to traditional stop/start if sleep fails

Configuration:

# Global defaults
sleepRequestTimeout: 20
wakeRequestTimeout: 20

models:
  vllm-model:
    sleepMode: enable

    sleepEndpoints:
      - endpoint: /sleep?level=2
        method: POST
        timeout: 15  # optional override

    # Level 2 requires multi-step wake sequence
    wakeEndpoints:
      - endpoint: /wake_up
        method: POST
      - endpoint: /collective_rpc
        method: POST
        body: '{"method": "reload_weights"}'
      - endpoint: /reset_prefix_cache
        method: POST

Endpoints are called sequentially. Per-endpoint timeouts override global defaults.

Features:

Automatic sleep on model swap (when sleepEndpoints/wakeEndpoints configured)
API endpoint: POST /models/sleep/:model_id
UI displays sleep related states

Summary by CodeRabbit

New Features
- Sleep/wake model management for faster model switching (memory offload instead of full restart)
- Per-model configurable sleep/wake endpoint sequences with global and per-endpoint timeouts
- Multi-step wake flows and API endpoint to trigger model sleep
- UI: sleep/wake action buttons and new status badges (sleepPending, waking, asleep)
Documentation
- Added configuration docs, examples and defaults for sleep/wake behavior and timeouts
Tests
- Extensive tests covering sleep/wake lifecycle, sequencing, macro substitution and error cases
Chores
- CI step adjusted to always run simple-responder creation step

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-20T16:04:21Z

Walkthrough

Adds sleep/wake model lifecycle: config schema and defaults for global/per-endpoint timeouts, HTTPEndpoint and SleepMode in model config, process states/transitions (sleepPending, asleep, waking) with sleep/wake request sequencing and HTTP helpers, ProcessGroup/API/UI endpoints and actions, tests, docs, example responder, and a CI step conditional change.

Changes

Cohort / File(s)	Summary
CI workflow `.github/workflows/go-ci-windows.yml`	Removed conditional guard so "Create simple-responder" always runs; "Save Simple Responder" still conditional on cache-hit.
Documentation & examples `README.md`, `docs/configuration.md`, `config.example.yaml`, `config-schema.json`	Added sleep/wake docs and examples; introduced global `sleepRequestTimeout`/`wakeRequestTimeout`, per-model `sleepMode`, `sleepEndpoints` and `wakeEndpoints` with method/body/timeout semantics and examples (vLLM Level 1/2).
Simple responder `cmd/simple-responder/simple-responder.go`	Added HTTP endpoints for testing sleep/wake flows: `/sleep`, `/wake_up`, `/wake_up_fail`, `/collective_rpc` (body check), and `/reset_prefix_cache`.
Config structs & validation `proxy/config/config.go`, `proxy/config/model_config.go`, `config-schema.json`	Added global timeout fields to Config; new `HTTPEndpoint` type and `SleepMode` enum; ModelConfig gains `SleepMode`, `SleepEndpoints`, `WakeEndpoints`; added validation, normalization (method defaults/allowed), per-endpoint timeout semantics (0 → global), macro substitution and endpoint macro validation.
Config tests `proxy/config/config_test.go`, `proxy/config/config_posix_test.go`, `proxy/config/config_windows_test.go`	Updated/added tests to assert new timeout defaults, macro substitution in sleep/wake endpoints, per-endpoint timeout inheritance, multi-step wake sequences, and unknown-macro error cases; test fixtures updated to include SleepMode.
Process lifecycle & HTTP helpers `proxy/process.go`, `proxy/process_test.go`	Added states `sleepPending`, `asleep`, `waking`; wait-group sync fields; new public `MakeIdle()`; sleep/wake flows (`Sleep()`, `wake()`, `sendSleepRequests()`, `sendWakeRequests()`); `buildFullURL()` and `sendHTTPRequest()` with per-endpoint timeout handling; state transitions and tests updated/added.
Process group & API `proxy/processgroup.go`, `proxy/proxymanager_api.go`	Swap behavior changed to call `MakeIdle()` on previous model instead of `Stop()`; added `SleepProcess(modelID string)` on ProcessGroup; API adds `POST /api/models/sleep/:model`, exposes `sleepMode` and sleep-related states.
Frontend `ui/src/contexts/APIProvider.tsx`, `ui/src/pages/Models.tsx`, `ui/src/index.css`	Exposed `sleepMode` and `sleepModel()` API, extended ModelStatus with `sleepPending`/`asleep`/`waking`; Models page shows status badges and Sleep/Wake/Unload actions; added CSS classes for new statuses and action button.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PM as ProxyManager
    participant PG as ProcessGroup
    participant P as Process
    participant Svc as ModelService

    Client->>PM: POST /api/models/sleep/{model}
    PM->>PG: SleepProcess(modelID)
    activate PG
    PG->>P: isSleepEnabled()
    activate P
    P-->>PG: supported
    deactivate P
    PG->>P: Sleep()
    activate P
    P->>P: state = sleepPending
    loop each sleepEndpoint (in order)
        P->>Svc: HTTP <method> <endpoint> (body, timeout)
        Svc-->>P: 200 / error
    end
    P->>P: state = asleep
    deactivate P
    PG-->>PM: result
    deactivate PG
    PM-->>Client: 200 OK / error

    note over PM,P: Later, request arrives for model
    Client->>PM: request
    PM->>PG: ensure model ready (makeReady)
    PG->>P: MakeIdle()/wake()
    activate P
    P->>P: state = waking
    loop each wakeEndpoint (in order)
        P->>Svc: HTTP <method> <endpoint> (body, timeout)
        Svc-->>P: 200 / error
    end
    P->>P: state = ready
    deactivate P
    PM-->>Client: proceed with request

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

Areas to focus review on:

proxy/process.go: state machine transitions, wait-group usage, concurrency/deadlock and correct handling of Stop vs sleep paths.
proxy/config/*.go: macro substitution and validateEndpointMacros correctness, timeout inheritance (0 → global) and PORT/MODEL_ID handling.
sendHTTPRequest: connect vs overall timeout semantics and body handling.
processgroup.go behavioral change from Stop() -> MakeIdle(): ensure resource lifecycle and swap logic are safe.
Tests: verify wake failure fallbacks, multi-step sequencing, and error messages for unknown macros.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'Sleep mode' is vague and generic, using a non-descriptive term that doesn't convey the specific changes (implementation of sleep/wake functionality for fast model switching).	Consider a more descriptive title like 'Implement sleep/wake functionality for fast model switching' that clearly summarizes the main change.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch sleep-wake

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca40d9 and db180c2.

📒 Files selected for processing (1)

proxy/config/config_windows_test.go (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

proxy/config/config_windows_test.go

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: run-tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

napmany · 2025-11-20T16:08:38Z

@coderabbitai review

coderabbitai · 2025-11-20T16:08:43Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

proxy/process.go (1)

718-721: Proxy gating for sleep states is mostly correct; good that SleepPending is rejected

In ProxyRequest, blocking new traffic when the state is StateSleepPending (while allowing requests to trigger makeReady() from asleep/waking) matches the intended semantics and avoids trying to wake a process that is mid‑sleep. Once the SleepPending handling in makeReady/wake is tightened as suggested earlier, the overall proxy behavior around sleep/wake transitions should be solid.

Also applies to: 756-772

🧹 Nitpick comments (15)

cmd/simple-responder/simple-responder.go (1)
11-11: Sleep/wake test endpoints look good; consider minimal error handling in /collective_rpc.

The added handlers match the vLLM sleep/wake examples and give nice coverage hooks. One small improvement: in /collective_rpc you ignore the io.ReadAll error, which will treat any read failure as “no reload_weights” and return 400. Even in a test helper, a quick error check would make behavior clearer:
-	r.POST("/collective_rpc", func(c *gin.Context) {
-		body, _ := io.ReadAll(c.Request.Body)
-		if strings.Contains(string(body), "reload_weights") {
-			c.Status(http.StatusOK)
-		} else {
-			c.Status(http.StatusBadRequest)
-		}
-	})
+	r.POST("/collective_rpc", func(c *gin.Context) {
+		body, err := io.ReadAll(c.Request.Body)
+		if err != nil {
+			c.Status(http.StatusInternalServerError)
+			return
+		}
+		if strings.Contains(string(body), "reload_weights") {
+			c.Status(http.StatusOK)
+		} else {
+			c.Status(http.StatusBadRequest)
+		}
+	})
Also applies to: 268-292
config-schema.json (1)

51-62: Schema additions align with implementation and examples; maybe call out macro support.

The new global sleepRequestTimeout / wakeRequestTimeout fields and the sleepEndpoints / wakeEndpoints item schemas look consistent with the Go config defaults and config.example.yaml semantics (min ≥1 globally, per-endpoint timeout 0 → inherit global). additionalProperties: false on endpoint objects is a nice guardrail.

If you want to make the schema self‑documenting wrt macros, you could extend the endpoint/body descriptions to mention that macro substitution (e.g. ${PORT}, ${MODEL_ID}) is allowed here, matching the comments in config.example.yaml. Otherwise this looks ready to go.

Also applies to: 230-297

proxy/config/config_posix_test.go (1)

215-218: POSIX config expectations updated correctly for sleep/wake defaults.

The added SleepRequestTimeout: 10 and WakeRequestTimeout: 10 fields in the expected Config mirror the new defaults and keep the snapshot test honest alongside HealthCheckTimeout and MetricsMaxInMemory.

If you want even tighter coverage, you could also assert these two fields in TestConfig_DefaultValuesPosix, but the current expectations here are already a solid safeguard.

config.example.yaml (1)

24-35: Example config for sleep/wake and vLLM modes is clear and matches the code.

The new sleepRequestTimeout/wakeRequestTimeout comments and values, plus the vllm-sleep-level1 and vllm-sleep-level2 model examples, do a nice job of documenting:

Global vs per-endpoint timeout behavior.

Level 1 vs Level 2 semantics.

Multi-step wake flows (/wake_up → /collective_rpc with reload_weights → /reset_prefix_cache).

Everything lines up with the schema and the simple-responder endpoints. If you want to avoid any ambiguity, you might tweak the comments to say “requests are sent to the model’s proxy base URL + endpoint”, but that’s purely a clarity nit.

Also applies to: 258-335

proxy/config/config_test.go (1)

764-808: Sleep/wake config tests are comprehensive and well-aligned with config semantics

These tests nicely cover global vs per-endpoint timeouts, macro substitution (including MODEL_ID/PORT), multi-step wake sequences, unknown-macro errors, defaults, and model-level macro precedence for sleep/wake endpoints. As an optional enhancement, you could add a couple of negative tests that exercise the new validation paths in ModelConfig.UnmarshalYAML (e.g., missing wakeEndpoints when sleepEndpoints are set, invalid HTTP method, or negative timeout) to pin those error messages too.

Also applies to: 805-836, 838-871, 873-923, 925-945, 947-970, 972-1008

ui/src/contexts/APIProvider.tsx (1)

4-14: UI sleep model wiring matches backend API and state model

Extending ModelStatus, adding sleepEnabled, and wiring sleepModel into the context all look consistent with the new /api/models/sleep/*model endpoint and the process state strings. Error handling and memoization follow existing patterns. If you ever introduce model IDs with URL-reserved characters (beyond /), you may want to consider encodeURIComponent here, but for the current *model wildcard usage this is fine.

Also applies to: 16-29, 256-269, 270-286

proxy/process_test.go (2)

241-275: Sleep/wake state transitions in swapState table look coherent

The added cases for StateSleepPending, StateAsleep, and StateWaking (including a couple of invalid transitions) align with a sensible lifecycle: Ready → SleepPending → Asleep → Waking → Ready, with exits to Stopping/Stopped where appropriate. As an optional follow‑up, you could add explicit negative cases like SleepPending→Ready or Asleep→SleepPending if those are intentionally forbidden, to lock the state machine down further.

581-703: New sleep/wake process tests provide good coverage; just confirm Sleep semantics

The new tests cover the main scenarios well: a basic sleep/wake cycle, multi-step wake sequences, preferring Sleep over Stop in MakeIdle, and falling back to start() when wake fails. One thing to double‑check is that Process.Sleep() is synchronous up to StateAsleep; if it ever becomes asynchronous (e.g., setting StateSleepPending and completing in a goroutine), these tests may become flaky and would benefit from a small helper that waits/polls until the target state (or timeout) instead of asserting immediately.
proxy/proxymanager_api.go (2)
15-22: Model API additions line up with UI expectations; consider a config-based fallback for SleepEnabled

Extending Model with SleepEnabled and mapping the new sleep-related states (StateSleepPending, StateAsleep, StateWaking) to "sleepPending" | "asleep" | "waking" keeps the API nicely in sync with the UI’s ModelStatus union and sleepEnabled flag. Right now sleepEnabled is only set when a non‑nil process exists; if there’s any path where a model’s process isn’t instantiated yet, you could optionally fall back to the config’s sleep/wake endpoints to still advertise the capability:
-        if processGroup != nil {
-            process := processGroup.processes[modelID]
-            if process != nil {
-                var stateStr string
+        if processGroup != nil {
+            process := processGroup.processes[modelID]
+            if process != nil {
+                var stateStr string
                 switch process.CurrentState() {
@@
-                }
-                state = stateStr
-                sleepEnabled = process.isSleepEnabled()
-            }
+                }
+                state = stateStr
+                sleepEnabled = process.isSleepEnabled()
+            } else {
+                cfg := pm.config.Models[modelID]
+                sleepEnabled = len(cfg.SleepEndpoints) > 0 && len(cfg.WakeEndpoints) > 0
+            }
+        } else {
+            cfg := pm.config.Models[modelID]
+            sleepEnabled = len(cfg.SleepEndpoints) > 0 && len(cfg.WakeEndpoints) > 0
         }
Purely optional, but it would make sleepEnabled reflect configuration even before a process is created.

Also applies to: 24-34, 52-84, 86-93

243-263: Sleep handler mirrors unload handler and cleanly delegates to SleepProcess

apiSleepSingleModelHandler follows the same pattern as apiUnloadSingleModelHandler: resolve aliases, find the process group, delegate to SleepProcess, and surface errors via sendErrorResponse. That keeps the API consistent and centralizes the actual sleep logic in the process group. If you later distinguish between “sleep not configured” vs. internal failures in SleepProcess, you might consider returning a 400/409 for the former instead of 500, but the current behavior is reasonable and consistent with the unload path.
proxy/config/model_config.go (1)

11-17: Sleep/wake endpoint modeling and validation look solid

Defining HTTPEndpoint and adding SleepEndpoints/WakeEndpoints to ModelConfig, with UnmarshalYAML enforcing “if one is set, both must be set” and validateEndpoint handling required endpoint, method normalization/whitelisting, and non‑negative timeouts, gives you a clean, early-validation story for sleep/wake config. This lines up well with the new tests around timeouts and macro substitution. As an optional future tweak, you could extend validMethods if you ever need verbs like DELETE or HEAD, or add JSON validation for Body if you want to catch malformed payloads at config load time, but for the current use cases this is more than adequate.

Also applies to: 19-33, 56-110, 111-135

docs/configuration.md (2)

75-85: Minor wording nit: hyphenate “event-driven”

In the features table row for hooks, consider changing “event driven functionality” to “event‑driven functionality” for grammatical correctness and to silence the linter.

124-135: Sleep/wake configuration docs align with implementation; maybe call out per-endpoint timeout inheritance more explicitly

The descriptions for sleepRequestTimeout / wakeRequestTimeout and the vLLM sleepEndpoints / wakeEndpoints examples correctly match the implementation: timeouts are in seconds, default to 10 at the config level, and per‑endpoint timeout: 0 inherits the respective global timeout. The macro usage (${PORT}, ${MODEL_ID}) and sequential execution notes are also accurate. If you want to reduce surprises, you could add a short note that negative per‑endpoint timeouts are not validated and effectively behave as “no timeout”, so users should stick to 0 or positive values.

Also applies to: 321-398

proxy/process.go (1)

463-505: HTTP sleep/wake helpers are sound; you may want stricter timeout validation

The sendSleepRequests / sendWakeRequests sequencing, buildFullURL path resolution against config.Proxy, and the shared sendHTTPRequest helper all look correct and match the config semantics (per‑endpoint Timeout in seconds, dial timeout via httpDialTimeout, and a hard requirement on HTTP 200). Because LoadConfigFromReader only normalizes the global sleep/wake timeouts and later fills in per‑endpoint Timeout when it is exactly 0, an explicitly negative per‑endpoint timeout will pass through and behave as “no client timeout”. If you want to keep the configuration surface strictly in “seconds with default inheritance”, you could clamp endpoint timeouts to >= 1 as well and/or reject negative values during config load.

Also applies to: 507-511, 628-691, 693-704

proxy/config/config.go (1)

256-313: Macro handling for sleep/wake endpoints is well-integrated; consider tightening validation for Body content

The flow for endpoints looks good:

User and model macros are applied to SleepEndpoints / WakeEndpoints (Endpoint + Body) alongside other string fields.

${PORT} is substituted only when cmd or proxy requires it, and then also propagated into the endpoint arrays.

validateEndpointMacros ensures no unresolved macros (including reserved PORT / MODEL_ID) remain in Endpoint/Body.

Finally, per-endpoint Timeout == 0 is filled from the global sleep/wake timeout values.

Two small considerations:

validateEndpointMacros will treat any ${...} sequence in the JSON body as an error, even if the intention is to send that literal string downstream. That’s consistent with how macros work elsewhere, but may be worth calling out in docs to avoid surprises.

As with process.go, endpoint Timeout values that are explicitly negative will skip the == 0 defaulting and end up as “no timeout” in the HTTP client. If you’d like stricter config guarantees, you could clamp or reject negative endpoint timeouts during load.

Also applies to: 314-355, 381-387, 410-421, 625-640

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86e9b93 and f6b9320.

📒 Files selected for processing (18)

.github/workflows/go-ci-windows.yml (0 hunks)
README.md (1 hunks)
cmd/simple-responder/simple-responder.go (2 hunks)
config-schema.json (2 hunks)
config.example.yaml (2 hunks)
docs/configuration.md (3 hunks)
proxy/config/config.go (8 hunks)
proxy/config/config_posix_test.go (1 hunks)
proxy/config/config_test.go (1 hunks)
proxy/config/config_windows_test.go (1 hunks)
proxy/config/model_config.go (3 hunks)
proxy/process.go (15 hunks)
proxy/process_test.go (3 hunks)
proxy/processgroup.go (2 hunks)
proxy/proxymanager_api.go (5 hunks)
ui/src/contexts/APIProvider.tsx (4 hunks)
ui/src/index.css (3 hunks)
ui/src/pages/Models.tsx (3 hunks)

💤 Files with no reviewable changes (1)

.github/workflows/go-ci-windows.yml

🧰 Additional context used

🪛 LanguageTool

docs/configuration.md

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🔇 Additional comments (10)

README.md (1)

32-39: Docs update clearly exposes the new sleep API and behavior.

The /models/sleep/:model_id bullet and the “Fast model switching with sleep/wake support” feature line accurately reflect the new lifecycle and configuration; no changes needed.

proxy/config/config_windows_test.go (1)

204-207: Windows config snapshot now includes sleep/wake timeouts, matching POSIX.

The expected Config for Windows correctly adds SleepRequestTimeout: 10 and WakeRequestTimeout: 10 next to HealthCheckTimeout and MetricsMaxInMemory, keeping platform-specific tests in sync with the new defaults.

ui/src/index.css (1)

136-138: New status and button utility classes are consistent with existing design system.

The added .status-badge utility, sleep/wake/shutdown status variants, and .btn--action min‑width all follow the existing naming and token usage; they should drop into the models UI cleanly. Just make sure consumers combine .status with .status-badge where you expect the min width.

Also applies to: 153-165, 179-181

proxy/processgroup.go (2)

68-70: Using MakeIdle() on the previous process fits the new sleep-friendly swap behavior.

Switching from hard-stopping the prior model to calling lastProcess.MakeIdle() when swap is enabled lets the underlying Process decide whether to sleep, stop, or no-op based on its config/state, which aligns with the new lifecycle. Given that processes is populated once in NewProcessGroup and entries aren’t removed, indexing pg.processes[pg.lastUsedProcess] under the lock is safe.

115-133: SleepProcess is simple and correctly synchronized; behavior matches the public API.

The new SleepProcess method:

Guards access to pg.processes with the group lock.

Cleanly errors for unknown models and for models without sleep support (isSleepEnabled()).

Calls process.Sleep() outside the lock, avoiding holding the group mutex over a potentially long network operation.

This matches the /models/sleep/:model_id API semantics and integrates without introducing obvious races with ProxyRequest / StopProcesses. No changes needed here.

proxy/process_test.go (1)

91-112: Updated error expectation correctly reflects makeReady path

Switching the expectation from "unable to start process" to "unable to makeReady process" keeps this test in sync with the new startup/makeReady semantics while still verifying the error is surfaced as a 502.

ui/src/pages/Models.tsx (1)

40-40: Sleep/wake actions wiring looks consistent; confirm wake semantics for loadModel

The conditional actions per model.state and model.sleepEnabled are coherent, and the disabled button fallback for other states is a good guard. One thing to double-check is that loadModel(model.id) is indeed the intended wake path from asleep (i.e., that the backend “load” endpoint now calls the unified make‑ready logic and correctly handles both stopped → ready and asleep → ready). If a distinct wake endpoint exists, it may be clearer to route the “Wake” button through that instead.

Also applies to: 149-151, 167-202

proxy/process.go (2)

27-43: Lifecycle and state machine extensions look coherent

The introduction of StateSleepPending, StateAsleep, and StateWaking plus the per‑transition waitgroups (waitStarting, waitSleeping, waitWaking) is wired correctly: swapState only bumps the waitgroup when the CAS succeeds, and each of start, Sleep, and wake pairs that with a single defer ...Done(), so there’s no underflow or leak. The expanded isValidTransition rules are consistent with the intended graph, and MakeIdle() sensibly prefers Sleep() when endpoints are configured, falling back to Stop() otherwise. The TTL monitor also correctly treats sleeping/waking states as “active” for the purposes of UnloadAfter, which matches the idea that a sleeping model is still eligible for eventual full unload.

Also applies to: 80-88, 170-203, 205-223, 252-260, 388-418, 420-462

344-347: Health check refactor via buildFullURL/checkHealthEndpoint is an improvement

Switching the health‑check path to go through buildFullURL and checkHealthEndpoint (which reuses the same HTTP helper as sleep/wake) removes ad‑hoc URL concatenation and centralizes timeout behavior. The logging still uses the precomputed healthURL, and early errors on invalid proxy URL or endpoint parsing are surfaced cleanly.

Also applies to: 364-377

proxy/config/config.go (1)

112-141: Global sleep/wake timeout defaults and validation are consistent

Adding SleepRequestTimeout and WakeRequestTimeout to Config, defaulting both to 10 and clamping them to a minimum of 1 second in LoadConfigFromReader, is a clean extension of the existing health check timeout pattern. This matches the documentation and ensures that, even if YAML omits or sets these fields to 0, the HTTP clients for sleep/wake calls won’t run with an accidental zero timeout.

Also applies to: 170-185, 191-205

proxy/process.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

ui/src/pages/Models.tsx (1)

200-202: Status badge ties neatly into extended state set

Rendering state via status status--${model.state} gives a unified hook to style the new sleep-related states (sleepPending, asleep, waking) alongside the existing ones. Just ensure corresponding CSS exists for the new modifiers; otherwise they’ll still degrade gracefully as plain text.
proxy/process.go (1)
647-691: Consider validating JSON body before sending HTTP requests.

When endpoint.Body is provided, sendHTTPRequest sets the Content-Type to application/json but doesn't validate that the body is well-formed JSON. If configuration contains invalid JSON or unexpanded macros, the request will fail at the inference server with potentially cryptic errors.

Consider adding basic JSON validation:
 var bodyReader io.Reader
 if endpoint.Body != "" {
+	// Validate it's valid JSON
+	if !json.Valid([]byte(endpoint.Body)) {
+		return fmt.Errorf("invalid JSON in request body: %s", endpoint.Body)
+	}
 	bodyReader = strings.NewReader(endpoint.Body)
 }
This provides earlier, clearer error messages during sleep/wake operations.
config-schema.json (2)
270-303: Schema description states "required" but doesn't enforce it programmatically.

Line 302 describes wakeEndpoints as "Required when sleepMode is 'enable'", but the JSON schema doesn't enforce this constraint. The field is not in the required array and defaults to an empty array [].

This means:

Users could enable sleep mode without providing wake endpoints

Validation must happen elsewhere (likely in Go code)

Users won't get immediate schema validation feedback

Consider either:
Option A: Add conditional schema validation using if/then to enforce the requirement:
"if": {
  "properties": { "sleepMode": { "const": "enable" } }
},
"then": {
  "required": ["wakeEndpoints", "sleepEndpoints"],
  "properties": {
    "wakeEndpoints": { "minItems": 1 },
    "sleepEndpoints": { "minItems": 1 }
  }
}
Option B: Clarify the description to state: "Required when sleepMode is 'enable' (validated at runtime, not by schema)."
This applies to both wakeEndpoints and sleepEndpoints (lines 236-269, 270-303).

242-245: Clarify "URL path" terminology in endpoint descriptions.

The descriptions use "URL path" but the examples include query parameters (e.g., /sleep?level=1). Technically, query parameters are part of the URL but not the path component.

Consider updating the descriptions to be more precise:

Line 245: Change "URL path for the sleep endpoint" → "URL endpoint for sleep"

Line 279: Change "URL path for the wake endpoint" → "URL endpoint for wake"

Or explicitly state: "URL path for the endpoint (may include query parameters)."

This helps users understand that query strings, fragments, and other URL components are supported.

Also applies to: 276-279

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6b9320 and 0ca40d9.

📒 Files selected for processing (10)

config-schema.json (2 hunks)
config.example.yaml (2 hunks)
docs/configuration.md (3 hunks)
proxy/config/config_posix_test.go (4 hunks)
proxy/config/model_config.go (4 hunks)
proxy/process.go (15 hunks)
proxy/process_test.go (3 hunks)
proxy/proxymanager_api.go (5 hunks)
ui/src/contexts/APIProvider.tsx (4 hunks)
ui/src/pages/Models.tsx (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

proxy/config/config_posix_test.go

🧰 Additional context used

🧬 Code graph analysis (4)

proxy/proxymanager_api.go (2)

proxy/config/model_config.go (1)

SleepMode (20-20)

proxy/process.go (3)

StateSleepPending (37-37)

StateAsleep (38-38)

StateWaking (39-39)

ui/src/pages/Models.tsx (1)

ui/src/contexts/APIProvider.tsx (1)

useAPI (291-297)

proxy/process.go (1)

proxy/config/model_config.go (3)

SleepMode (20-20)

SleepModeEnable (23-23)

HTTPEndpoint (12-17)

proxy/process_test.go (2)

proxy/process.go (8)

StateReady (30-30)

StateSleepPending (37-37)

StateAsleep (38-38)

StateStopping (31-31)

StateStopped (28-28)

StateWaking (39-39)

ErrInvalidStateTransition (167-167)

NewProcess (99-143)

proxy/config/model_config.go (3)

SleepMode (20-20)

SleepModeEnable (23-23)

HTTPEndpoint (12-17)

🪛 LanguageTool

docs/configuration.md

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🔇 Additional comments (15)

config.example.yaml (2)

24-35: Global sleep/wake timeout docs look consistent with implementation

The global sleepRequestTimeout / wakeRequestTimeout options and their described defaults/override semantics line up with the HTTPEndpoint Timeout field and the intended “0 = use global” behavior. Nothing to change here.

258-344: vLLM sleep-mode examples are well-scoped and match config schema

The level 1 and level 2 vllm-sleep-* examples correctly use sleepMode: enable and the sleepEndpoints / wakeEndpoints structure (endpoint, method, optional body/timeout). They also reflect the intended multi-step wake flow for level 2. Good, self-contained examples.

proxy/proxymanager_api.go (2)

15-22: Model status + SleepMode exposure align with UI expectations

Surfacing SleepMode on the API Model type and mapping the new process states (sleepPending, asleep, waking) to string values is consistent with the TS ModelStatus union and UI logic. The cast string(pm.config.Models[modelID].SleepMode) will yield "enable" / "disable" as expected for the React checks.

Also applies to: 52-81, 84-91

24-35: New /api/models/sleep endpoint mirrors unload semantics cleanly

The apiSleepSingleModelHandler correctly resolves aliases via RealModelName, looks up the process group, calls SleepProcess, and returns appropriate HTTP status codes. Routing POST /models/sleep/*model through this handler matches the UI’s /api/models/sleep/${model} usage and keeps behavior parallel to the unload handler.

Also applies to: 241-261

docs/configuration.md (2)

124-135: Global sleep/wake timeout documentation matches config behavior

The descriptions for sleepRequestTimeout and wakeRequestTimeout (defaults, global scope, and per-endpoint overrides) are consistent with the configuration fields and HTTPEndpoint Timeout semantics. No changes needed.

321-407: vLLM sleep-mode docs aligned with example config and process behavior

The level 1 and level 2 vLLM sleep-mode examples here mirror config.example.yaml and accurately describe the single-step vs multi-step wake flows. The notes about required sleepMode: enable, endpoint arrays, and optional per-endpoint timeout are precise and helpful.

ui/src/pages/Models.tsx (2)

40-46: sleepModel wiring from API context is straightforward

Including sleepModel alongside other model actions from useAPI() keeps the component interface consistent with existing load/unload operations. No issues here.

149-151: Action matrix for stopped/ready/asleep states is coherent and guarded

The conditional rendering in the Actions column (Load for stopped, Wake+Unload for asleep via loadModel / unloadSingleModel, Sleep+Unload for ready when sleepMode === "enable", and a disabled state button otherwise) matches the backend state machine and avoids issuing conflicting commands while in transitional states. The layout (w-40 + min-w-36 flex container) should keep multiple buttons readable.

Also applies to: 167-202

ui/src/contexts/APIProvider.tsx (2)

4-14: Types extended correctly for new lifecycle and SleepMode

The ModelStatus union now covers all server states, including sleepPending, asleep, and waking, plus unknown. Adding sleepMode: string to Model and the corresponding sleepModel entry to APIProviderType keeps the public context surface in sync with the backend Model JSON.

Also applies to: 16-29

256-269: sleepModel implementation follows existing fetch/error pattern

sleepModel reuses the established fetch pattern (POST, status check, throwing on error) and is correctly included in the memoized context value and dependency list. SSE model-status events will refresh UI state after a sleep request, so no additional local state juggling is necessary.

Also applies to: 270-286

proxy/process_test.go (3)

100-112: Updated error expectation matches new startup path

Changing the assertion to look for "unable to makeReady process" keeps TestProcess_BrokenModelConfig aligned with the new error wording from the startup/makeReady path, while still verifying that a broken command surfaces a clear 502 with details on the second attempt.

241-275: swapState table now properly exercises sleep/wake transitions

The additional cases for Ready ↔ SleepPending ↔ Asleep ↔ Waking (including invalid direct Ready↔Asleep transitions) give good coverage of the extended state machine and ensure ErrInvalidStateTransition and ErrExpectedStateMismatch are enforced consistently for the new states.

581-707: New sleep/wake tests cover core flows and fallback behavior

The added tests (basic sleep/wake, multi-step wake sequence, using Sleep from MakeIdle, and wake failure falling back to start()) collectively exercise the main control paths around SleepMode and HTTPEndpoint sequences. They mirror how configuration is mutated in other tests and should catch regressions in the state transitions or endpoint sequencing logic.

proxy/config/model_config.go (2)

5-9: SleepMode and HTTPEndpoint integration into ModelConfig is coherent

Introducing HTTPEndpoint and SleepMode and wiring them into ModelConfig with SleepModeDisable as the default gives a clear, explicit toggle for sleep/wake behavior. The defaults in UnmarshalYAML avoid surprising implicit activation, and struct tags (sleepMode, sleepEndpoints, wakeEndpoints) match the YAML examples and docs.

Also applies to: 11-25, 27-46, 69-85

98-130: Endpoint and SleepMode validation are strict and user-friendly

The UnmarshalYAML validation:

Ensures sleepMode is only "enable" or "disable".

Requires both sleepEndpoints and wakeEndpoints when enabled.

Applies validateEndpoint to every endpoint, which:

Requires a non-empty path.

Defaults method to POST and normalizes to uppercase, allowing only GET/POST/PUT/PATCH.

Enforces non-negative timeouts (allowing 0 to mean “use global timeout”).

This should catch most misconfigurations early with precise error messages (including array indices). Nicely done.

Also applies to: 132-155

docs/configuration.md

napmany added 5 commits November 19, 2025 13:33

add sleep mode

1519d38

fix go-ci-windows simple-responder build

e81ef6f

min values for SleepRequestTimeout and WakeRequestTimeout

cba053a

fix process state check in makeReady func

78d33ff

fix wake() fall back to start()

f6b9320

napmany changed the title ~~Sleep wake~~ Sleep mode Nov 20, 2025

coderabbitai bot reviewed Nov 20, 2025

View reviewed changes

proxy/process.go Show resolved Hide resolved

explicit sleep mode config

0ca40d9

coderabbitai bot reviewed Nov 20, 2025

View reviewed changes

docs/configuration.md Show resolved Hide resolved

fix config_windows_test

db180c2

napmany merged commit 14bce12 into main Nov 20, 2025
4 checks passed

napmany deleted the sleep-wake branch November 20, 2025 19:08

coderabbitai bot mentioned this pull request Dec 23, 2025

Sync with llama-swap v178 #11

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sleep mode#4

Sleep mode#4
napmany merged 7 commits intomainfrom
sleep-wake

napmany commented Nov 19, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

napmany commented Nov 20, 2025

Uh oh!

coderabbitai bot commented Nov 20, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

napmany commented Nov 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add Sleep/Wake Support for Fast Model Switching

Summary

What Changed

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

napmany commented Nov 20, 2025

Uh oh!

coderabbitai bot commented Nov 20, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

napmany commented Nov 19, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 20, 2025 •

edited

Loading