Conversation
WalkthroughAdds configurable model sleep/wake lifecycle: config schema and defaults for per-endpoint timeouts, HTTPEndpoint model fields, process state machine changes and HTTP sequencing for multi-step sleep/wake, new API/UI controls to trigger sleep, test coverage, and documentation updates. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as ProxyManager API
participant PG as ProcessGroup
participant P as Process
participant Remote as Remote Model Endpoint
Note over Client,Remote `#D6EAF8`: Sleep flow (high-level)
Client->>API: POST /api/models/sleep/{model}
API->>PG: SleepProcess(modelID)
PG->>P: Sleep()
P->>P: isSleepEnabled()
alt enabled
P->>Remote: sendSleepRequests() (step1..N)
Remote-->>P: 200 OK
P->>P: swapState Ready→sleepPending→asleep
P-->>PG: success
PG-->>API: 200 OK
else not enabled
P-->>PG: error
PG-->>API: 400/ERR
end
sequenceDiagram
participant Client
participant API as ProxyManager API
participant PG as ProcessGroup
participant P as Process
participant Remote as Remote Model Endpoint
Note over Client,Remote `#F9EBEA`: Wake on request (high-level)
Client->>API: Any request for model while asleep
API->>PG: makeReady()
PG->>P: wake() / makeReady()
P->>Remote: sendWakeRequests() (step1..N)
alt all OK
Remote-->>P: 200 OK
P->>P: swapState waking→ready
P-->>PG: Ready
PG-->>API: Forward client request
API-->>Client: Response
else failure
Remote-->>P: error
P->>P: fallback to full start (stop/start)
P-->>PG: Stopped/Error
PG-->>API: Error to client
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-10-29T05:26:34.964ZApplied to files:
📚 Learning: 2025-08-29T05:03:26.638ZApplied to files:
🧬 Code graph analysis (2)proxy/process_test.go (2)
proxy/process.go (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (16)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
proxy/config/config.go (1)
112-141: Global sleep/wake timeout config fields lack non-negative validation
SleepRequestTimeoutandWakeRequestTimeoutare added toConfigwith defaults set inLoadConfigFromReader(lines 179–180). However, unlikeHealthCheckTimeout(lines 191–194) andStartPort(lines 196–198), these timeouts receive no validation. A negative value in YAML would pass unmarshal, propagate as-is through theintfield, and later convert to a negativetime.Durationat line 660 (time.Duration(endpoint.Timeout) * time.Second), potentially causing unexpected HTTP client behavior.Add bounds validation after line 194—e.g., clamp to 0 or reject values < 0 with a config error—to match the pattern used for
HealthCheckTimeoutand prevent silent propagation of invalid timeouts.proxy/process.go (1)
653-697: Validate global timeout defaults are non-zero to prevent indefinite hangsThe concern is valid: if
sleepRequestTimeoutorwakeRequestTimeoutare configured to0in YAML, endpoints with unspecified timeouts get normalized to these zero values, resulting inhttp.Client.Timeout = 0(no timeout) and potential indefinite hangs.While the code validates individual endpoint timeouts cannot be negative (proxy/config/model_config.go:131) and normalizes zero endpoints to global defaults (proxy/config/config.go:403-410), there is no validation preventing the global timeout values themselves from being zero or negative.
Recommend adding validation for
SleepRequestTimeoutandWakeRequestTimeoutin the config loading phase (proxy/config/config.go, after line 194) to ensure they're positive, similar to the existingHealthCheckTimeoutminimum enforcement:if config.HealthCheckTimeout < 15 { config.HealthCheckTimeout = 15 } + +if config.SleepRequestTimeout <= 0 { + config.SleepRequestTimeout = 10 +} +if config.WakeRequestTimeout <= 0 { + config.WakeRequestTimeout = 10 +}
🧹 Nitpick comments (11)
cmd/simple-responder/simple-responder.go (1)
11-11: Sleep/wake test endpoints are wired correctly; consider minimal error handlingThe added endpoints give you the right knobs for exercising sleep, wake, failure, and collective RPC behavior; using
strings.Containsfor"reload_weights"is sufficient for this test helper.If you want a bit more robustness without much noise, you could at least check the read error:
- r.POST("/collective_rpc", func(c *gin.Context) { - body, _ := io.ReadAll(c.Request.Body) - if strings.Contains(string(body), "reload_weights") { + r.POST("/collective_rpc", func(c *gin.Context) { + body, err := io.ReadAll(c.Request.Body) + if err != nil { + c.Status(http.StatusBadRequest) + return + } + if strings.Contains(string(body), "reload_weights") { c.Status(http.StatusOK) } else { c.Status(http.StatusBadRequest) } })Purely optional given this is a local test responder.
Also applies to: 268-292
ui/src/pages/Models.tsx (1)
40-40: Action column logic correctly reflects sleep/wake lifecyclePulling
sleepModelfromuseAPIand mapping actions bymodel.state/sleepEnabledgives a clear, predictable UI:
stopped→ Load.asleep→ Wake (vialoadModel) + Unload.ready→ optional Sleep (whensleepEnabled) + Unload.- Transitional/other states → disabled button.
This matches the new state machine behavior and prevents conflicting actions while transitions are in progress. If you ever want to polish UX further, a tiny wrapper that awaits these async calls and disables the row while pending would avoid double-clicks, but it’s not required for correctness.
Also applies to: 167-199
docs/configuration.md (1)
75-85: Minor wording tweak: hyphenate “event‑driven”In the features table, consider changing “event driven functionality” to “event‑driven functionality” for standard phrasing.
proxy/processgroup.go (1)
115-133: SleepProcess helper is safe and avoids holding the group lock during sleepThe flow—lookup under lock, capability check, unlock, then
process.Sleep()—avoids blocking the group mutex on a potentially long-running operation and is safe given the process map is not mutated after initialization.You might optionally enrich the error to include the model ID, e.g.
"model %s does not support sleep mode", to ease debugging when called via the API.proxy/process_test.go (2)
581-610: Basic Sleep/Wake test exercises happy path but not HTTP failure modes
TestProcess_SleepAndWakeBasiccorrectly validates the Ready→Asleep→Ready flow viaSleep()andwake(), but it only asserts state changes and not that the configured sleep/wake endpoints are actually hit or that HTTP failures are surfaced.You might consider a follow-up test that uses a test responder configured to fail one of the sleep or wake endpoints and asserts that
Sleep()/wake()behave as expected (e.g., falling back toStopImmediately()and ending inStateStopped), similar toTestProcess_WakeFailureFallsBackToStart, but for sleep as well.
641-670: Swap-related sleep test only checks final state, not side effects
TestProcess_SleepInsteadOfStopWithSwapconfirms thatMakeIdle()uses sleep (ending inStateAsleep) when sleep/wake is configured, which matches the intended behavior during swaps. It does not, however, assert that the upstream process remains running (as opposed to being stopped), so if you want stronger guarantees around “sleep vs stop” semantics, you could add assertions based on process PID or log content.proxy/config/config.go (2)
176-185: Default sleep/wake timeouts are reasonable but 0 means “no timeout”The defaults of 10 seconds for both
SleepRequestTimeoutandWakeRequestTimeoutare sensible for local sleep/wake endpoints. Note that because endpoints withtimeout: 0inherit these values, but a global timeout of 0 will in turn leave endpoints at 0, this effectively means “no HTTP client timeout” for sleep/wake operations.If unbounded sleep/wake HTTP requests are undesirable, you may want to treat
0as “use default 10” (or enforce a minimum) and only allow explicit large values rather than “no timeout”.
371-377: Endpoint macro validation is strict and returns early on first issue
validateEndpointMacrosensures that any remaining${...}in sleep/wake endpoint paths or bodies after substitution are treated as errors, with reserved macros (PORT,MODEL_ID) required to have been fully expanded. This matches the existing validation for other fields. Note that it reports only the first offending macro; if you want more user-friendly diagnostics, a future enhancement could aggregate all errors in a single pass.proxy/config/config_test.go (1)
873-923: Multi-step wake config test encodes default wake timeout as a contract
TestConfig_SleepWakeMultiStepWakeverifies a vLLM-style multi-step wake sequence and asserts that unspecified wake timeouts default to 10 seconds, effectively pinning the default defined in config.go. This is helpful, but be aware that changing the default in code will require updating this test as well.If you anticipate changing default timeouts in the future, you might factor the default value into a shared constant used both in code and tests to avoid magic numbers.
proxy/process.go (2)
438-453: StopImmediately now respects current state but repeats an older race patternUsing
initState := p.CurrentState()and feeding that intoswapState(initState, StateStopping)broadensStopImmediatelyto work from all valid origin states (Starting, Ready, SleepPending, Asleep, Waking). However, as withStop(),Stop()waits forinFlightRequestsbefore transitioning intoStateStopping, which leaves a small window where new requests can start betweenWait()returning and the state change, and those requests will not be counted in the wait.If you want stronger guarantees, consider flipping the order to transition to
StateStoppingfirst (soProxyRequestrejects new work) and thenWait()for existing in-flight requests to drain before callingStopImmediately(). This would close the TOCTOU window for both stop and TTL-triggered unloads.
634-651: buildFullURL composes sleep/wake/health URLs robustly, but assumes a valid proxy base
buildFullURLcorrectly usesurl.ParseandResolveReferenceto combine the proxy base with per-endpoint paths, handling absolute vs relative endpoint paths. Ifconfig.Proxyis empty or malformed, though, all sleep/wake/health requests will fail at runtime with parse errors rather than being rejected at config load time.You may want to strengthen config validation (in config.go) to reject models that define sleep/wake endpoints without a valid, non-empty
proxyURL, so misconfiguration is caught early.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
.github/workflows/go-ci-windows.yml(0 hunks)README.md(1 hunks)cmd/simple-responder/simple-responder.go(2 hunks)config-schema.json(2 hunks)config.example.yaml(2 hunks)docs/configuration.md(3 hunks)proxy/config/config.go(7 hunks)proxy/config/config_posix_test.go(1 hunks)proxy/config/config_test.go(1 hunks)proxy/config/config_windows_test.go(1 hunks)proxy/config/model_config.go(3 hunks)proxy/process.go(15 hunks)proxy/process_test.go(3 hunks)proxy/processgroup.go(2 hunks)proxy/proxymanager_api.go(5 hunks)ui/src/contexts/APIProvider.tsx(4 hunks)ui/src/index.css(3 hunks)ui/src/pages/Models.tsx(3 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/go-ci-windows.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-29T05:26:34.964Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 371
File: proxy/process.go:0-0
Timestamp: 2025-10-29T05:26:34.964Z
Learning: In proxy/process.go, the loading message "llama-swap loading model: {name}" intentionally uses p.ID (Process.ID) rather than the realModelName from the request context. This is the correct design choice.
Applied to files:
proxy/proxymanager_api.goproxy/process_test.goproxy/processgroup.goproxy/process.go
📚 Learning: 2025-10-07T05:41:52.728Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 337
File: proxy/config/config.go:268-291
Timestamp: 2025-10-07T05:41:52.728Z
Learning: In the mostlygeek/llama-swap repository's proxy/config/config.go, macro-in-macro substitution requires that referenced macros be declared earlier in the YAML file (declaration order matters). A macro can only reference macros that appear before it in the configuration. The YAML order is preserved by the UnmarshalYAML implementation for MacroList, and the LIFO substitution approach is intentional.
Applied to files:
proxy/config/config.go
📚 Learning: 2025-08-29T05:03:26.638Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 278
File: proxy/process.go:368-379
Timestamp: 2025-08-29T05:03:26.638Z
Learning: For health check implementations in the llama-swap codebase, prefer simple and explicit HTTP client configurations over performance optimizations like preserving DefaultTransport settings, as health checks only need to verify HTTP 200 responses and clarity is more important than HTTP/2 support or connection pooling.
Applied to files:
proxy/process.go
🧬 Code graph analysis (6)
proxy/proxymanager_api.go (2)
proxy/process.go (3)
StateSleepPending(37-37)StateAsleep(38-38)StateWaking(39-39)ui/src/contexts/APIProvider.tsx (1)
Model(7-14)
proxy/config/config.go (1)
proxy/config/model_config.go (2)
ModelConfig(19-54)HTTPEndpoint(12-17)
proxy/config/config_test.go (1)
proxy/config/config.go (1)
LoadConfigFromReader(170-453)
ui/src/pages/Models.tsx (1)
ui/src/contexts/APIProvider.tsx (1)
useAPI(291-297)
proxy/process_test.go (2)
proxy/process.go (8)
StateReady(30-30)StateSleepPending(37-37)StateAsleep(38-38)StateStopping(31-31)StateStopped(28-28)StateWaking(39-39)ErrInvalidStateTransition(167-167)NewProcess(99-143)proxy/config/model_config.go (1)
HTTPEndpoint(12-17)
proxy/process.go (1)
proxy/config/model_config.go (1)
HTTPEndpoint(12-17)
🪛 LanguageTool
docs/configuration.md
[uncategorized] ~81-~81: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...me | | hooks | event driven functionality | | `...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (42)
README.md (2)
32-32: Documenting/models/sleep/:model_idlooks consistentThe new route description matches the intended behavior and clearly calls out the sleep/wake configuration requirement. No changes needed.
39-39: Feature bullet matches new sleep/wake capabilityThe “Fast model switching with sleep/wake support” text is accurate and gives a good mental model (offloading vs full restart). Reads well as-is.
ui/src/index.css (1)
136-138: New status and button utility styles align with the sleep/wake statesThe added
.status-badgeand thestatus--sleepPending/status--waking/status--asleep/status--shutdownmodifiers line up with the extendedModelStatusunion, and.btn--actiongives a consistent min width for the action buttons. Styling is cohesive with the existing status and button patterns.Also applies to: 153-165, 179-181
ui/src/contexts/APIProvider.tsx (2)
4-4: API types cleanly model new sleep-related state and capabilitiesExtending
ModelStatuswith sleep-related values and addingsleepEnabledtoModel, plus exposingsleepModelonAPIProviderType, keeps the type surface in sync with the backend behavior and the Models UI. The typing is coherent and should make misuse of unsupported states harder.Also applies to: 7-14, 16-29
256-268: sleepModel wiring is consistent with existing load/unload helpersThe
sleepModelhelper mirrors the patterns used byunloadAllModels/loadModel: POST to the new/api/models/sleep/{model}endpoint, throw on non-OK, log before rethrowing, and is correctly threaded into the memoized context value and deps. Looks good and ready for the UI consumer.Also applies to: 277-285
ui/src/pages/Models.tsx (1)
149-151: Status column uses the new badge styles consistentlyAdding the “Actions” header and switching the state cell to
status-badge text-center status status--${model.state}ties neatly into the new CSS variants. All definedModelStatusvalues now have a visual treatment, with unknowns still falling back to the base.statusstyling.Also applies to: 200-202
config.example.yaml (2)
24-35: Global sleep/wake timeout defaults look consistentThe new
sleepRequestTimeout/wakeRequestTimeoutsettings and comments are clear and line up with the schema semantics (global default, per-endpoint override, used during swap). No issues from a config/UX perspective.
258-335: vLLM sleep mode examples are well-scoped and match the config surfaceThe
vllm-sleep-level1andvllm-sleep-level2examples correctly illustrate:
- Use of
sleepEndpoints/wakeEndpoints- Per-endpoint HTTP method/body/timeout
- Macro usage and level 1 vs level 2 behavior
These are good, concrete templates for users to copy without surprises.
config-schema.json (2)
51-62: Root sleep/wake timeout schema matches intended behavior
sleepRequestTimeout/wakeRequestTimeoutare correctly modeled as integers withminimum: 1,default: 10, and descriptions that explain the global vs per-endpoint behavior. This aligns with the usage shown in YAML examples and tests.
230-297: sleepEndpoints/wakeEndpoints schema is precise and validator-friendlyThe new
sleepEndpoints/wakeEndpointsdefinitions:
- Require
endpoint- Constrain
methodto a small enum with sensible default- Use
timeout>= 0 with0clearly documented as “use global timeout”- Disallow
additionalPropertiesThis should give good feedback from schema-aware editors while matching the Go
HTTPEndpointtype and config behavior.docs/configuration.md (2)
124-135: Global sleep/wake timeout docs are clear and match behaviorThe description of
sleepRequestTimeout/wakeRequestTimeouthere matches the config schema and example YAML: global defaults, per-endpoint override viatimeout, and usage during swap/wake. Good level of detail for users.
321-398: vLLM sleep mode documentation aligns with config examplesThe Level 1 and Level 2 vLLM sleep mode examples in the docs correctly reflect:
- The same YAML structure as
config.example.yaml- Level-specific endpoints and multi-step wake sequence
- Optional per-endpoint timeout overrides
Nice balance of explanation and copy‑pasteable snippets.
proxy/config/config_windows_test.go (1)
204-207: Windows config defaults extended correctly for sleep/wakeAdding
SleepRequestTimeout: 10andWakeRequestTimeout: 10alongside existingHealthCheckTimeoutandMetricsMaxInMemorykeeps Windows expectations in sync with Posix and the new global defaults. Looks good.proxy/config/config_posix_test.go (1)
215-218: Posix config defaults updated to include sleep/wake timeoutsIncluding
SleepRequestTimeout: 10andWakeRequestTimeout: 10in the expectedConfigstruct keeps this test aligned with the new global defaults and the Windows test. No functional concerns.proxy/processgroup.go (1)
68-70: Switching from Stop() to MakeIdle() on last-used process is reasonableUsing
lastProcess.MakeIdle()when swapping models keeps the group logic the same while delegating finer-grained lifecycle control toProcess(which now understands sleep/wake). No issues spotted in the locking or flow here.proxy/proxymanager_api.go (2)
15-22: Model.SleepEnabled + sleep-related states are wired through cleanlyThe addition of
SleepEnabledand the extra state mappings (sleepPending,asleep,waking) fit naturally intogetModelStatus:
sleepEnabledis computed directly fromprocess.isSleepEnabled(), so the UI can avoid offering sleep where unsupported.- Existing models default to
SleepEnabled: falsewithout breaking compatibility.- State string values mirror the new
ProcessStateconstants.This is a straightforward, backward-compatible extension of the status API.
Also applies to: 52-84
243-263: *New /api/models/sleep/model endpoint mirrors unload flow and looks correctThe
apiSleepSingleModelHandler:
- Resolves aliases via
RealModelName, just like unload.- Validates the process group and propagates errors from
SleepProcess.- Returns
"OK"on success.Given that callers can already inspect
sleepEnabledfrom the model list, this endpoint behavior is consistent and predictable.proxy/config/model_config.go (1)
11-17: Sleep/wake endpoint config validation is solid and matches documented semanticsThe additions here are well-structured:
HTTPEndpointcleanly models the YAML structure used in examples and schema.SleepEndpoints/WakeEndpointsonModelConfiggive per-model control without impacting existing configs (empty by default).UnmarshalYAML:
- Preserves existing defaults.
- Enforces the “both-or-none” rule for sleep/wake endpoints, which guards against partially configured models.
- Validates each endpoint with clear, indexed error messages.
validateEndpoint:
- Requires a non-empty endpoint path.
- Defaults
Methodto POST, uppercases it, and restricts to a small safe set.- Ensures
Timeoutis non-negative, aligning with0as “use global timeout”.Overall this is a robust, backward-compatible extension of the model config surface.
Also applies to: 30-33, 56-109, 111-135
proxy/process_test.go (4)
100-112: Assertion string now correctly matches makeReady error pathUpdating the expected substring to
"unable to makeReady process"aligns this test with the newProxyRequesterror message whenmakeReady()fails, so the test remains meaningful with the refactored startup/wake logic.
241-275: Sleep/wake state transition table looks consistent with isValidTransitionThe added cases for
SleepPending,Asleep, andWakingmatch the transition rules inisValidTransitionand exercise both valid and invalid edges (e.g., Ready→SleepPending, SleepPending→Asleep, Asleep→Waking, and invalid Ready→Asleep/Asleep→Ready), which gives good coverage of the new lifecycle states.
612-639: Multi-step wake test nicely validates ordered endpoint execution
TestProcess_MultiStepWakeSequencedoes a good job verifying that a multi-step wake sequence succeeds end-to-end and that the process ends up back inStateReadyafter executing all three configured endpoints; this is a solid coverage point for the multi-endpoint wake feature.
672-704: Wake-failure fallback behavior is tested clearly
TestProcess_WakeFailureFallsBackToStartcorrectly asserts that a failing wake endpoint produces an error mentioning “wake” and leaves the process inStateStopped, matching the fallback-to-stop behavior inwake(). This gives good coverage for the error path.proxy/config/config.go (4)
269-302: Macro expansion into sleep/wake endpoints is consistent with existing fieldsExtending the LIFO macro substitution pass into
SleepEndpointsandWakeEndpoints(for bothEndpointandBody) mirrors the behavior forCmd,Proxy, andFilters.StripParams. This makes endpoint configuration fully macro-aware, includingMODEL_IDand other custom macros, and is aligned with the existing substitution rules.
304-344: PORT macro substitution into sleep/wake endpoints matches core behaviorSubstituting
${PORT}intoSleepEndpointsandWakeEndpointsin the same pass that handlesCmd,CmdStop, andProxyensures endpoint URLs and bodies stay in sync with the allocated port. This avoids subtle mismatches between inference endpoints and sleep/wake URLs when macros are used.
400-411: Defaulting per-endpoint timeouts from global config works as intendedThe loops that fill
SleepEndpoints[i].TimeoutandWakeEndpoints[i].Timeoutwhen they are0correctly implement “inherit from global timeout unless explicitly overridden,” which is exactly what the new tests assert. This centralizes timeout behavior and keeps endpoint YAML concise.
615-630: validateEndpointMacros silently skips empty fields; behavior seems fine
validateEndpointMacroschecks bothEndpointandBodystrings per entry and ignores them when empty, which is appropriate for optional bodies. The function also correctly distinguishes between reserved macros not being substituted and unknown macros. The current behavior is sound; no changes needed.proxy/config/config_test.go (6)
765-808: Basic sleep/wake config test maps well to new global defaults
TestConfig_SleepWakeBasicConfigurationvalidates that globalsleepRequestTimeoutandwakeRequestTimeoutare read correctly and applied to per-endpoint timeouts when not overridden. This directly tests the defaulting logic inLoadConfigFromReaderand is a good sanity check.
805-836: Macro substitution test gives strong coverage for endpoints and bodies
TestConfig_SleepWakeMacroSubstitutionconfirms that both global (SLEEP_LEVEL) and implicit (MODEL_ID,PORT) macros are correctly expanded intosleepEndpointsandwakeEndpoints, including within JSON bodies. This tightly couples the test to the intended substitution semantics and should catch regressions in macro handling for endpoints.
838-871: Timeout override test correctly captures inheritance vs override behavior
TestConfig_SleepWakeTimeoutOverridesclearly asserts that explicittimeoutvalues on endpoints override global settings while endpoints withouttimeoutinherit fromwakeRequestTimeout. This matches the logic in config.go and is a good behavioral spec.
925-945: Unknown macro test in endpoints correctly exercises new validation
TestConfig_SleepWakeUnknownMacroInEndpointsensures that leftover${UNKNOWN_MACRO}references insleepEndpointsproduce a configuration error mentioning the macro name, model ID, and field (sleepEndpoints). This directly verifiesvalidateEndpointMacrosand should prevent misconfigured sleep/wake URLs from slipping through.
947-970: Default timeouts test confirms config.go’s 10s default behavior
TestConfig_SleepWakeDefaultTimeoutsverifies that when no global or per-endpoint timeouts are specified, both sleep and wake endpoints receive a default of 10 seconds. This matches the defaults inLoadConfigFromReaderand strengthens confidence in the implicit behavior.
972-1008: Model-level macro precedence test nicely mirrors existing macro behavior
TestConfig_SleepWakeModelLevelMacrosconfirms that model-level macros override global macros when substituting intosleepEndpoints, just like they do for other fields. This is an important coverage point to ensure consistency across all macro-using config fields.proxy/process.go (10)
27-43: New sleep/wake states and httpDialTimeout introduce clear lifecycle phasesAdding
StateSleepPending,StateAsleep, andStateWakingplus a dedicatedhttpDialTimeoutconstant cleanly models the sleep/wake lifecycle and decouples TCP connection timeout from per-endpoint request timeouts. The chosen 500ms dial timeout is reasonable for local inference servers.
80-88: Additional WaitGroups for Sleep/Wake operations follow the existing start patternIntroducing
waitSleepingandwaitWakingmirrorswaitStartingand enables deduplication of concurrentSleep()/wake()calls. This structure is appropriate for coordinating concurrent callers around transitional states.
170-202: WaitGroup increments inside swapState are subtle but generally soundIncrementing
waitStarting,waitSleeping, andwaitWakinginsideswapStatewhen entering transitional states ensures the counter is incremented before other goroutines can observe the new state and callWait(), which is key to avoiding races in the “already starting/sleeping/waking” fast paths. Reuse of these WaitGroups across cycles follows the same pattern as existing start logic and appears correct.
205-225: isValidTransition correctly encodes new sleep/wake transitionsThe updated
isValidTransitionallows:
- Ready→SleepPending,
- SleepPending→Asleep/Stopping/Stopped,
- Asleep→Waking/Stopping,
- Waking→Ready/Stopping/Stopped,
while still enforcing no transitions fromStateShutdown. This matches the intended lifecycle and is consistent with the new tests inTestProcess_SwapState.
394-424: TTL monitoring correctly accounts for new sleep/wake states
startUnloadMonitoringnow treatsStateReady,StateSleepPending,StateAsleep, andStateWakingas valid “alive” states and continues TTL checks across sleep/wake transitions, while still skipping when there are in-flight requests. This keeps the “UnloadAfter” behavior consistent even when models are put to sleep.
469-511: sendSleepRequests/sendWakeRequests sequencing and logging are straightforwardBoth helpers iterate through configured endpoints in order, log each step with method/URL/timeout, and abort on the first failure, which is then surfaced back to
Sleep()/wake()and ultimately to callers. This strongly matches the intended “multi-step vLLM wake sequence” semantics.
518-571: Sleep implementation is mostly solid but relies on Ready→SleepPending swap
Sleep():
- Verifies sleep is configured,
- Handles concurrent callers by waiting on
waitSleepingwhen alreadyStateSleepPending,- Waits for in-flight requests, and
- Transitions
Ready→SleepPending→Asleep, falling back toStopImmediately()on HTTP failure or invalid state transition.The only nuance is that it always calls
swapState(StateReady, StateSleepPending), so a concurrent transition out ofReady(e.g., toStopping) will cause a mismatch and early return with a warning, which is acceptable but worth keeping in mind when orchestrating higher-level flows.
573-610: wake() concurrency handling pairs well with waitWaking
wake()usesswapState(StateAsleep, StateWaking)to initiate the transition and, on expected-state mismatches, will:
- Block on
waitWakingif another wake is already in progress, returning success if the final state isStateReady, or- Error out if the process is in any other state.
On success, it transitions
Waking→Readyand logs duration; on failure of HTTP steps it falls back toStopImmediately()and returns an error. Combined with the recommendedmakeReadyfix, this provides a robust, deduplicated wake path.
699-710: checkHealthEndpoint reuses the generic HTTP helper cleanlyWrapping health checks as a synthetic
HTTPEndpointwith a fixed 5-second timeout and feeding it throughsendHTTPRequestkeeps the HTTP logic centralized and enforces the same connection/timeout behavior as sleep/wake endpoints. This is a nice simplification compared to hand-rolling separate health-check code.
722-777: ProxyRequest now blocks SleepPending and correctly delegates readiness to makeReadyThe updated
ProxyRequest:
- Rejects requests when the process is
Shutdown,Stopping, orSleepPending,- Applies concurrency limits and in-flight tracking, and
- Uses
makeReady()(with SSE loading streams when enabled) whenever the state isn’tReady.This is a good fit for the new sleep/wake model and keeps the “start vs wake” decision fully encapsulated inside
makeReady().
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
proxy/process.go (1)
653-697: Refactor to improve HTTP client efficiency and handle response bodies properlyTwo improvements recommended:
Response body handling: The response body is never read (only closed at line 690), which can prevent HTTP/1.1 connection reuse. Reading and discarding the body is best practice.
Client creation overhead: Creating a new
http.Clientfor each endpoint call (line 663) prevents connection pooling. While acceptable for infrequent sleep/wake operations, consider reusing a client or using a package-level client for better performance in multi-step sleep/wake sequences.Apply this diff to properly handle response bodies:
resp, err := client.Do(req) if err != nil { return err } defer resp.Body.Close() + + // Read and discard body to enable connection reuse + _, _ = io.Copy(io.Discard, resp.Body) if resp.StatusCode != http.StatusOK { return fmt.Errorf("status code: %d", resp.StatusCode) }For the client reuse, consider creating a reusable client in
NewProcess()and storing it in theProcessstruct, since the learnings indicate preferring simplicity for health checks. However, for sleep/wake operations that may involve multiple sequential calls, connection reuse could provide better performance.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
proxy/config/config.go(8 hunks)proxy/process.go(15 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- proxy/config/config.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-29T05:03:26.638Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 278
File: proxy/process.go:368-379
Timestamp: 2025-08-29T05:03:26.638Z
Learning: For health check implementations in the llama-swap codebase, prefer simple and explicit HTTP client configurations over performance optimizations like preserving DefaultTransport settings, as health checks only need to verify HTTP 200 responses and clarity is more important than HTTP/2 support or connection pooling.
Applied to files:
proxy/process.go
📚 Learning: 2025-10-29T05:26:34.964Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 371
File: proxy/process.go:0-0
Timestamp: 2025-10-29T05:26:34.964Z
Learning: In proxy/process.go, the loading message "llama-swap loading model: {name}" intentionally uses p.ID (Process.ID) rather than the realModelName from the request context. This is the correct design choice.
Applied to files:
proxy/process.go
🧬 Code graph analysis (1)
proxy/process.go (1)
proxy/config/model_config.go (1)
HTTPEndpoint(12-17)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (4)
proxy/process.go (4)
469-511: LGTM! Sequential endpoint execution is well-implementedThe sleep and wake request handlers correctly execute endpoints in sequence with clear logging and error reporting. The step-by-step approach (lines 477-484, 499-506) ensures multi-step sleep/wake operations complete in order, which is essential for operations like level-2 sleep that require multiple API calls.
518-571: LGTM! Sleep() correctly handles concurrent calls and state transitionsThe implementation properly:
- Blocks concurrent
Sleep()calls usingwaitSleeping(lines 528-537)- Waits for in-flight requests before sleeping (line 546)
- Falls back to
StopImmediately()if sleep requests fail (line 559)- Transitions through
StateSleepPending→StateAsleepwith proper error handling
573-610: LGTM! wake() implementation follows established patternsThe method correctly handles:
- Concurrent
wake()calls by waiting onwaitWaking(lines 580-585)- State transition from
StateAsleep→StateWaking→StateReady- Error recovery by falling back to
StopImmediately()(line 600)The implementation pattern matches
start()method's concurrent-call handling (lines 285-291), ensuring consistency.
634-651: LGTM! Clean refactoring of health check logicThe refactoring successfully:
- Extracts URL building logic into
buildFullURL()using properurl.ParseandResolveReference- Unifies HTTP request handling through
sendHTTPRequest()by creating anHTTPEndpointconfig (lines 702-707)- Sets a reasonable 5-second timeout for health checks (line 705)
This abstraction eliminates code duplication and makes the health check logic consistent with sleep/wake endpoint handling.
Also applies to: 699-710
|
If I understand vllm sleep correctly, it would require the vllm process continue running. API calls tell it to sleep the model using one of two stages. Most people use llama-swap with llama-server. When llama-server also has support for the sleep api/functionality that’s when it’s time to consider a feature like this. |
Hey @mostlygeek, What would it take to get this merged? Are there specific concerns I can address, or changes you'd like to see? |
|
I’m not going to merge this until llama-swap for now. Please maintain it in your fork for now until/if the sleep api becomes more widely supported. |
Add Sleep/Wake Support for Fast Model Switching
Summary
Adds sleep/wake functionality for fast model switching. Instead of killing and restarting processes, models can be put to sleep and woken up on demand, dramatically reducing swap times.
Designed for vLLM's sleep mode (blog post), but will work with any future inference servers that implement HTTP-based sleep/wake endpoints.
What Changed
State Machine:
StateSleepPending,StateAsleep,StateWakingConfiguration:
Endpoints are called sequentially. Per-endpoint timeouts override global defaults.
Features:
sleepEndpoints/wakeEndpointsconfigured)POST /models/sleep/:model_idSummary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests