Add lifecycle hooks for model health checks and shutdown#595

Open
chand1012 wants to merge 7 commits into mostlygeek:main from chand1012:feat/swap-hooks

Conversation

@chand1012

Adds two new lifecycle hooks to ModelConfig that run at specific points in each model's lifecycle:

  • afterHealthy: Runs once after the model passes its health check, before it transitions to StateReady. Blocks the model from accepting requests until the hook completes. Failure is logged as a warning but doesn't prevent startup.

  • beforeStop: Runs right before the upstream process is killed. Blocks shutdown until completion (or failure), but the process is killed regardless of outcome.

Both hooks inherit the upstream process environment and log output through the process logger.

Key Changes

  • proxy/config/model_config.go: Added AfterHealthy and BeforeStop fields
  • proxy/process.go: Added runHookCommand helper; wired hooks into start() and stopCommand()
  • proxy/process_test.go: Added TestProcess_AfterHealthyHook and TestProcess_BeforeStopHook
  • config.example.yaml: Documented both new fields with curl examples for llama.cpp slots API
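Sketched below is what a minimal config entry using both hooks might look like, based on the description above (the model name, slot index, and filenames are placeholders; the curl calls target the llama.cpp slots API, as documented in config.example.yaml):

```yaml
models:
  my-model:
    cmd: llama-server --port ${PORT} --slot-save-path /tmp/slots --model my-model.gguf
    # one-shot: runs after the health check passes, before StateReady;
    # failure is logged as a warning but does not block startup
    afterHealthy: curl -s -X POST "http://localhost:${PORT}/slots/0?action=restore" -d '{"filename": "my-model.bin"}'
    # runs right before the upstream process is killed; the process is
    # killed regardless of the hook's outcome
    beforeStop: curl -s -X POST "http://localhost:${PORT}/slots/0?action=save" -d '{"filename": "my-model.bin"}'
```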

Copilot AI and others added 5 commits March 19, 2026 21:47
Add two new lifecycle hooks to ModelConfig that run at specific points
in each model's lifecycle:

- afterHealthy: one-shot command that runs after the health check
  passes, before the process transitions to StateReady. Blocks the
  model from accepting requests until it completes. Failure is logged
  as a warning but does not prevent startup.

- beforeStop: command that runs right before the upstream process is
  killed. Blocks shutdown until it completes (or fails), but the
  process is killed regardless of the outcome.

Both hooks inherit the upstream process environment and log output
through the process logger.

Primary use case is loading/saving llama.cpp prompt cache slots.

- proxy/config/model_config.go: add AfterHealthy and BeforeStop fields
- proxy/process.go: add runHookCommand helper; wire hooks into start()
  and stopCommand()
- proxy/process_test.go: add TestProcess_AfterHealthyHook and
  TestProcess_BeforeStopHook
- config.example.yaml: document both new fields

Co-authored-by: chand1012 <3521582+chand1012@users.noreply.github.com>
Replace llama-cli command examples with curl calls to the llama.cpp
slots API, which is the correct way to save/restore prompt cache state
via the server's HTTP interface.

Co-authored-by: chand1012 <3521582+chand1012@users.noreply.github.com>
- move afterHealthy hook to run after health check instead of after state ready
- expand ${PORT} and other macros in afterHealthy and beforeStop fields
@coderabbitai

coderabbitai bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

Adds two optional model lifecycle hooks—afterHealthy (one-shot after health check) and beforeStop (runs before termination). Changes include config example docs, two new ModelConfig fields, macro/${PORT} substitution/validation updates, runtime hook execution with timeout, and tests verifying behavior.

Changes

  • Configuration example (config.example.yaml): Added commented/example entries documenting afterHealthy and beforeStop for the models.docker-llama example.
  • Model config & loading (proxy/config/model_config.go, proxy/config/config.go): Added afterHealthy and beforeStop fields to ModelConfig; extended macro substitution, ${PORT} detection/substitution, and unknown-macro validation to include these fields.
  • Process lifecycle (proxy/process.go): Added runHookCommand() helper (30s timeout, sanitization, env inheritance, logging, proc attrs); invokes afterHealthy after the health check and beforeStop during stop; failures are logged as warnings.
  • Tests (proxy/process_test.go): Added TestProcess_AfterHealthyHook and TestProcess_BeforeStopHook (skipped on Windows) to assert hooks run at the expected times.
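Based on the summary above, here is a standalone sketch of what the 30-second-timeout hook runner might look like. This is an approximation, not the PR's code: the real helper is a *Process method that uses config.SanitizeCommand, the process logger, the upstream command's environment, and setProcAttributes; here those are stood in for by strings.Fields, stdout/stderr, and os.Environ().

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// runHookCommand runs a hook command with a 30s deadline so a hung hook
// (e.g. a stalled curl call) cannot block startup or shutdown forever.
// Sketch only: the real code parses via config.SanitizeCommand and logs
// through the process logger instead of stdout/stderr.
func runHookCommand(hookCmd string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	args := strings.Fields(hookCmd) // stand-in for config.SanitizeCommand
	if len(args) == 0 {
		return fmt.Errorf("empty hook command")
	}

	cmd := exec.CommandContext(ctx, args[0], args[1:]...)
	cmd.Stdout = os.Stdout       // real code: p.processLogger
	cmd.Stderr = os.Stderr       // real code: p.processLogger
	cmd.Env = os.Environ()       // real code: inherits the upstream process env
	return cmd.Run()             // the process is killed when ctx expires
}
```

Per the summary, this helper is invoked from start() after the health check (afterHealthy) and from stopCommand() before the process is killed (beforeStop), with any error logged as a warning rather than treated as fatal.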


@coderabbitai bot left a comment

🧹 Nitpick comments (1)
proxy/process.go (1)

671-688: Consider adding a timeout for hook command execution.

The hook command runs without a timeout. If a hook hangs (e.g., network issue with a curl call), it will block startup or shutdown indefinitely. Consider using exec.CommandContext with a configurable or reasonable default timeout.

♻️ Suggested approach using context with timeout
-func (p *Process) runHookCommand(hookCmd string) error {
+func (p *Process) runHookCommand(hookCmd string) error {
+	// Use a timeout to prevent hooks from blocking indefinitely
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+
 	args, err := config.SanitizeCommand(hookCmd)
 	if err != nil {
 		return fmt.Errorf("failed to sanitize hook command %q: %v", hookCmd, err)
 	}
 
-	cmd := exec.Command(args[0], args[1:]...)
+	cmd := exec.CommandContext(ctx, args[0], args[1:]...)
 	cmd.Stdout = p.processLogger
 	cmd.Stderr = p.processLogger
 	if p.cmd != nil {
 		cmd.Env = p.cmd.Env
 	}
 	setProcAttributes(cmd)
 
 	return cmd.Run()
 }

Alternatively, this could be a configurable per-hook timeout if flexibility is needed.



@candrews

This would be great.

The use case I'm exploring is saving the KV cache on shutdown and restoring it on startup, as described at ggml-org/llama.cpp#13606, which should make model switching a lot faster.

@chand1012
Author

@candrews that's my use case as well, currently using it on my inference server and it's working great!

@candrews

@candrews that's my use case as well, currently using it on my inference server and it's working great!

Can you share more of how you set that up? And is it working well for you?

I'd rather start with what you already have rather than reinvent the wheel :)

@chand1012
Author


Sure! Unfortunately it doesn't work for multimodal models (yet), but here's a snippet of my config.

models:
  qwopus-35b-moe:
    afterHealthy: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=restore\" -H \"Content-Type: application/json\" -d '{\"filename\": \"qwopus-35b-moe.bin\"}'"
    beforeStop: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=save\" -H \"Content-Type: application/json\" -d '{\"filename\": \"qwopus-35b-moe.bin\"}'"
    cmd: |
      ${build_dir}/bin/llama-server 
      --model ${models_dir}/qwen3.5/Qwopus-35B-A3B.Q4_K_M.gguf
      --chat-template-file ${models_dir}/qwen3.5/qwen3.5_chat_template.jinja
      --slot-save-path ${cache_dir}
      --host 127.0.0.1
      --port ${PORT}
      --n-gpu-layers -1
      --ctx-size ${128k_context}
      --cache-type-k q4_0
      --cache-type-v q4_0
      --flash-attn on
      --ubatch-size ${ubatch_size}
      --batch-size 2048
      --jinja
      --no-context-shift
      --keep -1
      --temp 0.7
      --top-p 0.95
      --top-k 20
      --min-p 0.0
      --presence-penalty 0
      --repeat-penalty 1.0
    aliases:
      - qwopus-moe
    ttl: 0

  omnicoder:
    afterHealthy: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=restore\" -H \"Content-Type: application/json\" -d '{\"filename\": \"omnicoder.bin\"}'"
    beforeStop: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=save\" -H \"Content-Type: application/json\" -d '{\"filename\": \"omnicoder.bin\"}'"
    cmd: |
      ${build_dir}/bin/llama-server 
      --model ${models_dir}/omnicoder/omnicoder-9b-q5_k_m.gguf
      --slot-save-path ${cache_dir}
      --host 127.0.0.1
      --port ${PORT}
      --n-gpu-layers -1
      --ctx-size ${256k_context}
      --cache-type-k q4_0
      --cache-type-v q4_0
      --flash-attn on
      --ubatch-size ${ubatch_size}
      --batch-size 2048
      --jinja
      --no-context-shift
      --keep -1
      --temp 0.6
      --top-p 0.95
      --top-k 20
      --min-p 0.0
      --presence-penalty 0
      --repeat-penalty 1.0
    ttl: 0

@candrews

I'm thinking using a for loop to save/restore all slots would be even better (using a macro to define the number of slots).

it doesn't work for multimodal models (yet)

You can turn off multimodal with --no-mmproj and then it works great.

ggml-org/llama.cpp#19466 is the issue to follow
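A dry-run sketch of the loop idea above. NUM_SLOTS, PORT, and MODEL are hypothetical variables standing in for the macros llama-swap would expand; the actual curl call is left as a comment so the sketch only prints what it would do:

```shell
#!/bin/sh
# Hypothetical beforeStop script: save every slot instead of hard-coding slot 1.
NUM_SLOTS="${NUM_SLOTS:-4}"
PORT="${PORT:-8080}"
MODEL="${MODEL:-mymodel}"

i=0
while [ "$i" -lt "$NUM_SLOTS" ]; do
  # In a real hook this would be the actual call, e.g.:
  # curl -s -X POST "http://localhost:${PORT}/slots/${i}?action=save" \
  #   -H "Content-Type: application/json" \
  #   -d "{\"filename\": \"${MODEL}-slot${i}.bin\"}"
  echo "would save slot ${i} to ${MODEL}-slot${i}.bin"
  i=$((i + 1))
done
```

A matching afterHealthy script would do the same loop with action=restore. Note the hooks as merged take a single command string, so a loop like this would live in a helper script invoked by the hook.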
