Add lifecycle hooks for model health checks and shutdown #595
chand1012 wants to merge 7 commits into mostlygeek:main
Conversation
Add two new lifecycle hooks to ModelConfig that run at specific points in each model's lifecycle:

- `afterHealthy`: one-shot command that runs after the health check passes, before the process transitions to StateReady. Blocks the model from accepting requests until it completes. Failure is logged as a warning but does not prevent startup.
- `beforeStop`: command that runs right before the upstream process is killed. Blocks shutdown until it completes (or fails), but the process is killed regardless of the outcome.

Both hooks inherit the upstream process environment and log output through the process logger. The primary use case is loading/saving llama.cpp prompt cache slots.

- proxy/config/model_config.go: add AfterHealthy and BeforeStop fields
- proxy/process.go: add runHookCommand helper; wire hooks into start() and stopCommand()
- proxy/process_test.go: add TestProcess_AfterHealthyHook and TestProcess_BeforeStopHook
- config.example.yaml: document both new fields

Co-authored-by: chand1012 <3521582+chand1012@users.noreply.github.com>
Replace llama-cli command examples with curl calls to the llama.cpp slots API, which is the correct way to save/restore prompt cache state via the server's HTTP interface. Co-authored-by: chand1012 <3521582+chand1012@users.noreply.github.com>
- move afterHealthy hook to run after health check instead of after state ready
- expand ${PORT} and other macros in afterHealthy and beforeStop fields
🧹 Nitpick comments (1)
proxy/process.go (1)
671-688: **Consider adding a timeout for hook command execution.** The hook command runs without a timeout. If a hook hangs (e.g., a network issue with a `curl` call), it will block startup or shutdown indefinitely. Consider using `exec.CommandContext` with a configurable or reasonable default timeout.

♻️ Suggested approach using context with timeout
```diff
 func (p *Process) runHookCommand(hookCmd string) error {
+	// Use a timeout to prevent hooks from blocking indefinitely
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+
 	args, err := config.SanitizeCommand(hookCmd)
 	if err != nil {
 		return fmt.Errorf("failed to sanitize hook command %q: %v", hookCmd, err)
 	}
-	cmd := exec.Command(args[0], args[1:]...)
+	cmd := exec.CommandContext(ctx, args[0], args[1:]...)
 	cmd.Stdout = p.processLogger
 	cmd.Stderr = p.processLogger
 	if p.cmd != nil {
 		cmd.Env = p.cmd.Env
 	}
 	setProcAttributes(cmd)
 	return cmd.Run()
 }
```

Alternatively, this could be a configurable per-hook timeout if flexibility is needed.
This would be great. The use case I'm exploring is saving the KV cache on shutdown and restoring it on startup, as described in ggml-org/llama.cpp#13606, which should make model switching a lot faster.
@candrews that's my use case as well. I'm currently using it on my inference server and it's working great!
Can you share more of how you set that up? And is it working for you? I'd rather start with what you already have rather than reinvent the wheel :)
Sure! Unfortunately it doesn't work for multimodal models (yet), but here's a snippet of my config:

models:
qwopus-35b-moe:
afterHealthy: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=restore\" -H \"Content-Type: application/json\" -d '{\"filename\": \"qwopus-35b-moe.bin\"}'"
beforeStop: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=save\" -H \"Content-Type: application/json\" -d '{\"filename\": \"qwopus-35b-moe.bin\"}'"
cmd: |
${build_dir}/bin/llama-server
--model ${models_dir}/qwen3.5/Qwopus-35B-A3B.Q4_K_M.gguf
--chat-template-file ${models_dir}/qwen3.5/qwen3.5_chat_template.jinja
--slot-save-path ${cache_dir}
--host 127.0.0.1
--port ${PORT}
--n-gpu-layers -1
--ctx-size ${128k_context}
--cache-type-k q4_0
--cache-type-v q4_0
--flash-attn on
--ubatch-size ${ubatch_size}
--batch-size 2048
--jinja
--no-context-shift
--keep -1
--temp 0.7
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0
--repeat-penalty 1.0
aliases:
- qwopus-moe
ttl: 0
omnicoder:
afterHealthy: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=restore\" -H \"Content-Type: application/json\" -d '{\"filename\": \"omnicoder.bin\"}'"
beforeStop: "curl -s -X POST \"http://localhost:${PORT}/slots/1?action=save\" -H \"Content-Type: application/json\" -d '{\"filename\": \"omnicoder.bin\"}'"
cmd: |
${build_dir}/bin/llama-server
--model ${models_dir}/omnicoder/omnicoder-9b-q5_k_m.gguf
--slot-save-path ${cache_dir}
--host 127.0.0.1
--port ${PORT}
--n-gpu-layers -1
--ctx-size ${256k_context}
--cache-type-k q4_0
--cache-type-v q4_0
--flash-attn on
--ubatch-size ${ubatch_size}
--batch-size 2048
--jinja
--no-context-shift
--keep -1
--temp 0.6
--top-p 0.95
--top-k 20
--min-p 0.0
--presence-penalty 0
--repeat-penalty 1.0
    ttl: 0
I'm thinking of using a
You can turn off multimodal support. ggml-org/llama.cpp#19466 is the issue to follow.
Adds two new lifecycle hooks to ModelConfig that run at specific points in each model's lifecycle:
- `afterHealthy`: Runs once after the model passes its health check, before transitioning to StateReady. Blocks the model from accepting requests until completion. Failure is logged as a warning but doesn't prevent startup.
- `beforeStop`: Runs right before the upstream process is killed. Blocks shutdown until completion (or failure), but the process is killed regardless of outcome.

Both hooks inherit the upstream process environment and log output through the process logger.
Key Changes
- proxy/config/model_config.go: Added AfterHealthy and BeforeStop fields
- proxy/process.go: Added runHookCommand helper; wired hooks into start() and stopCommand()
- proxy/process_test.go: Added TestProcess_AfterHealthyHook and TestProcess_BeforeStopHook
- config.example.yaml: Documented both new fields with curl examples for llama.cpp slots API
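Putting the pieces together, a minimal config sketch using the new fields might look like the following. The model name, filenames, and slot number are illustrative; the slots API only works when llama-server is started with `--slot-save-path`:

```yaml
models:
  my-model:
    cmd: |
      llama-server
      --model /models/my-model.gguf
      --host 127.0.0.1
      --port ${PORT}
      --slot-save-path /var/cache/llama-swap
    # Restore the saved prompt cache once the server reports healthy.
    afterHealthy: "curl -s -X POST \"http://localhost:${PORT}/slots/0?action=restore\" -H \"Content-Type: application/json\" -d '{\"filename\": \"my-model.bin\"}'"
    # Save the prompt cache just before the server is stopped.
    beforeStop: "curl -s -X POST \"http://localhost:${PORT}/slots/0?action=save\" -H \"Content-Type: application/json\" -d '{\"filename\": \"my-model.bin\"}'"
```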