Fixes an issue on Windows where a new console window appeared for every process llama-swap spawns.
Changes the container user back to root. Additionally, adds a "non-root" labeled container for users who want the additional security of running llama-swap as a lower-privileged user.
…ap repositories (mostlygeek#396)

* feat: Add support for custom llama.cpp base image and forked llama-swap repositories
  - Introduce BASE_LLAMACPP_IMAGE env var to customize llama.cpp base image
  - Introduce LS_REPO env var to customize llama-swap source
  - Use GITHUB_REPOSITORY env var to automatically detect forked repos
  - Update container tagging to use dynamic repo paths
  - Pass build args for BASE_IMAGE and LS_REPO to Containerfile
  - Enable flexible release downloads from forked repositories
* chore: quote entire curl options, appease coderabbitai
* proxy: add support for anthropic v1/messages api
* proxy: restrict loading message to /v1/chat/completions
…nsiderations (mostlygeek#416)

* docs: add documentation for non-root container images and security considerations
* docs: move container security section to dedicated file and update README links
- add support for the Anthropic API
- add example for Docker hot-reload support
Makes it so `llama-server` can be called directly instead of with the full path `/app/llama-server`. Fixes mostlygeek#423. Ref: mostlygeek#233
Walkthrough

This pull request updates documentation and infrastructure to support Anthropic API endpoints alongside OpenAI-compatible servers, refactors Docker builds to generate both root and non-root container variants, improves metrics parsing for v1/messages responses, adds cross-platform process attribute handling, and consolidates request routing to a unified inference handler across multiple endpoints.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
proxy/metrics_monitor.go (1)
230-240: Potential data loss when `timings` has incomplete token counts.

When `timings` exists, lines 231-232 unconditionally overwrite `inputTokens` and `outputTokens` from the `usage` block. If `timings` is present but `prompt_n` or `predicted_n` are missing or zero, the valid token counts from `usage` will be replaced with zeros. Consider guarding the overwrites:

```diff
 // use llama-server's timing data for tok/sec and duration as it is more accurate
 if timings.Exists() {
-	inputTokens = int(timings.Get("prompt_n").Int())
-	outputTokens = int(timings.Get("predicted_n").Int())
+	if pn := timings.Get("prompt_n"); pn.Exists() {
+		inputTokens = int(pn.Int())
+	}
+	if pdn := timings.Get("predicted_n"); pdn.Exists() {
+		outputTokens = int(pdn.Int())
+	}
 	promptPerSecond = timings.Get("prompt_per_second").Float()
 	tokensPerSecond = timings.Get("predicted_per_second").Float()
 	durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
```
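The guarded pattern can be shown with a small runnable sketch. This is a standard-library approximation (the real code appears to use tidwall/gjson); `parseTimings` and its signature are illustrative, not the actual llama-swap code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseTimings mirrors the guard suggested in the diff above: the
// usage-derived token counts are only overwritten when the timings
// block actually carries the corresponding fields.
func parseTimings(body []byte, inputTokens, outputTokens int) (int, int) {
	var doc struct {
		Timings map[string]float64 `json:"timings"`
	}
	if err := json.Unmarshal(body, &doc); err != nil || doc.Timings == nil {
		return inputTokens, outputTokens // no timings: keep usage values
	}
	if pn, ok := doc.Timings["prompt_n"]; ok {
		inputTokens = int(pn)
	}
	if pdn, ok := doc.Timings["predicted_n"]; ok {
		outputTokens = int(pdn)
	}
	return inputTokens, outputTokens
}

func main() {
	// timings present but predicted_n missing: the usage-derived
	// output count (7) survives instead of being zeroed
	in, out := parseTimings([]byte(`{"timings":{"prompt_n":12}}`), 3, 7)
	fmt.Println(in, out) // 12 7
}
```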
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `docs/assets/hero1.jpg` is excluded by `!**/*.jpg`
- `go.sum` is excluded by `!**/*.sum`
- `ui/package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (10)
- `README.md` (4 hunks)
- `docker/build-container.sh` (2 hunks)
- `docker/llmsnap.Containerfile` (2 hunks)
- `docs/container-security.md` (1 hunk)
- `go.mod` (2 hunks)
- `proxy/metrics_monitor.go` (3 hunks)
- `proxy/process.go` (3 hunks)
- `proxy/process_unix.go` (1 hunk)
- `proxy/process_windows.go` (1 hunk)
- `proxy/proxymanager.go` (2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[grammar] ~16-~16: Use a hyphen to join words.
Context: ...model switching - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabb...
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-tests
- GitHub Check: run-tests
🔇 Additional comments (21)
proxy/metrics_monitor.go (3)
125-136: LGTM! The extraction of `usage` and `timings` from the parsed JSON with short-circuit logic when neither exists is clean and appropriate. Error handling correctly logs warnings without affecting client responses.
184-190: LGTM! The streaming response parsing correctly mirrors the non-streaming pattern, extracting `usage` and `timings` and continuing the backward iteration if neither is present.
197-227: LGTM! The refactored function signature cleanly separates usage and timings parsing. The dual-format support for OpenAI-style (`prompt_tokens`/`completion_tokens`) and v1/messages-style (`input_tokens`/`output_tokens`) fields is well-implemented with proper fallback logic.

proxy/process_unix.go (1)
1-12: LGTM! Clean cross-platform abstraction. The no-op Unix implementation correctly pairs with the Windows variant that sets `HideWindow` and `CREATE_NO_WINDOW` flags.

proxy/proxymanager.go (2)
239-261: LGTM! Good consolidation of inference endpoints. The unified `proxyInferenceHandler` appropriately handles both OpenAI-compatible and Anthropic `/v1/messages` endpoints since both use the `model` field in the JSON body for routing.
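The routing step the review describes — both request formats carrying a top-level `model` field — can be sketched as follows. `extractModel` is an illustrative name, not the handler's actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractModel reads the top-level "model" field shared by OpenAI-style
// (/v1/chat/completions) and Anthropic-style (/v1/messages) request
// bodies, which is what lets one handler route both endpoint families.
func extractModel(body []byte) (string, bool) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil || req.Model == "" {
		return "", false
	}
	return req.Model, true
}

func main() {
	// the same helper works for either API shape
	m, ok := extractModel([]byte(`{"model":"llama3","messages":[]}`))
	fmt.Println(m, ok) // llama3 true
}
```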
550-626: Handler implementation is sound. The renamed `proxyInferenceHandler` correctly extracts the `model` field from JSON bodies, which is compatible with both OpenAI (`/v1/chat/completions`) and Anthropic (`/v1/messages`) request formats.

docker/build-container.sh (2)
56-79: Both container variants share the same `:${ARCH}` latest tag for root. The loop builds non-root first, then root. Since both use `CONTAINER_LATEST="ghcr.io/${LS_REPO}:${ARCH}"` as the base (with non-root appending `-non-root`), the final `:${ARCH}` tag will point to the root variant. This is likely intentional for backward compatibility, but worth confirming. If the intent is that `:${ARCH}` should remain the root image and `:${ARCH}-non-root` the non-root one, the current logic is correct.
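The tag ordering under discussion can be illustrated with a short shell sketch. Variable values here are examples only, and the loop is a simplification of the actual build script:

```shell
#!/bin/sh
# Sketch: building non-root first and root last means the bare
# ":${ARCH}" tag is computed last for the root variant, so that is
# the image the unsuffixed tag ends up pointing at.
LS_REPO="example/llama-swap"   # example value, not the real default
ARCH="cuda"                    # example value

for VARIANT in non-root root; do
    CONTAINER_LATEST="ghcr.io/${LS_REPO}:${ARCH}"
    if [ "$VARIANT" = "non-root" ]; then
        CONTAINER_LATEST="${CONTAINER_LATEST}-non-root"
    fi
    echo "would tag: ${CONTAINER_LATEST}"
done
```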
23-33: Good use of environment variables for fork customization. The `BASE_LLAMACPP_IMAGE` and `GITHUB_REPOSITORY` overrides enable easy testing with forked repositories while maintaining sensible defaults.

proxy/process_windows.go (1)
10-16: LGTM! The Windows-specific implementation correctly sets `HideWindow` and `CREATE_NO_WINDOW` to prevent console windows from appearing when spawning backend processes. The magic number `0x08000000` is clearly documented.
proxy/process.go (2)

872-872: LGTM! Consistent application of platform-specific process attributes to the stop command, matching the approach used for the start command at Line 304.

304-304: Platform-specific implementations exist and are correctly structured. The `setProcAttributes(p.cmd)` call is properly implemented across platforms:

- `proxy/process_unix.go`: No-op implementation for Unix systems
- `proxy/process_windows.go`: Sets `HideWindow` attribute via `SysProcAttr`

The cross-platform pattern using Go's filename convention (`_unix.go`, `_windows.go`) is correctly applied.
docker/llmsnap.Containerfile (4)

1-3: LGTM! Parameterizing the base image and tag provides flexibility for building different container variants (root, non-root, different platforms) as described in the PR objectives.

6-7: LGTM! The switch to semantic versioning (0.0.1) and parameterized repository make the build more flexible and align with the container variant improvements in this PR.

33-34: LGTM! Adding `/app` to PATH allows the `llmsnap` binary to be invoked directly without specifying the full path, improving usability.

37-39: LGTM! Using variables for the download URL and filenames makes the build more maintainable and aligns with the parameterization approach throughout the Containerfile.
docs/container-security.md (1)
1-9: LGTM! This documentation provides clear, balanced guidance on container security considerations. It appropriately explains the trade-offs between root and non-root containers, offers practical configuration options, and includes helpful references to external documentation.
README.md (5)
24-25: LGTM!Clear documentation of the new Anthropic v1/messages endpoint support. Note that there's a separate comment on
proxy/process.goregarding streaming loading state support for this endpoint.
67-67: LGTM!Good documentation of non-root container variants with appropriate link to the security documentation, helping users make informed choices.
78-84: LGTM!Excellent addition demonstrating the configuration hot-reload feature with practical Docker volume mounting examples.
101-101: LGTM!Image tag updated consistently with the version changes in the Containerfile.
103-104: LGTM!Clear example of the non-root CUDA variant, making it easy for security-conscious users to adopt the safer container image.
Summary by CodeRabbit
- New Features
- Documentation
- Bug Fixes
- Chores