
Sync with llama-swap #7

Merged
napmany merged 12 commits into main from sync
Dec 11, 2025

Conversation

@napmany (Owner) commented Dec 11, 2025

Summary by CodeRabbit

  • New Features

    • Added support for Anthropic API (v1/messages endpoint)
    • Introduced non-root Docker container variants for improved security
    • Added configuration hot-reload example with directory mount and watch flag
  • Documentation

    • New container security guide with best practices and mitigation options
    • Updated README with Anthropic API support and security documentation links
    • Expanded Docker installation examples and nightly image notes
  • Bug Fixes

    • Improved metrics parsing to support both OpenAI and Anthropic API formats
  • Chores

    • Updated Go toolchain and dependencies


nikeshparajuli and others added 12 commits November 24, 2025 21:39
Fixes issues on Windows showing new windows for every process llama-swap spawns.
Change the user back to root for containers. Additionally, built a "non-root" labeled container for users who wish to have the additional security of running llama-swap as a lower privileged user.
…ap repositories (mostlygeek#396)

* feat: Add support for custom llama.cpp base image and forked llama-swap repositories

- Introduce BASE_LLAMACPP_IMAGE env var to customize llama.cpp base image
- Introduce LS_REPO env var to customize llama-swap source
- Use GITHUB_REPOSITORY env var to automatically detect forked repos
- Update container tagging to use dynamic repo paths
- Pass build args for BASE_IMAGE and LS_REPO to Containerfile
- Enable flexible release downloads from forked repositories

* chore: quote entire curl options, appease coderabbitai
* proxy: add support for anthropic v1/messages api
* proxy: restrict loading message to /v1/chat/completions
…nsiderations (mostlygeek#416)

* docs: add documentation for non-root container images and security considerations
* docs: move container security section to dedicated file and update README links
- add support for Anthropic API
- add example for Docker hot-reload support
Make it so llama-server can be called directly instead of with the full
path at /app/llama-server.

Fixes mostlygeek#423
Ref: mostlygeek#233
@coderabbitai (bot) commented Dec 11, 2025

Walkthrough

This pull request updates documentation and infrastructure to support Anthropic API endpoints alongside OpenAI-compatible servers, refactors Docker builds to generate both root and non-root container variants, improves metrics parsing for v1/messages responses, adds cross-platform process attribute handling, and consolidates request routing to a unified inference handler across multiple endpoints.

Changes

Cohort / File(s) Change Summary
Documentation
README.md, docs/container-security.md
Updates README with Anthropic API support, non-root Docker variants, config hot-reload example, and updated image tags. Adds new container-security.md documenting root vs. non-root trade-offs and mitigation strategies.
Docker Build Infrastructure
docker/build-container.sh, docker/llmsnap.Containerfile
Refactors build script to loop over non-root and root variants, introduces LS_REPO and BASE_IMAGE environment variables, constructs per-type tags and build args (UID, GID, USER_HOME). Updates Containerfile to accept parameterized base image, LS_REPO, and LS_VER; appends /app to PATH.
Dependencies
go.mod
Bumps Go toolchain from 1.23.0 to 1.25.4 and updates indirect x/crypto, x/net, x/sys, x/text modules.
Metrics Parsing
proxy/metrics_monitor.go
Refactors parseMetrics signature to accept nested usage and timings objects instead of flat JSON. Adds detection logic for v1/messages (input_tokens, output_tokens) vs. v1/chat/completions (prompt_tokens, completion_tokens) token fields and cache_read_input_tokens.
Request Processing & Routing
proxy/process.go, proxy/proxymanager.go
Adds setProcAttributes calls in start and stop command paths. Consolidates multiple endpoints (/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/messages, /reranking, /infill, /audio/speech) to use unified proxyInferenceHandler; renames proxyOAIHandler. Adds Anthropic v1/chat/completions path check for streaming.
Cross-Platform Process Handling
proxy/process_unix.go, proxy/process_windows.go
Introduces platform-specific setProcAttributes implementations: Unix variant is a no-op; Windows variant sets HideWindow and CREATE_NO_WINDOW flags.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • docker/build-container.sh: Introduces loop-driven build logic with conditional tagging, multiple environment variables, and per-type build arguments that require careful verification across architectures and variants.
  • proxy/metrics_monitor.go: Signature change with nested JSON extraction and format-detection logic (Anthropic vs. OpenAI); requires verification of token field mapping and handling of optional cache_read_input_tokens.
  • proxy/proxymanager.go: Handler consolidation across multiple endpoints with new Anthropic route; verify all existing endpoints still map correctly and that unified handler logic is appropriate for all request types.
  • docker/llmsnap.Containerfile: Parameterized base image and LS_REPO changes; verify URL quoting and variable substitution work as intended.

Possibly related PRs

  • Fix containers build action #5: Both PRs modify docker/build-container.sh with changes to container repo/tag logic and build arguments—potential for merge conflicts or duplicated effort in the Docker build refactoring.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: The title "Sync with llama-swap" is vague and does not clearly summarize the substantial changes made across multiple files, including Docker optimizations, Anthropic API support, security enhancements, and Go dependency updates. Consider a more descriptive title that captures the main changes, such as "Add Anthropic API support, non-root Docker variants, and security documentation".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
proxy/metrics_monitor.go (1)

230-240: Potential data loss when timings has incomplete token counts.

When timings exists, lines 231-232 unconditionally overwrite inputTokens and outputTokens from the usage block. If timings is present but prompt_n or predicted_n are missing or zero, the valid token counts from usage will be replaced with zeros.

Consider guarding the overwrites:

 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings.Exists() {
-		inputTokens = int(timings.Get("prompt_n").Int())
-		outputTokens = int(timings.Get("predicted_n").Int())
+		if pn := timings.Get("prompt_n"); pn.Exists() {
+			inputTokens = int(pn.Int())
+		}
+		if pdn := timings.Get("predicted_n"); pdn.Exists() {
+			outputTokens = int(pdn.Int())
+		}
 		promptPerSecond = timings.Get("prompt_per_second").Float()
 		tokensPerSecond = timings.Get("predicted_per_second").Float()
 		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb1d79e and f2fd786.

⛔ Files ignored due to path filters (3)
  • docs/assets/hero1.jpg is excluded by !**/*.jpg
  • go.sum is excluded by !**/*.sum
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • README.md (4 hunks)
  • docker/build-container.sh (2 hunks)
  • docker/llmsnap.Containerfile (2 hunks)
  • docs/container-security.md (1 hunks)
  • go.mod (2 hunks)
  • proxy/metrics_monitor.go (3 hunks)
  • proxy/process.go (3 hunks)
  • proxy/process_unix.go (1 hunks)
  • proxy/process_windows.go (1 hunks)
  • proxy/proxymanager.go (2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[grammar] ~16-~16: Use a hyphen to join words.
Context: ...model switching - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabb...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: run-tests
  • GitHub Check: run-tests
🔇 Additional comments (21)
proxy/metrics_monitor.go (3)

125-136: LGTM!

The extraction of usage and timings from the parsed JSON with short-circuit logic when neither exists is clean and appropriate. Error handling correctly logs warnings without affecting client responses.


184-190: LGTM!

The streaming response parsing correctly mirrors the non-streaming pattern, extracting usage and timings and continuing the backward iteration if neither is present.


197-227: LGTM!

The refactored function signature cleanly separates usage and timings parsing. The dual-format support for OpenAI-style (prompt_tokens/completion_tokens) and v1/messages-style (input_tokens/output_tokens) fields is well-implemented with proper fallback logic.

proxy/process_unix.go (1)

1-12: LGTM!

Clean cross-platform abstraction. The no-op Unix implementation correctly pairs with the Windows variant that sets HideWindow and CREATE_NO_WINDOW flags.

proxy/proxymanager.go (2)

239-261: LGTM! Good consolidation of inference endpoints.

The unified proxyInferenceHandler appropriately handles both OpenAI-compatible and Anthropic /v1/messages endpoints since both use the model field in the JSON body for routing.


550-626: Handler implementation is sound.

The renamed proxyInferenceHandler correctly extracts the model field from JSON bodies, which is compatible with both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) request formats.

docker/build-container.sh (2)

56-79: Both container variants share the same :${ARCH} latest tag for root.

The loop builds non-root first, then root. Since both use CONTAINER_LATEST="ghcr.io/${LS_REPO}:${ARCH}" as the base (with non-root appending -non-root), the final :${ARCH} tag will point to the root variant. This is likely intentional for backward compatibility, but worth confirming.

If the intent is that :${ARCH} should remain the root image and :${ARCH}-non-root for non-root, the current logic is correct.


23-33: Good use of environment variables for fork customization.

The BASE_LLAMACPP_IMAGE and GITHUB_REPOSITORY overrides enable easy testing with forked repositories while maintaining sensible defaults.

proxy/process_windows.go (1)

10-16: LGTM!

The Windows-specific implementation correctly sets HideWindow and CREATE_NO_WINDOW to prevent console windows from appearing when spawning backend processes. The magic number 0x08000000 is clearly documented.

proxy/process.go (2)

872-872: LGTM!

Consistent application of platform-specific process attributes to the stop command, matching the approach used for the start command at Line 304.


304-304: Platform-specific implementations exist and are correctly structured.

The setProcAttributes(p.cmd) call is properly implemented across platforms:

  • proxy/process_unix.go: No-op implementation for Unix systems
  • proxy/process_windows.go: Sets HideWindow attribute via SysProcAttr

The cross-platform pattern is correctly applied: _windows.go gets an implicit GOOS build constraint from its filename, while _unix.go is not an implicit GOOS suffix and relies on an explicit //go:build constraint (e.g. unix or !windows) inside the file.

docker/llmsnap.Containerfile (4)

1-3: LGTM!

Parameterizing the base image and tag provides flexibility for building different container variants (root, non-root, different platforms) as described in the PR objectives.


6-7: LGTM!

The switch to semantic versioning (0.0.1) and parameterized repository make the build more flexible and align with the container variant improvements in this PR.


33-34: LGTM!

Adding /app to PATH allows the llmsnap binary to be invoked directly without specifying the full path, improving usability.


37-39: LGTM!

Using variables for the download URL and filenames makes the build more maintainable and aligns with the parameterization approach throughout the Containerfile.

docs/container-security.md (1)

1-9: LGTM!

This documentation provides clear, balanced guidance on container security considerations. It appropriately explains the trade-offs between root and non-root containers, offers practical configuration options, and includes helpful references to external documentation.

README.md (5)

24-25: LGTM!

Clear documentation of the new Anthropic v1/messages endpoint support. Note that there's a separate comment on proxy/process.go regarding streaming loading state support for this endpoint.


67-67: LGTM!

Good documentation of non-root container variants with appropriate link to the security documentation, helping users make informed choices.


78-84: LGTM!

Excellent addition demonstrating the configuration hot-reload feature with practical Docker volume mounting examples.


101-101: LGTM!

Image tag updated consistently with the version changes in the Containerfile.


103-104: LGTM!

Clear example of the non-root CUDA variant, making it easy for security-conscious users to adopt the safer container image.

@napmany napmany merged commit 199d25b into main Dec 11, 2025
3 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Feb 14, 2026