
Sync with llama-swap #7

Merged
napmany merged 12 commits into main from sync
Dec 11, 2025

Conversation

@napmany (Owner) commented Dec 11, 2025

Summary by CodeRabbit

  • New Features

    • Added support for Anthropic API (v1/messages endpoint)
    • Introduced non-root Docker container variants for improved security
    • Added configuration hot-reload example with directory mount and watch flag
  • Documentation

    • New container security guide with best practices and mitigation options
    • Updated README with Anthropic API support and security documentation links
    • Expanded Docker installation examples and nightly image notes
  • Bug Fixes

    • Improved metrics parsing to support both OpenAI and Anthropic API formats
  • Chores

    • Updated Go toolchain and dependencies


nikeshparajuli and others added 12 commits November 24, 2025 21:39
Fixes issues on Windows showing new windows for every process llama-swap spawns.
Change the user back to root for containers. Additionally, built a "non-root" labeled container for users who wish to have the additional security of running llama-swap as a lower privileged user.
…ap repositories (mostlygeek#396)

* feat: Add support for custom llama.cpp base image and forked llama-swap repositories

- Introduce BASE_LLAMACPP_IMAGE env var to customize llama.cpp base image
- Introduce LS_REPO env var to customize llama-swap source
- Use GITHUB_REPOSITORY env var to automatically detect forked repos
- Update container tagging to use dynamic repo paths
- Pass build args for BASE_IMAGE and LS_REPO to Containerfile
- Enable flexible release downloads from forked repositories

* chore: quote entire curl options, appease coderabbitai
* proxy: add support for anthropic v1/messages api
* proxy: restrict loading message to /v1/chat/completions
…nsiderations (mostlygeek#416)

* docs: add documentation for non-root container images and security considerations
* docs: move container security section to dedicated file and update README links
- add support for Anthropic API
- add example for Docker hot-reload support
Make it so llama-server can be called directly instead of with the full
path at /app/llama-server.

Fixes mostlygeek#423
Ref: mostlygeek#233
@coderabbitai (bot) commented Dec 11, 2025

Walkthrough

This pull request updates documentation and infrastructure to support Anthropic API endpoints alongside OpenAI-compatible servers, refactors Docker builds to generate both root and non-root container variants, improves metrics parsing for v1/messages responses, adds cross-platform process attribute handling, and consolidates request routing to a unified inference handler across multiple endpoints.

Changes

Cohort / File(s) Change Summary
Documentation
README.md, docs/container-security.md
Updates README with Anthropic API support, non-root Docker variants, config hot-reload example, and updated image tags. Adds new container-security.md documenting root vs. non-root trade-offs and mitigation strategies.
Docker Build Infrastructure
docker/build-container.sh, docker/llmsnap.Containerfile
Refactors build script to loop over non-root and root variants, introduces LS_REPO and BASE_IMAGE environment variables, constructs per-type tags and build args (UID, GID, USER_HOME). Updates Containerfile to accept parameterized base image, LS_REPO, and LS_VER; appends /app to PATH.
Dependencies
go.mod
Bumps Go toolchain from 1.23.0 to 1.25.4 and updates indirect x/crypto, x/net, x/sys, x/text modules.
Metrics Parsing
proxy/metrics_monitor.go
Refactors parseMetrics signature to accept nested usage and timings objects instead of flat JSON. Adds detection logic for v1/messages (input_tokens, output_tokens) vs. v1/chat/completions (prompt_tokens, completion_tokens) token fields and cache_read_input_tokens.
Request Processing & Routing
proxy/process.go, proxy/proxymanager.go
Adds setProcAttributes calls in start and stop command paths. Consolidates multiple endpoints (/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/messages, /reranking, /infill, /audio/speech) to use unified proxyInferenceHandler; renames proxyOAIHandler. Adds Anthropic v1/chat/completions path check for streaming.
Cross-Platform Process Handling
proxy/process_unix.go, proxy/process_windows.go
Introduces platform-specific setProcAttributes implementations: Unix variant is a no-op; Windows variant sets HideWindow and CREATE_NO_WINDOW flags.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • docker/build-container.sh: Introduces loop-driven build logic with conditional tagging, multiple environment variables, and per-type build arguments that require careful verification across architectures and variants.
  • proxy/metrics_monitor.go: Signature change with nested JSON extraction and format-detection logic (Anthropic vs. OpenAI); requires verification of token field mapping and handling of optional cache_read_input_tokens.
  • proxy/proxymanager.go: Handler consolidation across multiple endpoints with new Anthropic route; verify all existing endpoints still map correctly and that unified handler logic is appropriate for all request types.
  • docker/llmsnap.Containerfile: Parameterized base image and LS_REPO changes; verify URL quoting and variable substitution work as intended.

Possibly related PRs

  • Fix containers build action #5: Both PRs modify docker/build-container.sh with changes to container repo/tag logic and build arguments—potential for merge conflicts or duplicated effort in the Docker build refactoring.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: The title "Sync with llama-swap" is vague and does not clearly summarize the substantial changes made across multiple files, including Docker optimizations, Anthropic API support, security enhancements, and Go dependency updates. Consider a more descriptive title that captures the main changes, such as "Add Anthropic API support, non-root Docker variants, and security documentation".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
proxy/metrics_monitor.go (1)

230-240: Potential data loss when timings has incomplete token counts.

When timings exists, lines 231-232 unconditionally overwrite inputTokens and outputTokens from the usage block. If timings is present but prompt_n or predicted_n are missing or zero, the valid token counts from usage will be replaced with zeros.

Consider guarding the overwrites:

 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings.Exists() {
-		inputTokens = int(timings.Get("prompt_n").Int())
-		outputTokens = int(timings.Get("predicted_n").Int())
+		if pn := timings.Get("prompt_n"); pn.Exists() {
+			inputTokens = int(pn.Int())
+		}
+		if pdn := timings.Get("predicted_n"); pdn.Exists() {
+			outputTokens = int(pdn.Int())
+		}
 		promptPerSecond = timings.Get("prompt_per_second").Float()
 		tokensPerSecond = timings.Get("predicted_per_second").Float()
 		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb1d79e and f2fd786.

⛔ Files ignored due to path filters (3)
  • docs/assets/hero1.jpg is excluded by !**/*.jpg
  • go.sum is excluded by !**/*.sum
  • ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • README.md (4 hunks)
  • docker/build-container.sh (2 hunks)
  • docker/llmsnap.Containerfile (2 hunks)
  • docs/container-security.md (1 hunks)
  • go.mod (2 hunks)
  • proxy/metrics_monitor.go (3 hunks)
  • proxy/process.go (3 hunks)
  • proxy/process_unix.go (1 hunks)
  • proxy/process_windows.go (1 hunks)
  • proxy/proxymanager.go (2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[grammar] ~16-~16: Use a hyphen to join words.
Context: ...model switching - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabb...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: run-tests
  • GitHub Check: run-tests
🔇 Additional comments (21)
proxy/metrics_monitor.go (3)

125-136: LGTM!

The extraction of usage and timings from the parsed JSON with short-circuit logic when neither exists is clean and appropriate. Error handling correctly logs warnings without affecting client responses.


184-190: LGTM!

The streaming response parsing correctly mirrors the non-streaming pattern, extracting usage and timings and continuing the backward iteration if neither is present.


197-227: LGTM!

The refactored function signature cleanly separates usage and timings parsing. The dual-format support for OpenAI-style (prompt_tokens/completion_tokens) and v1/messages-style (input_tokens/output_tokens) fields is well-implemented with proper fallback logic.

proxy/process_unix.go (1)

1-12: LGTM!

Clean cross-platform abstraction. The no-op Unix implementation correctly pairs with the Windows variant that sets HideWindow and CREATE_NO_WINDOW flags.

proxy/proxymanager.go (2)

239-261: LGTM! Good consolidation of inference endpoints.

The unified proxyInferenceHandler appropriately handles both OpenAI-compatible and Anthropic /v1/messages endpoints since both use the model field in the JSON body for routing.


550-626: Handler implementation is sound.

The renamed proxyInferenceHandler correctly extracts the model field from JSON bodies, which is compatible with both OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) request formats.

docker/build-container.sh (2)

56-79: Both container variants share the same :${ARCH} latest tag for root.

The loop builds non-root first, then root. Since both use CONTAINER_LATEST="ghcr.io/${LS_REPO}:${ARCH}" as the base (with non-root appending -non-root), the final :${ARCH} tag will point to the root variant. This is likely intentional for backward compatibility, but worth confirming.

If the intent is that :${ARCH} should remain the root image and :${ARCH}-non-root for non-root, the current logic is correct.


23-33: Good use of environment variables for fork customization.

The BASE_LLAMACPP_IMAGE and GITHUB_REPOSITORY overrides enable easy testing with forked repositories while maintaining sensible defaults.

proxy/process_windows.go (1)

10-16: LGTM!

The Windows-specific implementation correctly sets HideWindow and CREATE_NO_WINDOW to prevent console windows from appearing when spawning backend processes. The magic number 0x08000000 is clearly documented.

proxy/process.go (2)

872-872: LGTM!

Consistent application of platform-specific process attributes to the stop command, matching the approach used for the start command at Line 304.


304-304: Platform-specific implementations exist and are correctly structured.

The setProcAttributes(p.cmd) call is properly implemented across platforms:

  • proxy/process_unix.go: No-op implementation for Unix systems
  • proxy/process_windows.go: Sets HideWindow attribute via SysProcAttr

The cross-platform pattern is correctly applied: _windows.go gets an implicit GOOS build constraint from its filename, while _unix.go is not an implicit GOOS suffix and relies on an explicit //go:build constraint (e.g. unix or !windows) inside the file.

docker/llmsnap.Containerfile (4)

1-3: LGTM!

Parameterizing the base image and tag provides flexibility for building different container variants (root, non-root, different platforms) as described in the PR objectives.


6-7: LGTM!

The switch to semantic versioning (0.0.1) and parameterized repository make the build more flexible and align with the container variant improvements in this PR.


33-34: LGTM!

Adding /app to PATH allows the llmsnap binary to be invoked directly without specifying the full path, improving usability.


37-39: LGTM!

Using variables for the download URL and filenames makes the build more maintainable and aligns with the parameterization approach throughout the Containerfile.

docs/container-security.md (1)

1-9: LGTM!

This documentation provides clear, balanced guidance on container security considerations. It appropriately explains the trade-offs between root and non-root containers, offers practical configuration options, and includes helpful references to external documentation.

README.md (5)

24-25: LGTM!

Clear documentation of the new Anthropic v1/messages endpoint support. Note that there's a separate comment on proxy/process.go regarding streaming loading state support for this endpoint.


67-67: LGTM!

Good documentation of non-root container variants with appropriate link to the security documentation, helping users make informed choices.


78-84: LGTM!

Excellent addition demonstrating the configuration hot-reload feature with practical Docker volume mounting examples.


101-101: LGTM!

Image tag updated consistently with the version changes in the Containerfile.


103-104: LGTM!

Clear example of the non-root CUDA variant, making it easy for security-conscious users to adopt the safer container image.

@napmany napmany merged commit 199d25b into main Dec 11, 2025
3 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Feb 14, 2026