Merged
41 commits
53b32f3 | proxy: add API key support (#436) | mostlygeek | Dec 24, 2025
9864f9f | .coderabbit.yaml: disable annoying features | mostlygeek | Dec 24, 2025
22e098a | Add Peer Model Support (#438) | mostlygeek | Dec 28, 2025
37d74ef | proxy: add /v1/images/generations (#443) | mostlygeek | Dec 31, 2025
addb986 | proxy: add support for basic authorization (#445) | mostlygeek | Dec 31, 2025
3dc3603 | proxy: skip very slow tests in -short test mode (#446) | mostlygeek | Dec 31, 2025
7931212 | proxy: add v1/images/edits API endpoint (#447) | mostlygeek | Jan 1, 2026
8df5e85 | proxy: add /v1/responses and /v1/audio/voices endpoints (#448) | mostlygeek | Jan 1, 2026
4413881 | proxy: actually add /v1/responses endpoint (#449) | mostlygeek | Jan 1, 2026
c19309f | CLAUDE.md: small instruction tweaks | mostlygeek | Jan 8, 2026
267c030 | ui: update react-router-dom to 7.12.0 (#456) | mostlygeek | Jan 9, 2026
4f863fd | CLAUDE.md: tweak instructions | mostlygeek | Jan 10, 2026
66d555e | Improve container build reliability (#457) | mostlygeek | Jan 11, 2026
3edb180 | ci: free up disk space before ROCm container build (#460) | mostlygeek | Jan 15, 2026
eb5bfff | proxy: unify filtering for local models and peers | mostlygeek | Jan 16, 2026
124007c | config: add environment variable macros (#466) | mostlygeek | Jan 17, 2026
8f2137c | config: support environment variable macros in apiKeys (#467) | mostlygeek | Jan 17, 2026
b73f367 | config-schema.json,config.example.yaml: Update examples and schema | mostlygeek | Jan 17, 2026
75fced5 | config: support macros in peer apiKey and filters (#469) | mostlygeek | Jan 17, 2026
4e850c2 | config: refactor macro substitution in configuration (#470) | mostlygeek | Jan 19, 2026
14207f8 | ui: npm security update | mostlygeek | Jan 19, 2026
205efd4 | proxy: extend /running endpoint with additional process data (#474) | mostlygeek | Jan 20, 2026
7493618 | Add count_tokens api proxying (#476) | simcop2387 | Jan 20, 2026
f942261 | build(deps-dev): bump tar from 7.5.3 to 7.5.6 in /ui (#477) | dependabot[bot] | Jan 22, 2026
6439ab1 | ui: add peer:true in package-lock.json | mostlygeek | Jan 22, 2026
4384315 | ui-svelte: add Svelte port of React UI (#487) | mostlygeek | Jan 29, 2026
6f8e7cc | .github/workflows: switch release.yml to build ui-svelte | mostlygeek | Jan 29, 2026
5de387d | ui: fix node-tar vulnerability | mostlygeek | Jan 29, 2026
cdea7d1 | proxy/config: skip env macros in YAML comment lines (#496) | mostlygeek | Jan 31, 2026
20738f3 | proxy,ui-svelte: replace old UI with svelte+playground | mostlygeek | Feb 1, 2026
7b20fc0 | Add path filters to CI workflows and create UI test workflow (#501) | mostlygeek | Feb 1, 2026
0462e3d | Reorganize UI controls and improve form interactions (#500) | mostlygeek | Feb 1, 2026
bc01e6f | build: add stable-diffusion server to musa and vulkan container image… | mostlygeek | Feb 2, 2026
7eef5de | docs: add stable-diffusion.cpp references (#506) | rare-magma | Feb 5, 2026
b5fde8e | proxy,ui-svelte: add request/response capturing (#508) | mostlygeek | Feb 7, 2026
8d6d949 | proxy: support timings for /infill from llama-server (#510) | mostlygeek | Feb 8, 2026
44e8fb3 | Merge remote-tracking branch 'borigin/main' into sync-v190 | napmany | Feb 13, 2026
9a7f1ec | fix merge | napmany | Feb 13, 2026
715c545 | restore sleep/wake support in ui | napmany | Feb 14, 2026
8274569 | merge fix | napmany | Feb 14, 2026
0633920 | fix merge | napmany | Feb 14, 2026
7 changes: 7 additions & 0 deletions .coderabbit.yaml
@@ -8,8 +8,15 @@ reviews:
poem: false
review_status: true
collapse_walkthrough: false
sequence_diagrams: false
finishing_touches:
docstrings:
enabled: false
auto_review:
enabled: true
drafts: false
chat:
auto_reply: true
issue_enrichment:
planning:
enabled: false
24 changes: 22 additions & 2 deletions .github/workflows/containers.yml
@@ -10,17 +10,37 @@ on:
# Allows manual triggering of the workflow
workflow_dispatch:

# Run on workflow file changes (without pushing)
push:
paths:
- '.github/workflows/containers.yml'
- 'docker/build-container.sh'
- 'docker/*.Containerfile'

jobs:
build-and-push:
runs-on: ubuntu-latest
strategy:
matrix:
platform: [intel, cuda, vulkan, cpu, musa]
platform: [intel, cuda, vulkan, cpu, musa, rocm]
fail-fast: false
steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Free up disk space
if: matrix.platform == 'rocm'
run: |
echo "Before cleanup:"
df -h
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker system prune -af
echo "After cleanup:"
df -h

- name: Log in to GitHub Container Registry
uses: docker/login-action@v2
with:
@@ -31,7 +51,7 @@ jobs:
- name: Run build-container
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: ./docker/build-container.sh ${{ matrix.platform }} true
run: ./docker/build-container.sh ${{ matrix.platform }} ${{ github.event_name != 'push' }}

# note make sure napmany/llmsnap has admin rights to the llmsnap package
# see: https://github.com/actions/delete-package-versions/issues/74
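The `containers.yml` changes make two per-job decisions: the "Free up disk space" step runs only for the `rocm` matrix entry, and the second argument to `build-container.sh` changes from a literal `true` to `${{ github.event_name != 'push' }}`, which GitHub interpolates as the string `true` or `false`. Based on the workflow comment "Run on workflow file changes (without pushing)", that argument appears to act as a push flag; treat that reading as an assumption about `build-container.sh`. The following Go sketch only re-evaluates the same two expressions for each platform and event; `buildPlan` and `planFor` are hypothetical names, not part of the repository.

```go
package main

import "fmt"

// buildPlan records the two per-job decisions made in containers.yml.
type buildPlan struct {
	FreeDiskSpace bool // mirrors: if: matrix.platform == 'rocm'
	PushImages    bool // mirrors: ${{ github.event_name != 'push' }} (assumed to be a push flag)
}

// planFor evaluates both workflow expressions for one matrix entry and event.
func planFor(platform, eventName string) buildPlan {
	return buildPlan{
		FreeDiskSpace: platform == "rocm",
		PushImages:    eventName != "push",
	}
}

// platforms is copied from the updated strategy.matrix.platform list.
var platforms = []string{"intel", "cuda", "vulkan", "cpu", "musa", "rocm"}

func main() {
	for _, event := range []string{"push", "workflow_dispatch"} {
		for _, p := range platforms {
			plan := planFor(p, event)
			fmt.Printf("%-17s %-6s free-disk=%-5v push=%v\n",
				event, p, plan.FreeDiskSpace, plan.PushImages)
		}
	}
}
```

Under this reading, a push that edits only the workflow or container files still builds every platform but skips publishing, while manually dispatched runs keep publishing as before.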
20 changes: 18 additions & 2 deletions .github/workflows/go-ci-windows.yml
@@ -3,9 +3,25 @@ name: Windows CI
on:
push:
branches: [ "main" ]
# only run when backend source changes
# cmd/ is excluded because it contains utilities without tests
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci-windows.yml'

pull_request:
branches: [ "main" ]
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci-windows.yml'

# Allows manual triggering of the workflow
workflow_dispatch:
@@ -28,7 +44,7 @@ jobs:
uses: actions/cache/restore@v4
with:
path: ./build
key: ${{ runner.os }}-simple-responder-${{ hashFiles('misc/simple-responder/simple-responder.go') }}
key: ${{ runner.os }}-simple-responder-${{ hashFiles('cmd/simple-responder/simple-responder.go') }}

# necessary for testing proxy/Process swapping
- name: Create simple-responder
@@ -42,7 +58,7 @@ jobs:
uses: actions/cache/save@v4
with:
path: ./build
key: ${{ runner.os }}-simple-responder-${{ hashFiles('misc/simple-responder/simple-responder.go') }}
key: ${{ runner.os }}-simple-responder-${{ hashFiles('cmd/simple-responder/simple-responder.go') }}

- name: Test all
shell: bash
16 changes: 16 additions & 0 deletions .github/workflows/go-ci.yml
@@ -3,9 +3,25 @@ name: Linux CI
on:
push:
branches: [ "main" ]
# only run when backend source changes
# cmd/ is excluded because it contains utilities without tests
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci.yml'

pull_request:
branches: [ "main" ]
paths:
- '**/*.go'
- '!cmd/**'
- 'go.mod'
- 'go.sum'
- 'Makefile'
- '.github/workflows/go-ci.yml'

# Allows manual triggering of the workflow
workflow_dispatch:
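Both the Linux and Windows CI workflows now run only when a changed file passes a `paths` filter that includes every `.go` file but excludes `cmd/**`. GitHub's workflow syntax documentation states that pattern order matters: a matching `!` pattern after a positive match excludes the path, and a later positive match includes it again. The following Go sketch approximates that "last match wins" rule under simplifying assumptions: it understands only `*`, `**` and `**/`, treats every other character literally, ignores `paths-ignore`, and is not GitHub's actual matcher. `matchesFilter` and `triggers` are hypothetical helpers, and the file paths are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goCIPaths is copied from the push/pull_request paths filter in go-ci.yml.
var goCIPaths = []string{
	"**/*.go",
	"!cmd/**",
	"go.mod",
	"go.sum",
	"Makefile",
	".github/workflows/go-ci.yml",
}

// globToRegexp translates a simplified glob into an anchored regexp.
// Only "**/", "**" and "*" are special; every other byte is matched literally.
func globToRegexp(glob string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(glob); {
		switch {
		case strings.HasPrefix(glob[i:], "**/"):
			b.WriteString("(?:.*/)?") // zero or more leading directories
			i += 3
		case strings.HasPrefix(glob[i:], "**"):
			b.WriteString(".*") // anything, including '/'
			i += 2
		case glob[i] == '*':
			b.WriteString("[^/]*") // anything within a single path segment
			i++
		default:
			b.WriteString(regexp.QuoteMeta(glob[i : i+1]))
			i++
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matchesFilter walks the patterns in order: the last matching pattern wins,
// and a leading '!' turns a match into an exclusion.
func matchesFilter(patterns []string, file string) bool {
	included := false
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		if globToRegexp(strings.TrimPrefix(p, "!")).MatchString(file) {
			included = !negate
		}
	}
	return included
}

// triggers reports whether at least one changed file passes the filter.
func triggers(patterns, changed []string) bool {
	for _, f := range changed {
		if matchesFilter(patterns, f) {
			return true
		}
	}
	return false
}

func main() {
	files := []string{
		"main.go",
		"proxy/example.go",
		"cmd/simple-responder/simple-responder.go",
		"ui-svelte/src/App.svelte",
		"go.sum",
	}
	for _, f := range files {
		fmt.Printf("%-42s %v\n", f, matchesFilter(goCIPaths, f))
	}
}
```

With this ordering, `!cmd/**` overrides the earlier `**/*.go` match, so a change confined to `cmd/` does not start either Go CI workflow, which matches the intent stated in the workflow comments.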
27 changes: 11 additions & 16 deletions .github/workflows/release.yml
@@ -3,13 +3,13 @@ name: goreleaser
on:
push:
tags:
- '*'
- "*"

# Allows manual triggering of the workflow
workflow_dispatch:
inputs:
tag:
description: 'Tag version to release (e.g. v144)'
description: "Tag version to release (e.g. v144)"
required: true

permissions:
@@ -19,35 +19,30 @@ jobs:
goreleaser:
runs-on: ubuntu-latest
steps:
-
name: Checkout
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ github.event.inputs.tag || github.ref }}
-
name: Set up Go
- name: Set up Go
uses: actions/setup-go@v5
-
name: Set up Node.js
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '23'
-
name: Install dependencies and build UI
node-version: "24"
- name: Install dependencies and build UI
run: |
cd ui
cd ui-svelte
npm ci
npm run build

-
name: Run GoReleaser
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
# either 'goreleaser' (default) or 'goreleaser-pro'
distribution: goreleaser
# 'latest', 'nightly', or a semver
version: '~> v2'
version: "~> v2"
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -76,4 +71,4 @@ jobs:
"release": {
"tag_name": "${{ steps.tag.outputs.tag }}"
}
}
}
42 changes: 42 additions & 0 deletions .github/workflows/ui-tests.yml
@@ -0,0 +1,42 @@
name: UI Tests

on:
push:
branches: [ "main" ]
paths:
- 'ui-svelte/**'
- '.github/workflows/ui-tests.yml'

pull_request:
branches: [ "main" ]
paths:
- 'ui-svelte/**'
- '.github/workflows/ui-tests.yml'

workflow_dispatch:

jobs:

run-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ui-svelte
steps:
- uses: actions/checkout@v4

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '24'
cache: 'npm'
cache-dependency-path: ui-svelte/package-lock.json

- name: Install dependencies
run: npm ci

- name: Type check
run: npm run check

- name: Run tests
run: npm test
54 changes: 31 additions & 23 deletions CLAUDE.md
@@ -7,37 +7,45 @@ llmsnap is a light weight, transparent proxy server that provides automatic mode
## Tech stack

- golang
- typescript, vite and react for UI (ui/)

## Testing

- `make test-dev` - Use this when making iterative changes. Runs `go test` and `staticcheck`. Fix any static checking errors. Use this only when changes are made to any code under the `proxy/` directory
- `make test-all` - runs at the end before completing work. Includes long running concurrency tests.
- typescript, vite and react for UI (located in ui/)

## Workflow Tasks

### Plan Improvements
- when summarizing changes only include details that require further action
- just say "Done." when there is no further action
- use `gh` to create PRs and load issues
- do include Co-Authored-By or created by when committing changes or creating PRs
- keep PR descriptions short and focused on changes.
- never include a test plan

## Testing

Work plans are located in ai-plans/. Plans written by the user may be incomplete, contain inconsistencies or errors.
- Follow test naming conventions like `TestProxyManager_<test name>`, `TestProcessGroup_<test name>`, etc.
- Use `go test -v -run <name pattern for new tests>` to run any new tests you've written.
- Use `make test-dev` after running new tests for a quick over all test run. This runs `go test` and `staticcheck`. Fix any static checking errors. Use this only when changes are made to any code under the `proxy/` directory
- Use `make test-all` before completing work. This includes long running concurrency tests.

When the user asks to improve a plan follow these guidelines for expanding and improving it.
### Commit message example format:

- Identify any inconsistencies.
- Expand plans out to be detailed specification of requirements and changes to be made.
- Plans should have at least these sections:
- Title - very short, describes changes
- Overview: A more detailed summary of goal and outcomes desired
- Design Requirements: Detailed descriptions of what needs to be done
- Testing Plan: Tests to be implemented
- Checklist: A detailed list of changes to be made
```
proxy: add new feature

Look for "plan expansion" as explicit instructions to improve a plan.
Add new feature that implements functionality X and Y.

### Implementation of plans
- key change 1
- key change 2
- key change 3

When the user says "paint it", respond with "commencing automated assembly". Then implement the changes as described by the plan. Update the checklist as you complete items.
fixes #123
```

## General Rules
## Code Reviews

- when summarizing changes only include details that require further action (action items)
- when there are no action items, just say "Done."
- use three levels High, Medium, Low severity
- label each discovered issue with a label like H1, M2, L3 respectively
- High severity are must fix issues (security, race conditions, critical bugs)
- Medium severity are recommended improvements (coding style, missing functionality, inconsistencies)
- Low severity are nice to have changes and nits
- Include a suggestion with each discovered item
- Limit your code review to three items with the highest priority first
- Double check your discovered items and recommended remediations
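The commit message format added to CLAUDE.md asks for an `<area>: <summary>` subject, a blank line, a short body and key-change bullets, and an optional `fixes #123` line. As a minimal sketch of those rules, the following Go program checks the subject and blank line and extracts the referenced issue number. `checkCommitMessage` and `fixedIssue` are hypothetical helpers, not part of the repository, and the set of characters allowed in the area prefix is an assumption chosen to accept subjects from this PR such as `proxy,ui-svelte: ...` and `.github/workflows: ...`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectRe accepts "<area>: <summary>"; the allowed area characters are an assumption.
var subjectRe = regexp.MustCompile(`^[A-Za-z0-9._/,-]+: \S.*$`)

// fixesRe captures the issue number from a "fixes #123" line anywhere in the message.
var fixesRe = regexp.MustCompile(`(?m)^fixes #(\d+)\s*$`)

// checkCommitMessage returns the format problems found (nil when there are none).
func checkCommitMessage(msg string) []string {
	var problems []string
	lines := strings.Split(msg, "\n")
	if !subjectRe.MatchString(lines[0]) {
		problems = append(problems, `subject should look like "<area>: <summary>"`)
	}
	if len(lines) > 1 && strings.TrimSpace(lines[1]) != "" {
		problems = append(problems, "line 2 should be blank")
	}
	return problems
}

// fixedIssue returns the issue number from a "fixes #N" line, if present.
func fixedIssue(msg string) (string, bool) {
	m := fixesRe.FindStringSubmatch(msg)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	msg := "proxy: add new feature\n\n" +
		"Add new feature that implements functionality X and Y.\n\n" +
		"- key change 1\n- key change 2\n- key change 3\n\n" +
		"fixes #123"
	fmt.Println(checkCommitMessage(msg)) // []
	fmt.Println(fixedIssue(msg))         // 123 true
	fmt.Println(checkCommitMessage("Add Peer Model Support (#438)"))
}
```

Under these assumptions, older subjects in the commit list such as `Add Peer Model Support (#438)` would be flagged because they lack an area prefix.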
4 changes: 2 additions & 2 deletions Makefile
@@ -36,11 +36,11 @@ test-all: proxy/ui_dist/placeholder.txt
go test -race -count=1 ./proxy/...

ui/node_modules:
cd ui && npm install
cd ui-svelte && npm install

# build react UI
ui: ui/node_modules
cd ui && npm run build
cd ui-svelte && npm run build

# Build OSX binary
mac: ui
9 changes: 8 additions & 1 deletion README.md
@@ -13,16 +13,21 @@ Built in Go for performance and simplicity, llmsnap has zero dependencies and is

- ✅ Easy to deploy and configure: one binary, one configuration file. no external dependencies
- ✅ On-demand model switching
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc.)
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, stable-diffusion.cpp, etc.)
- future proof, upgrade your inference servers at any time.
- ✅ OpenAI API supported endpoints:
- `v1/completions`
- `v1/chat/completions`
- `v1/responses`
- `v1/embeddings`
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
- `v1/audio/transcriptions` ([docs](https://github.com/mostlygeek/llama-swap/issues/41#issuecomment-2722637867))
- `v1/audio/voices`
- `v1/images/generations`
- `v1/images/edits`
- ✅ Anthropic API supported endpoints:
- `v1/messages`
- `v1/messages/count_tokens`
- ✅ llama-server (llama.cpp) supported endpoints
- `v1/rerank`, `v1/reranking`, `/rerank`
- `/infill` - for code infilling
@@ -35,6 +40,7 @@ Built in Go for performance and simplicity, llmsnap has zero dependencies and is
- `/running` - list currently running models ([#61](https://github.com/mostlygeek/llama-swap/issues/61))
- `/log` - remote log monitoring
- `/health` - just returns "OK"
- ✅ API Key support - define keys to restrict access to API endpoints
- ✅ Customizable
- Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
- Automatic unloading of models after timeout by setting a `ttl`
@@ -65,6 +71,7 @@ llmsnap can be installed in multiple ways
### Docker Install ([download images](https://github.com/napmany/llmsnap/pkgs/container/llmsnap))

Nightly container images with llmsnap and llama-server are built for multiple platforms (cuda, vulkan, intel, etc.) including [non-root variants with improved security](docs/container-security.md).
The stable-diffusion.cpp server is also included for the musa and vulkan platforms.

```shell
$ docker pull ghcr.io/napmany/llmsnap:cuda
5 changes: 5 additions & 0 deletions cmd/simple-responder/simple-responder.go
@@ -211,6 +211,11 @@ func main() {
})
})

r.GET("/v1/audio/voices", func(c *gin.Context) {
model := c.Query("model")
c.JSON(http.StatusOK, gin.H{"voices": []string{"voice1"}, "model": model})
})

r.GET("/slow-respond", func(c *gin.Context) {
echo := c.Query("echo")
delay := c.Query("delay")