Changes from all commits (20 commits)
- 3989c17 add timeout feature (overcuriousity, Jan 30, 2026)
- c34372c implement first draft of new feature (overcuriousity, Jan 30, 2026)
- 29ef364 proxy/config: fix RPC endpoint parsing on Windows (overcuriousity, Jan 30, 2026)
- ac074d1 fix unit test (overcuriousity, Jan 30, 2026)
- c8f2761 rework web interface (overcuriousity, Jan 30, 2026)
- 6f023c7 fix error assumption healthy (overcuriousity, Jan 30, 2026)
- c17df42 proxy: make RPC health checks independent of process state (overcuriousity, Jan 30, 2026)
- 4987daf WIP: web config changes (overcuriousity, Jan 30, 2026)
- e6f9f9a proxy: fix requestTimeout feature to actually terminate requests (overcuriousity, Jan 31, 2026)
- 0e86bbc docs: add requestTimeout to README features list (overcuriousity, Jan 31, 2026)
- 97976a6 Merge pull request #10 from overcuriousity/feat--web-config (overcuriousity, Jan 31, 2026)
- fc33fdf Merge pull request #11 from overcuriousity/feat--timeout (overcuriousity, Jan 31, 2026)
- 88f02d7 Merge branch 'new-features' into feat--conditional-rpc-healthcheck (overcuriousity, Jan 31, 2026)
- 7187493 Merge pull request #12 from overcuriousity/feat--conditional-rpc-heal… (overcuriousity, Jan 31, 2026)
- 79332e3 ui-svelte: improve Config editor dark mode styling (Jan 31, 2026)
- 9ab8bd8 Merge branch 'mostlygeek:main' into feat--web-config (overcuriousity, Jan 31, 2026)
- 15a6aa7 Merge branch 'mostlygeek:main' into new-features (overcuriousity, Jan 31, 2026)
- 59db9f0 ui-svelte: fix Config editor compartment collision and error handling (Jan 31, 2026)
- 60f599b Merge branch 'new-features' into feat--web-config (overcuriousity, Jan 31, 2026)
- 8e62ce1 ui-svelte: fix Config editor cursor jumping on input (Jan 31, 2026)
5 changes: 4 additions & 1 deletion README.md
@@ -42,9 +42,11 @@ Built in Go for performance and simplicity, llama-swap has zero dependencies and
- ✅ API Key support - define keys to restrict access to API endpoints
- ✅ Customizable
- Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
- Automatic unloading of models after timeout by setting a `ttl`
- Automatic unloading of models after idle timeout by setting a `ttl`
- Request timeout protection with `requestTimeout` to prevent runaway inference
- Reliable Docker and Podman support using `cmd` and `cmdStop` together
- Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
- RPC health checking for distributed inference - conditionally expose models based on RPC server availability

### Web UI

@@ -174,6 +176,7 @@ Almost all configuration settings are optional and can be added one step at a ti
- `useModelName` to override model names sent to upstream servers
- `${PORT}` automatic port variables for dynamic port assignment
- `filters` rewrite parts of requests before sending to the upstream server
- `rpcHealthCheck` monitor RPC server health for distributed inference models

See the [configuration documentation](docs/configuration.md) for all options.

11 changes: 11 additions & 0 deletions config-schema.json
@@ -216,10 +216,21 @@
"type": "boolean",
"description": "Overrides the global sendLoadingState for this model. Omitting this property will use the global setting."
},
"requestTimeout": {
"type": "integer",
"minimum": 0,
"default": 0,
"description": "Maximum time in seconds for a single request to complete before forcefully killing the model process. This prevents runaway inference processes from blocking the GPU indefinitely. 0 disables timeout (default). When exceeded, the process is terminated and must be restarted for the next request."
},
"unlisted": {
"type": "boolean",
"default": false,
"description": "If true the model will not show up in /v1/models responses. It can still be used as normal in API requests."
},
"rpcHealthCheck": {
"type": "boolean",
"default": false,
"description": "Enable TCP health checks for RPC endpoints specified in cmd. When enabled, parses --rpc host:port[,host:port,...] from cmd and performs health checks every 30 seconds. Models with unhealthy RPC endpoints are filtered from /v1/models and return 503 on inference requests."
}
}
}
28 changes: 28 additions & 0 deletions config.example.yaml
@@ -249,6 +249,16 @@ models:
# - recommended to be omitted and the default used
concurrencyLimit: 0

# requestTimeout: maximum time in seconds for a single request to complete
# - optional, default: 0 (no timeout)
# - useful for preventing runaway inference processes that never complete
# - when exceeded, the model process is forcefully stopped
# - protects against GPU overheating and blocking from stuck processes
# - the process must be restarted for the next request
# - set to 0 to disable timeout
# - recommended for models that may have infinite loops or excessive generation
requestTimeout: 0 # disabled by default; set to 300 for a 5-minute limit

# sendLoadingState: overrides the global sendLoadingState setting for this model
# - optional, default: undefined (use global setting)
sendLoadingState: false
@@ -262,6 +272,24 @@
unlisted: true
cmd: llama-server --port ${PORT} -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0

# RPC health check example for distributed inference:
"qwen-distributed":
# rpcHealthCheck: enable TCP health checks for RPC endpoints
# - optional, default: false
# - when enabled, parses --rpc host:port[,host:port,...] from cmd
# - performs TCP connectivity checks every 30 seconds
# - model is only listed in /v1/models when ALL RPC endpoints are healthy
# - inference requests to unhealthy models return HTTP 503
# - useful for distributed inference with llama.cpp's rpc-server
rpcHealthCheck: true
cmd: |
llama-server --port ${PORT}
--rpc 192.168.1.10:50051,192.168.1.11:50051
-m Qwen2.5-32B-Instruct-Q4_K_M.gguf
-ngl 99
name: "Qwen 32B (Distributed)"
description: "Large model using distributed RPC inference"

# Docker example:
# container runtimes like Docker and Podman can be used reliably with
# a combination of cmd, cmdStop, and ${MODEL_ID}
13 changes: 13 additions & 0 deletions config_embed.go
@@ -0,0 +1,13 @@
package main

import (
_ "embed"
)

//go:embed config.example.yaml
var configExampleYAML []byte

// GetConfigExampleYAML returns the embedded example config file
func GetConfigExampleYAML() []byte {
return configExampleYAML
}
31 changes: 21 additions & 10 deletions docs/configuration.md
@@ -72,16 +72,17 @@ models:

llama-swap supports many more features to customize how you want to manage your environment.

| Feature | Description |
| --------- | ---------------------------------------------- |
| `ttl` | automatic unloading of models after a timeout |
| `macros` | reusable snippets to use in configurations |
| `groups` | run multiple models at a time |
| `hooks` | event driven functionality |
| `env` | define environment variables per model |
| `aliases` | serve a model with different names |
| `filters` | modify requests before sending to the upstream |
| `...` | And many more tweaks |
| Feature | Description |
| ----------------- | ------------------------------------------------------- |
| `ttl` | automatic unloading of models after a timeout |
| `macros` | reusable snippets to use in configurations |
| `groups` | run multiple models at a time |
| `hooks` | event driven functionality |
| `env` | define environment variables per model |
| `aliases` | serve a model with different names |
Comment on lines +75 to +82
⚠️ Potential issue | 🟡 Minor

Hyphenate the compound adjective (“event-driven”).
Minor doc polish.

Proposed edit
-| `hooks`           | event driven functionality                              |
+| `hooks`           | event-driven functionality                              |
| `filters` | modify requests before sending to the upstream |
| `rpcHealthCheck` | monitor RPC server health for distributed inference |
| `...` | And many more tweaks |

## Full Configuration Example

@@ -319,6 +320,16 @@ models:
# - recommended to be omitted and the default used
concurrencyLimit: 0

# requestTimeout: maximum time in seconds for a single request to complete
# - optional, default: 0 (no timeout)
# - useful for preventing runaway inference processes that never complete
# - when exceeded, the model process is forcefully stopped
# - protects against GPU overheating and blocking from stuck processes
# - the process must be restarted for the next request
# - set to 0 to disable timeout
# - recommended for models that may have infinite loops or excessive generation
requestTimeout: 300 # 5 minutes

# sendLoadingState: overrides the global sendLoadingState setting for this model
# - optional, default: undefined (use global setting)
sendLoadingState: false
18 changes: 12 additions & 6 deletions llama-swap.go
@@ -97,6 +97,8 @@ func main() {
currentPM.Shutdown()
newPM := proxy.New(conf)
newPM.SetVersion(date, commit, version)
newPM.SetConfigPath(*configPath)
newPM.SetConfigExample(GetConfigExampleYAML())
srv.Handler = newPM
fmt.Println("Configuration Reloaded")

@@ -114,20 +116,24 @@
}
newPM := proxy.New(conf)
newPM.SetVersion(date, commit, version)
newPM.SetConfigPath(*configPath)
newPM.SetConfigExample(GetConfigExampleYAML())
srv.Handler = newPM
}
}

// load the initial proxy manager
reloadProxyManager()
debouncedReload := debounce(time.Second, reloadProxyManager)
if *watchConfig {
defer event.On(func(e proxy.ConfigFileChangedEvent) {
if e.ReloadingState == proxy.ReloadingStateStart {
debouncedReload()
}
})()

// Always listen for API-triggered config changes
defer event.On(func(e proxy.ConfigFileChangedEvent) {
if e.ReloadingState == proxy.ReloadingStateStart {
debouncedReload()
}
})()

if *watchConfig {
fmt.Println("Watching Configuration for changes")
go func() {
absConfigPath, err := filepath.Abs(*configPath)
58 changes: 58 additions & 0 deletions proxy/config/config.go
@@ -3,6 +3,7 @@ package config
import (
"fmt"
"io"
"net"
"net/url"
"os"
"regexp"
@@ -533,6 +534,63 @@ func SanitizeCommand(cmdStr string) ([]string, error) {
return args, nil
}

// ParseRPCEndpoints extracts RPC endpoints from command string
// Handles: --rpc host:port,host2:port2 or --rpc=host:port or -rpc host:port
func ParseRPCEndpoints(cmdStr string) ([]string, error) {
args, err := SanitizeCommand(cmdStr)
if err != nil {
return nil, err
}

var endpoints []string
for i, arg := range args {
if arg == "--rpc" || arg == "-rpc" {
if i+1 < len(args) {
endpoints = parseEndpointList(args[i+1])
}
} else if strings.HasPrefix(arg, "--rpc=") {
endpoints = parseEndpointList(strings.TrimPrefix(arg, "--rpc="))
} else if strings.HasPrefix(arg, "-rpc=") {
endpoints = parseEndpointList(strings.TrimPrefix(arg, "-rpc="))
}
}

// Validate each endpoint
for _, ep := range endpoints {
if _, _, err := net.SplitHostPort(ep); err != nil {
return nil, fmt.Errorf("invalid RPC endpoint %q: %w", ep, err)
}
}

return endpoints, nil
Comment on lines +545 to +565
⚠️ Potential issue | 🟡 Minor

Fail fast when --rpc/-rpc has no value (and avoid silent overwrite).

If a flag is present without a value, the function currently returns an empty list. Also, repeated flags overwrite earlier endpoints. Consider erroring on missing values and appending endpoints.

🛠️ Suggested fix
 	for i, arg := range args {
 		if arg == "--rpc" || arg == "-rpc" {
-			if i+1 < len(args) {
-				endpoints = parseEndpointList(args[i+1])
-			}
+			if i+1 >= len(args) {
+				return nil, fmt.Errorf("rpc flag requires host:port value")
+			}
+			endpoints = append(endpoints, parseEndpointList(args[i+1])...)
 		} else if strings.HasPrefix(arg, "--rpc=") {
-			endpoints = parseEndpointList(strings.TrimPrefix(arg, "--rpc="))
+			endpoints = append(endpoints, parseEndpointList(strings.TrimPrefix(arg, "--rpc="))...)
 		} else if strings.HasPrefix(arg, "-rpc=") {
-			endpoints = parseEndpointList(strings.TrimPrefix(arg, "-rpc="))
+			endpoints = append(endpoints, parseEndpointList(strings.TrimPrefix(arg, "-rpc="))...)
 		}
 	}

}

func parseEndpointList(s string) []string {
s = strings.TrimSpace(s)

// Strip surrounding quotes (both single and double) from the whole string
// if they match. This handles cases like: "host:port,host2:port2"
if len(s) >= 2 {
if (s[0] == '\'' && s[len(s)-1] == '\'') || (s[0] == '"' && s[len(s)-1] == '"') {
s = s[1 : len(s)-1]
}
}

parts := strings.Split(s, ",")
var result []string
for _, p := range parts {
p = strings.TrimSpace(p)
// Strip any remaining leading/trailing quotes from individual parts
// This handles Windows where shlex doesn't handle single quotes and
// may split 'host:port, host2:port' into "'host:port," and "host2:port'"
p = strings.Trim(p, "'\"")
if p != "" {
result = append(result, p)
}
}
return result
}

func StripComments(cmdStr string) string {
var cleanedLines []string
for _, line := range strings.Split(cmdStr, "\n") {
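The quote-stripping and comma-splitting that `parseEndpointList` performs, plus the `net.SplitHostPort` validation that `ParseRPCEndpoints` applies afterwards, can be exercised in a self-contained form. The sketch below re-implements both steps in one hypothetical function (`splitEndpoints`) purely for illustration; the PR keeps them separate.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitEndpoints trims surrounding quotes, splits on commas, drops
// empty entries, and validates each survivor as host:port, mirroring
// parseEndpointList followed by the SplitHostPort validation loop.
func splitEndpoints(s string) ([]string, error) {
	s = strings.Trim(strings.TrimSpace(s), "'\"")

	var out []string
	for _, p := range strings.Split(s, ",") {
		p = strings.Trim(strings.TrimSpace(p), "'\"")
		if p == "" {
			continue // tolerate trailing or doubled commas
		}
		if _, _, err := net.SplitHostPort(p); err != nil {
			return nil, fmt.Errorf("invalid RPC endpoint %q: %w", p, err)
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	// Quoted list with spaces and a trailing comma, as might survive
	// shell splitting on Windows.
	eps, err := splitEndpoints("'192.168.1.10:50051, [::1]:50051,'")
	fmt.Println(eps, err)
}
```

Note that `net.SplitHostPort` accepts bracketed IPv6 literals like `[::1]:50051`, which is why the test table's IPv6 case passes without special handling.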
105 changes: 105 additions & 0 deletions proxy/config/config_test.go
@@ -1373,3 +1373,108 @@
})

}

func TestParseRPCEndpoints_ValidFormats(t *testing.T) {
tests := []struct {
name string
cmd string
expected []string
}{
{
name: "single endpoint with --rpc",
cmd: "llama-server --rpc localhost:50051 -ngl 99",
expected: []string{"localhost:50051"},
},
{
name: "single endpoint with --rpc=",
cmd: "llama-server --rpc=192.168.1.100:50051 -ngl 99",
expected: []string{"192.168.1.100:50051"},
},
{
name: "single endpoint with -rpc",
cmd: "llama-server -rpc localhost:50051 -ngl 99",
expected: []string{"localhost:50051"},
},
{
name: "single endpoint with -rpc=",
cmd: "llama-server -rpc=localhost:50051 -ngl 99",
expected: []string{"localhost:50051"},
},
{
name: "multiple endpoints comma-separated",
cmd: "llama-server --rpc 192.168.1.10:50051,192.168.1.11:50051 -ngl 99",
expected: []string{"192.168.1.10:50051", "192.168.1.11:50051"},
},
{
name: "multiple endpoints with spaces trimmed",
cmd: "llama-server --rpc '192.168.1.10:50051, 192.168.1.11:50051' -ngl 99",
expected: []string{"192.168.1.10:50051", "192.168.1.11:50051"},
},
{
name: "IPv6 endpoint",
cmd: "llama-server --rpc [::1]:50051 -ngl 99",
expected: []string{"[::1]:50051"},
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
endpoints, err := ParseRPCEndpoints(tt.cmd)
assert.NoError(t, err)
assert.Equal(t, tt.expected, endpoints)
})
}
}

func TestParseRPCEndpoints_NoRPCFlag(t *testing.T) {
cmd := "llama-server -ngl 99 -m model.gguf"
endpoints, err := ParseRPCEndpoints(cmd)
assert.NoError(t, err)
assert.Empty(t, endpoints)
}

func TestParseRPCEndpoints_InvalidFormats(t *testing.T) {
tests := []struct {
name string
cmd string
wantErr string
}{
{
name: "missing port",
cmd: "llama-server --rpc localhost -ngl 99",
wantErr: "invalid RPC endpoint",
},
{
name: "invalid host:port format",
cmd: "llama-server --rpc not-a-valid-endpoint -ngl 99",
wantErr: "invalid RPC endpoint",
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
_, err := ParseRPCEndpoints(tt.cmd)
assert.Error(t, err)
assert.Contains(t, err.Error(), tt.wantErr)
})
}
}

func TestParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {
// Empty strings after commas are filtered out
cmd := "llama-server --rpc 'localhost:50051,,' -ngl 99"
endpoints, err := ParseRPCEndpoints(cmd)
assert.NoError(t, err)
assert.Equal(t, []string{"localhost:50051"}, endpoints)
}

func TestParseRPCEndpoints_MultilineCommand(t *testing.T) {
cmd := `llama-server \
--rpc localhost:50051 \
-ngl 99 \
-m model.gguf`

endpoints, err := ParseRPCEndpoints(cmd)
assert.NoError(t, err)
assert.Equal(t, []string{"localhost:50051"}, endpoints)
}
Comment on lines +1377 to +1480
⚠️ Potential issue | 🟡 Minor

Rename ParseRPCEndpoints tests to follow TestConfig_* convention.

The new test names don’t follow the repository’s test naming pattern for config tests.

🛠️ Suggested rename
-func TestParseRPCEndpoints_ValidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_ValidFormats(t *testing.T) {
@@
-func TestParseRPCEndpoints_NoRPCFlag(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_NoRPCFlag(t *testing.T) {
@@
-func TestParseRPCEndpoints_InvalidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_InvalidFormats(t *testing.T) {
@@
-func TestParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {
@@
-func TestParseRPCEndpoints_MultilineCommand(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_MultilineCommand(t *testing.T) {
Based on learnings: Applies to **/*_test.go : Follow test naming conventions like `TestProxyManager_`, `TestProcessGroup_`, etc.

8 changes: 8 additions & 0 deletions proxy/config/model_config.go
@@ -36,6 +36,12 @@ type ModelConfig struct {

// override global setting
SendLoadingState *bool `yaml:"sendLoadingState"`

// RPC health checking
RPCHealthCheck bool `yaml:"rpcHealthCheck"`
// Maximum time in seconds for a request to complete before killing the process
// 0 means no timeout (default)
RequestTimeout int `yaml:"requestTimeout"`
}

func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
@@ -53,6 +59,8 @@ func (m *ModelConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
ConcurrencyLimit: 0,
Name: "",
Description: "",
RPCHealthCheck: false,
RequestTimeout: 0,
}

// the default cmdStop to taskkill /f /t /pid ${PID}
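The `UnmarshalYAML` change above follows a common Go defaulting pattern: pre-fill a struct with defaults (`RPCHealthCheck: false`, `RequestTimeout: 0`), then let the decoder overwrite only the fields present in the input. The sketch below demonstrates the same pattern with `encoding/json` from the standard library, since the YAML library is an external dependency; `modelConfig` and `decodeWithDefaults` are hypothetical stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type modelConfig struct {
	RPCHealthCheck bool `json:"rpcHealthCheck"`
	RequestTimeout int  `json:"requestTimeout"`
}

// decodeWithDefaults starts from a struct pre-filled with defaults and
// lets the decoder overwrite only the fields present in the input,
// the same shape as ModelConfig's UnmarshalYAML.
func decodeWithDefaults(data []byte) (modelConfig, error) {
	cfg := modelConfig{
		RPCHealthCheck: false, // health checking disabled unless set
		RequestTimeout: 0,     // 0 means no timeout
	}
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := decodeWithDefaults([]byte(`{"requestTimeout": 300}`))
	fmt.Println(cfg.RequestTimeout, cfg.RPCHealthCheck, err)
}
```

Because unmarshalling mutates the pre-filled value in place, fields absent from the input keep their defaults without any post-processing.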