feat: expose llms with rpc only when rpc peers available by overcuriousity · Pull Request #494 · mostlygeek/llama-swap

overcuriousity · 2026-01-30T15:20:38Z

My use case involves distributed inference across multiple nodes (a native llama.cpp feature).

However, this can hurt the end-user experience: if not enough peers are available and the LLM fails to load, it is still listed as available through the /models API endpoint.
In practice, this is noticeable in clients such as open-webui.

Solution:
Add a boolean config option which, when true and RPC is configured, starts a watchdog that checks RPC availability. If the RPC peers are unavailable, the model is not exposed through the /models API endpoint.
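A minimal sketch of the availability check described above, assuming that a plain TCP connect is an adequate liveness signal for an RPC peer. The names `rpcPeersReachable` and `startLocalListener` and the 3-second `probeTimeout` are illustrative assumptions, not code from this PR.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTimeout is an assumed per-endpoint dial timeout.
const probeTimeout = 3 * time.Second

// rpcPeersReachable reports whether every endpoint ("host:port") accepts a
// TCP connection within the timeout. An empty list counts as reachable.
func rpcPeersReachable(endpoints []string, timeout time.Duration) bool {
	for _, ep := range endpoints {
		conn, err := net.DialTimeout("tcp", ep, timeout)
		if err != nil {
			return false
		}
		conn.Close()
	}
	return true
}

// startLocalListener opens a throwaway TCP listener for the demo and returns
// its address plus a function that closes it.
func startLocalListener() (string, func()) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	return ln.Addr().String(), func() { ln.Close() }
}

func main() {
	addr, stop := startLocalListener()
	fmt.Println("peer up:", rpcPeersReachable([]string{addr}, probeTimeout))
	stop()
	fmt.Println("peer down:", rpcPeersReachable([]string{addr}, probeTimeout))
}
```

A model configured with this option would only be listed while the check returns true for all of its peers.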

Summary by CodeRabbit

Release Notes

New Features
- Added RPC health checking for distributed inference model monitoring
- Added per-model request timeout configuration to control inference execution duration
- Added configuration management API endpoints for runtime config updates
- Added YAML configuration editor UI for simplified configuration management
Documentation
- Updated configuration documentation with new configuration options

Fix parseEndpointList to handle single and double quotes that are treated as literal characters on Windows.
- Strip surrounding quotes before parsing comma-separated endpoints
- Fixes test failures on Windows CI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
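The quote handling in this commit could look roughly like the following. This is a hypothetical reimplementation for illustration, not the PR's actual `parseEndpointList`; it assumes a single layer of matching quotes around the whole value.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEndpointList strips one layer of surrounding quotes, splits on commas,
// trims whitespace, and drops empty entries.
func parseEndpointList(raw string) []string {
	raw = strings.TrimSpace(raw)
	if len(raw) >= 2 {
		first, last := raw[0], raw[len(raw)-1]
		if (first == '"' || first == '\'') && first == last {
			raw = raw[1 : len(raw)-1]
		}
	}
	var endpoints []string
	for _, part := range strings.Split(raw, ",") {
		if ep := strings.TrimSpace(part); ep != "" {
			endpoints = append(endpoints, ep)
		}
	}
	return endpoints
}

func main() {
	// On Windows the quotes can arrive as literal characters in the argument.
	fmt.Println(parseEndpointList(`'10.0.0.2:50052, 10.0.0.3:50052'`))
}
```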

coderabbitai · 2026-01-30T15:28:32Z

Walkthrough

This PR introduces RPC health checking for distributed inference, per-model request timeouts, and a configuration management UI. Changes span configuration schema updates, Go backend RPC health monitoring with TCP checks, process timeout enforcement, new API endpoints for runtime config updates, and a new Svelte UI component with CodeMirror-based YAML editing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Schema & Examples `README.md`, `config-schema.json`, `config.example.yaml` | Added requestTimeout (per-request termination timeout), rpcHealthCheck (TCP health checks for RPC endpoints), and unlisted (model visibility control) configuration fields; added qwen-distributed example model; updated documentation for idle-based unloading and health monitoring. |
| Configuration Embedding `config_embed.go` | New file embedding config.example.yaml as a byte slice with public accessor GetConfigExampleYAML() to support runtime config retrieval. |
| Documentation `docs/configuration.md` | Added requestTimeout and rpcHealthCheck feature entries with detailed explanations of timeout behavior, GPU protection, and RPC health monitoring mechanics. |
| RPC Endpoint Parsing `proxy/config/config.go`, `proxy/config/config_test.go` | Introduced ParseRPCEndpoints() public function to extract and validate RPC endpoints from command strings; supports multiple flag formats (--rpc, --rpc=, -rpc, -rpc=); added comprehensive unit tests covering valid/invalid formats and edge cases. |
| Model Configuration `proxy/config/model_config.go` | Added RPCHealthCheck and RequestTimeout fields to ModelConfig struct with YAML marshaling support and proper defaulting (false/0). |
| Process Lifecycle & RPC Health `proxy/process.go`, `proxy/process_test.go`, `proxy/process_rpc_health_test.go` | Implemented RPC health checking infrastructure with background TCP health monitor; added request timeout enforcement with forced process termination; extended NewProcess to accept shutdownCtx and modelConfig; updated all test calls with context parameter; added tests validating health state tracking, timeout behavior, and endpoint parsing. |
| Request Timeout Testing `proxy/process_timeout_test.go` | Added tests validating request timeout enforcement, process termination within bounds, and logging of timeout-driven stops using mock streaming HTTP server. |
| ProcessGroup Context Support `proxy/processgroup.go`, `proxy/processgroup_test.go` | Added shutdownCtx field to ProcessGroup and propagated it through NewProcess initialization; updated all test calls with context parameter. |
| ProxyManager Core Changes `proxy/proxymanager.go`, `llama-swap.go` | Added configPath and configExample fields to ProxyManager; introduced runtime model filtering to exclude models with unhealthy RPC endpoints from /v1/models responses; added health check pre-validation in inference handler returning 503 on unhealthy endpoints; unconditionally enabled API-triggered config reloads via SetConfigPath() and SetConfigExample(). |
| Configuration API Endpoints `proxy/proxymanager_api.go` | Added three new authenticated API endpoints: GET /api/config/current (fetch active config), GET /api/config/example (fetch example config), and POST /api/config (update and reload config); integrated ConfigFileChangedEvent emission for triggering reloads. |
| UI Dependencies & Routing `ui-svelte/package.json`, `ui/package.json`, `ui-svelte/src/App.svelte`, `ui-svelte/src/components/Header.svelte` | Added CodeMirror (YAML syntax highlighting) and js-yaml dependencies; registered new "/config" route and header navigation link. |
| Configuration UI Component `ui-svelte/src/routes/Config.svelte` | New Svelte component with synchronized dual YAML editors (current/example configs), real-time validation, theme-aware CodeMirror integration, import/export functionality, and POST-based save flow with app reload on success. |
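The two user-visible effects summarized for `proxy/proxymanager.go` (filtering /v1/models and answering 503 for models with unhealthy RPC endpoints) can be sketched as below. The `modelEntry` type and the function names are hypothetical simplifications; the real ProxyManager is not shown in this conversation.

```go
package main

import (
	"fmt"
	"net/http"
)

// modelEntry is a hypothetical, simplified view of a configured model.
type modelEntry struct {
	ID             string
	RPCHealthCheck bool // mirrors the rpcHealthCheck option
	RPCHealthy     bool // assumed to be updated by a background checker
}

// available reports whether the model may be offered to clients.
func (m modelEntry) available() bool {
	return !m.RPCHealthCheck || m.RPCHealthy
}

// listedModelIDs returns the IDs that a /v1/models response would include.
func listedModelIDs(all []modelEntry) []string {
	ids := []string{}
	for _, m := range all {
		if m.available() {
			ids = append(ids, m.ID)
		}
	}
	return ids
}

// inferenceStatus returns the status an inference request would be
// pre-validated with: 503 when the model's RPC peers are down.
func inferenceStatus(m modelEntry) int {
	if !m.available() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	models := []modelEntry{
		{ID: "local-model"},
		{ID: "qwen-distributed", RPCHealthCheck: true, RPCHealthy: false},
	}
	fmt.Println(listedModelIDs(models))
	fmt.Println(inferenceStatus(models[1]))
}
```

Models without the option enabled are never filtered, which keeps the feature opt-in.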

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

ui: add Svelte port of React UI #487: Extends the same ui-svelte Svelte UI with Config route/component and CodeMirror dependencies introduced by this PR.
feat: config hot-reload #106: Both PRs modify config hot-reload and proxy manager/server lifecycle behavior (llama-swap.go and proxy manager initialization patterns).
ui-svelte: add Playground page with chat interface #497: Both PRs update UI routing in App.svelte and Header.svelte to add new routes and navigation items (/config in this PR, /playground in PR#497).

Suggested labels

enhancement

Suggested reviewers

mostlygeek

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.39% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main feature: conditional exposure of LLMs based on RPC peer availability. |

mostlygeek · 2026-01-30T16:31:01Z

This is a neat feature and the implementation looks straightforward, but it's not something I'm interested in maintaining. :)

I suggest maintaining it in your own fork so people can find it if they stumble onto this issue.

overcuriousity · 2026-01-30T16:58:08Z

Nice to hear!
I think there are two ways forward. I can maintain it in my fork, but then it would help if we could reference that fork somehow, because people won't easily find it if it's just in my random fork.

The other option would be to integrate it into your upstream application anyway: the code changes are minimal, the feature is completely optional, and I suspect little maintenance will be needed going forward. If any is, I am here to support it, of course.

Let me know what you think!

RPC health checking now runs continuously from process creation until proxy shutdown, completely independent of whether the model is loaded, starting, stopped, or in any other state.
- Start health checker in NewProcess when rpcHealthCheck is enabled
- Remove stopRPCHealthChecker (the checker only stops on proxy shutdown)
- Remove state checks from health checker goroutine
- Health status always reflects current RPC endpoint availability

Previously, the health checker only ran while a process was in StateReady, causing stale health data when processes stopped. Now /v1/models always shows accurate RPC health regardless of model state.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
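A minimal sketch of a checker whose lifetime is tied only to a shutdown context, as this commit describes. The type `rpcHealth`, the function `startRPCHealthChecker`, the injected `probe` function, and the short intervals are assumptions chosen so the example runs quickly; they are not the PR's code.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// rpcHealth holds the most recently observed health state.
type rpcHealth struct {
	healthy atomic.Bool
}

// startRPCHealthChecker probes once synchronously, then keeps probing on
// every tick until ctx is cancelled. It never looks at model state.
func startRPCHealthChecker(ctx context.Context, interval time.Duration, probe func() bool) *rpcHealth {
	h := &rpcHealth{}
	h.healthy.Store(probe())
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				h.healthy.Store(probe())
			}
		}
	}()
	return h
}

// demo flips a fake peer from up to down and reports what the checker saw.
func demo() (before, after bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // stands in for proxy shutdown

	var peerUp atomic.Bool
	peerUp.Store(true)

	h := startRPCHealthChecker(ctx, 10*time.Millisecond, peerUp.Load)
	before = h.healthy.Load()

	peerUp.Store(false)
	time.Sleep(200 * time.Millisecond) // allow several ticks to pass
	after = h.healthy.Load()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("healthy before:", before, "healthy after:", after)
}
```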

Work in progress on web configuration feature. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The requestTimeout feature was not working because the timeout context was not connected to the HTTP request. When the timeout fired, it attempted to kill the process but the reverse proxy continued waiting for a response indefinitely.
- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
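The core of this fix, attaching the timeout context to the proxied request, can be sketched with the standard library as follows. `withRequestTimeout`, `demo`, and the durations are hypothetical; the sketch only cancels the HTTP request and does not stop any upstream process, which the PR handles separately.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"time"
)

const (
	upstreamDelay  = 2 * time.Second        // how long the fake upstream stalls
	requestTimeout = 200 * time.Millisecond // assumed per-request limit
)

// withRequestTimeout clones each request with a context that is cancelled
// after d, so the proxied round trip is aborted when the timeout fires.
func withRequestTimeout(next http.Handler, d time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		next.ServeHTTP(w, r.Clone(ctx))
	})
}

// demo proxies one request to a slow upstream and returns the elapsed time.
func demo() time.Duration {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(upstreamDelay):
		case <-r.Context().Done():
		}
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	front := httptest.NewServer(withRequestTimeout(proxy, requestTimeout))
	defer front.Close()

	start := time.Now()
	resp, err := http.Get(front.URL)
	if err == nil {
		resp.Body.Close()
	}
	return time.Since(start)
}

func main() {
	fmt.Println("request returned after", demo().Round(10*time.Millisecond))
}
```

Without the cloned context, the reverse proxy would keep waiting on the upstream for the full delay.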

Add brief mention of requestTimeout feature in the customizable features section of README. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Feat web config

Feat timeout

coderabbitai

Actionable comments posted: 4

🤖 Fix all issues with AI agents

In `@config_embed.go`:
- Around line 7-12: GetConfigExampleYAML currently returns the package-level
slice configExampleYAML directly which allows callers to mutate the shared
backing array; import the bytes package and return a cloned copy using
bytes.Clone(configExampleYAML) from GetConfigExampleYAML so callers receive an
independent []byte, and add "bytes" to the imports if it's not present.

In `@docs/configuration.md`:
- Around line 75-81: Update the features table row for "hooks" to hyphenate the
compound adjective by replacing "event driven functionality" with "event-driven
functionality" so the description reads correctly; locate the table entry for
the `hooks` feature and adjust the phrase "event driven" to "event-driven".

In `@proxy/process_timeout_test.go`:
- Line 56: The test call to NewProcess is missing the required shutdownCtx
argument causing a compile error; update the invocation of NewProcess in the
test (the call that currently reads NewProcess("test-timeout", 30, cfg,
processLogger, proxyLogger)) to pass the proper shutdown context as the sixth
parameter (e.g., the existing shutdownCtx test variable or a context created for
the test such as context.Background()/context.WithCancel(...) if a dedicated
shutdownCtx isn't already defined) so the function signature matches
NewProcess(..., shutdownCtx).

In `@ui/package.json`:
- Around line 13-16: Update the vulnerable dependency versions in package.json:
bump "js-yaml" to ">=4.1.1" (e.g., "4.1.1"), and also update "@codemirror/state"
to ">=6.5.4" and "@codemirror/lang-yaml" to ">=6.1.2" (e.g., "6.5.4" and
"6.1.2"); if you have "@types/js-yaml" in devDependencies, bump it to a
compatible newer version as well; after changing these dependency specifiers run
your package manager (npm install or yarn install) to update node_modules and
the lockfile, then run tests/lint to ensure nothing breaks and commit the
updated package.json and lockfile.
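For the `config_embed.go` item above, a minimal sketch of the defensive copy is shown here. A byte literal stands in for the embedded file so the example is self-contained; the variable contents are an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// configExampleYAML stands in for the embedded example config (assumed
// contents; the real file is config.example.yaml).
var configExampleYAML = []byte("models: {}\n")

// GetConfigExampleYAML returns an independent copy, so callers cannot mutate
// the shared package-level slice.
func GetConfigExampleYAML() []byte {
	return bytes.Clone(configExampleYAML)
}

func main() {
	c := GetConfigExampleYAML()
	c[0] = 'X' // only the caller's copy changes
	fmt.Printf("copy: %q\noriginal: %q\n", c, configExampleYAML)
}
```

`bytes.Clone` requires Go 1.20 or newer.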

🧹 Nitpick comments (3)

proxy/config/config_test.go (1)
1313-1416: Rename new RPC endpoint tests to follow the project’s test naming convention.
🔧 Suggested renames
-func TestParseRPCEndpoints_ValidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_ValidFormats(t *testing.T) {

-func TestParseRPCEndpoints_NoRPCFlag(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_NoRPCFlag(t *testing.T) {

-func TestParseRPCEndpoints_InvalidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_InvalidFormats(t *testing.T) {

-func TestParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {

-func TestParseRPCEndpoints_MultilineCommand(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_MultilineCommand(t *testing.T) {
As per coding guidelines: Follow test naming conventions like TestProxyManager_<test name>, TestProcessGroup_<test name>, etc.

ui-svelte/src/routes/Config.svelte (1)
218-228: Consider cleaning up EditorView instances on component unmount.

The EditorView instances are created but not destroyed when the component unmounts. This could cause a memory leak if users navigate away and return to this route multiple times.
♻️ Proposed fix to add cleanup
+  import { onMount, onDestroy } from "svelte";
+
+  onDestroy(() => {
+    editorView?.destroy();
+    exampleView?.destroy();
+  });
Alternatively, return cleanup functions from the $effect blocks by modifying the editor creation logic.

proxy/proxymanager_api.go (1)
291-318: Consider adding a request body size limit.

The io.ReadAll(c.Request.Body) on line 301 has no size limit. While the endpoint is protected by API key authentication, a malicious or buggy client could send an excessively large payload causing memory exhaustion.
♻️ Proposed fix to limit request body size
+const maxConfigSize = 1 << 20 // 1MB limit for config files
+
 func (pm *ProxyManager) apiUpdateConfig(c *gin.Context) {
 	pm.Lock()
 	configPath := pm.configPath
 	pm.Unlock()

 	if configPath == "" {
 		pm.sendErrorResponse(c, http.StatusBadRequest, "Config file path not set")
 		return
 	}

-	body, err := io.ReadAll(c.Request.Body)
+	body, err := io.ReadAll(io.LimitReader(c.Request.Body, maxConfigSize))
 	if err != nil {
 		pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("Failed to read request body: %v", err))
 		return
 	}
+	if len(body) >= maxConfigSize {
+		pm.sendErrorResponse(c, http.StatusRequestEntityTooLarge, "Config file too large")
+		return
+	}
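One detail worth noting about the suggestion above: `io.LimitReader(r, n)` never yields more than `n` bytes, so a `len(body) >= maxConfigSize` check cannot tell a payload of exactly the limit from an oversized one. A minimal sketch of a common variant reads one byte past the limit instead. `readLimited`, `checkSize`, and `errConfigTooLarge` are hypothetical names, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxConfigSize = 1 << 20 // 1MB limit, as in the suggestion above

var errConfigTooLarge = errors.New("config file too large")

// readLimited reads at most limit bytes. It reads one extra byte so that a
// payload of exactly limit bytes is accepted and anything longer is rejected.
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	body, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(body)) > limit {
		return nil, errConfigTooLarge
	}
	return body, nil
}

// checkSize is a small helper for trying readLimited on string payloads.
func checkSize(payload string, limit int64) error {
	_, err := readLimited(strings.NewReader(payload), limit)
	return err
}

func main() {
	fmt.Println(checkSize("abcd", 4))
	fmt.Println(checkSize("abcde", 4))
	fmt.Println(checkSize("models: {}\n", maxConfigSize))
}
```

In an HTTP handler, `http.MaxBytesReader` from the standard library is another option for capping the request body.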

Adjust RPC health check parameters to reduce false positives when endpoints are under load and fix multiple security/correctness issues.
- increase RPC health check timeout from 500ms to 3s to handle busy servers
- decrease check interval from 30s to 10s for faster detection
- fix process_timeout_test missing context parameter
- fix config_embed to return cloned byte slice preventing mutation
- update ui dependencies: js-yaml 4.1.1, @codemirror/state 6.5.4, @codemirror/lang-yaml 6.1.2

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix ParseRPCEndpoints to handle Windows shlex behavior where single-quoted strings with spaces are split into multiple arguments.
- Collect all non-flag arguments after --rpc flag
- Join them with space before parsing endpoint list
- Fixes test failure: TestParseRPCEndpoints_ValidFormats/multiple_endpoints_with_spaces_trimmed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
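The collect-and-join step described in this commit might look like the sketch below, which works on already-tokenized arguments. `rpcFlagValue` is a hypothetical helper, not the PR's `ParseRPCEndpoints`, and it assumes that any token starting with `-` begins the next flag.

```go
package main

import (
	"fmt"
	"strings"
)

// rpcFlagValue returns the raw value of the RPC flag, accepting the forms
// "--rpc v", "--rpc=v", "-rpc v", and "-rpc=v". For the space-separated forms
// it collects every following non-flag token and joins them with a space, so
// a quoted value that was split on spaces is put back together.
func rpcFlagValue(args []string) (string, bool) {
	for i, arg := range args {
		for _, name := range []string{"--rpc", "-rpc"} {
			if strings.HasPrefix(arg, name+"=") {
				return strings.TrimPrefix(arg, name+"="), true
			}
			if arg == name {
				var parts []string
				for _, next := range args[i+1:] {
					if strings.HasPrefix(next, "-") {
						break
					}
					parts = append(parts, next)
				}
				return strings.Join(parts, " "), true
			}
		}
	}
	return "", false
}

func main() {
	// A single-quoted value that a Windows tokenizer split at the space.
	args := []string{"llama-server", "--rpc", "'10.0.0.2:50052,", "10.0.0.3:50052'", "-ngl", "99"}
	value, ok := rpcFlagValue(args)
	fmt.Printf("%q %v\n", value, ok)
}
```

The joined value still carries its quotes, so it would then go through the quote-stripping endpoint parser.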

I/O timeout errors now don't mark RPC endpoints as unhealthy. Timeouts are logged at debug level and health state is preserved.
- detect timeout errors using net.Error.Timeout()
- add test for timeout handling behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
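The rule in this commit (success means healthy, a timeout leaves the state unchanged, any other error means unhealthy) can be sketched as a small pure function. `nextHealth` and the two example errors are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Example probe errors for the demo: one that reports Timeout() == true and
// one ordinary failure.
var (
	exampleTimeout error = &net.DNSError{Err: "i/o timeout", IsTimeout: true}
	exampleRefused       = errors.New("connection refused")
)

// nextHealth computes the new health state from the previous state and the
// probe result. Timeouts preserve the previous state.
func nextHealth(prev bool, probeErr error) bool {
	if probeErr == nil {
		return true
	}
	var netErr net.Error
	if errors.As(probeErr, &netErr) && netErr.Timeout() {
		return prev
	}
	return false
}

func main() {
	fmt.Println("timeout while healthy:", nextHealth(true, exampleTimeout))
	fmt.Println("timeout while unhealthy:", nextHealth(false, exampleTimeout))
	fmt.Println("refused while healthy:", nextHealth(true, exampleRefused))
	fmt.Println("success while unhealthy:", nextHealth(false, nil))
}
```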

mostlygeek · 2026-02-01T16:56:15Z

I think this feature would be best maintained in your own fork. :)

overcuriousity and others added 3 commits January 30, 2026 15:22

add timeout feature

3989c17

implement first draft of new feature

c34372c

overcuriousity and others added 4 commits January 30, 2026 18:12

fix unit test

ac074d1

rework web interface

c8f2761

fix error assumption healthy

6f023c7

overcuriousity force-pushed the feat--conditional-rpc-healthcheck branch from 4cbe1e7 to c17df42 Compare January 30, 2026 23:34

overcuriousity and others added 6 commits January 30, 2026 23:45

WIP: web config changes

4987daf

docs: add requestTimeout to README features list

0e86bbc

Merge pull request #10 from overcuriousity/feat--web-config

97976a6

Feat web config

Merge pull request #11 from overcuriousity/feat--timeout

fc33fdf

Feat timeout

Merge branch 'new-features' into feat--conditional-rpc-healthcheck

88f02d7

overcuriousity marked this pull request as ready for review January 31, 2026 00:38

coderabbitai bot reviewed Jan 31, 2026

Overcuriousity and others added 5 commits January 31, 2026 18:50

remove test config file

7ca1977

Merge branch 'mostlygeek:main' into feat--conditional-rpc-healthcheck

e762485

mostlygeek closed this Feb 1, 2026

overcuriousity deleted the feat--conditional-rpc-healthcheck branch March 5, 2026 12:42

overcuriousity wants to merge 18 commits into mostlygeek:main from overcuriousity:feat--conditional-rpc-healthcheck
