Skip to content

feat: expose llms with rpc only when rpc peers available#494

Closed
overcuriousity wants to merge 18 commits intomostlygeek:mainfrom
overcuriousity:feat--conditional-rpc-healthcheck
Closed

feat: expose llms with rpc only when rpc peers available#494
overcuriousity wants to merge 18 commits intomostlygeek:mainfrom
overcuriousity:feat--conditional-rpc-healthcheck

Conversation

@overcuriousity
Copy link

@overcuriousity overcuriousity commented Jan 30, 2026

My use-case involves doing distributed inference across multiple nodes (native llama.cpp feature).

However, this might disrupt end user UX, as if not enough peers are available and the LLM does not load successfully, it is still displayed to users as available through the /models API endoint.
Practically this can be noticed in something like open-webui.

Solution:
Add a boolean option in the config, which when true and rpc is configured, sets up a watchdog which checks rpc availability. If unavailable, the model is not exposed through the /models API endpoint.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added RPC health checking for distributed inference model monitoring
    • Added per-model request timeout configuration to control inference execution duration
    • Added configuration management API endpoints for runtime config updates
    • Added YAML configuration editor UI for simplified configuration management
  • Documentation

    • Updated configuration documentation with new configuration options

✏️ Tip: You can customize this high-level summary in your review settings.

overcuriousity and others added 3 commits January 30, 2026 15:22
Fix parseEndpointList to handle single and double quotes that are
treated as literal characters on Windows.

- Strip surrounding quotes before parsing comma-separated endpoints
- Fixes test failures on Windows CI

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 30, 2026

Walkthrough

This PR introduces RPC health checking for distributed inference, per-model request timeouts, and a configuration management UI. Changes span configuration schema updates, Go backend RPC health monitoring with TCP checks, process timeout enforcement, new API endpoints for runtime config updates, and a new Svelte UI component with CodeMirror-based YAML editing.

Changes

Cohort / File(s) Summary
Configuration Schema & Examples
README.md, config-schema.json, config.example.yaml
Added requestTimeout (per-request termination timeout), rpcHealthCheck (TCP health checks for RPC endpoints), and unlisted (model visibility control) configuration fields; added qwen-distributed example model; updated documentation for idle-based unloading and health monitoring.
Configuration Embedding
config_embed.go
New file embedding config.example.yaml as a byte slice with public accessor GetConfigExampleYAML() to support runtime config retrieval.
Documentation
docs/configuration.md
Added requestTimeout and rpcHealthCheck feature entries with detailed explanations of timeout behavior, GPU protection, and RPC health monitoring mechanics.
RPC Endpoint Parsing
proxy/config/config.go, proxy/config/config_test.go
Introduced ParseRPCEndpoints() public function to extract and validate RPC endpoints from command strings; supports multiple flag formats (--rpc, --rpc=, -rpc, -rpc=); added comprehensive unit tests covering valid/invalid formats and edge cases.
Model Configuration
proxy/config/model_config.go
Added RPCHealthCheck and RequestTimeout fields to ModelConfig struct with YAML marshaling support and proper defaulting (false/0).
Process Lifecycle & RPC Health
proxy/process.go, proxy/process_test.go, proxy/process_rpc_health_test.go
Implemented RPC health checking infrastructure with background TCP health monitor; added request timeout enforcement with forced process termination; extended NewProcess to accept shutdownCtx and modelConfig; updated all test calls with context parameter; added tests validating health state tracking, timeout behavior, and endpoint parsing.
Request Timeout Testing
proxy/process_timeout_test.go
Added tests validating request timeout enforcement, process termination within bounds, and logging of timeout-driven stops using mock streaming HTTP server.
ProcessGroup Context Support
proxy/processgroup.go, proxy/processgroup_test.go
Added shutdownCtx field to ProcessGroup and propagated it through NewProcess initialization; updated all test calls with context parameter.
ProxyManager Core Changes
proxy/proxymanager.go, llama-swap.go
Added configPath and configExample fields to ProxyManager; introduced runtime model filtering to exclude models with unhealthy RPC endpoints from /v1/models responses; added health check pre-validation in inference handler returning 503 on unhealthy endpoints; unconditionally enabled API-triggered config reloads via SetConfigPath() and SetConfigExample().
Configuration API Endpoints
proxy/proxymanager_api.go
Added three new authenticated API endpoints: GET /api/config/current (fetch active config), GET /api/config/example (fetch example config), and POST /api/config (update and reload config); integrated ConfigFileChangedEvent emission for triggering reloads.
UI Dependencies & Routing
ui-svelte/package.json, ui/package.json, ui-svelte/src/App.svelte, ui-svelte/src/components/Header.svelte
Added CodeMirror (YAML syntax highlighting) and js-yaml dependencies; registered new "/config" route and header navigation link.
Configuration UI Component
ui-svelte/src/routes/Config.svelte
New Svelte component with synchronized dual YAML editors (current/example configs), real-time validation, theme-aware CodeMirror integration, import/export functionality, and POST-based save flow with app reload on success.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • mostlygeek
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main feature: conditional exposure of LLMs based on RPC peer availability.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mostlygeek
Copy link
Owner

mostlygeek commented Jan 30, 2026

This is a neat feature, the implementation looks straight forward and it's not something I'm interested in maintaining.:)

I suggest maintaining it in your own fork so people can find it if they stumble onto this issue.

@overcuriousity
Copy link
Author

Nice to hear!
I think there are two ways going forward: I can maintain it in my fork, but then it would be beneficial if we could reference that fork somehow, because people won´t easily find it if its just in my random fork.

The other option would be to integrate it in your upstream application anyways, as the code changes are minimal, the feature is completely optional and i think there won´t be much maintenance necessary going forward I suspect. Even if it is, I am here to support, of course.

Let me know what you think!

overcuriousity and others added 4 commits January 30, 2026 18:12
RPC health checking now runs continuously from process creation until
proxy shutdown, completely independent of whether the model is loaded,
starting, stopped, or in any other state.

- Start health checker in NewProcess when rpcHealthCheck is enabled
- Remove stopRPCHealthChecker - only stops on proxy shutdown
- Remove state checks from health checker goroutine
- Health status always reflects current RPC endpoint availability

Previously, the health checker only ran while a process was in StateReady,
causing stale health data when processes stopped. Now /v1/models always
shows accurate RPC health regardless of model state.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@overcuriousity overcuriousity force-pushed the feat--conditional-rpc-healthcheck branch from 4cbe1e7 to c17df42 Compare January 30, 2026 23:34
overcuriousity and others added 6 commits January 30, 2026 23:45
Work in progress on web configuration feature.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The requestTimeout feature was not working because the timeout
context was not connected to the HTTP request. When the timeout
fired, it attempted to kill the process but the reverse proxy
continued waiting for a response indefinitely.

- Use context.WithTimeout() to create a timeout context for the HTTP request
- Clone the request with the timeout context before proxying
- When timeout fires, the HTTP request is immediately cancelled
- Fix StopImmediately() to handle timeouts during model loading (StateStarting)
- Add unit test to verify timeout behavior

Before: requests would run for 60+ seconds despite requestTimeout: 20
After: requests terminate in exactly 20 seconds as configured

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add brief mention of requestTimeout feature in the customizable
features section of README.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@overcuriousity overcuriousity marked this pull request as ready for review January 31, 2026 00:38
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@config_embed.go`:
- Around line 7-12: GetConfigExampleYAML currently returns the package-level
slice configExampleYAML directly which allows callers to mutate the shared
backing array; import the bytes package and return a cloned copy using
bytes.Clone(configExampleYAML) from GetConfigExampleYAML so callers receive an
independent []byte, and add "bytes" to the imports if it's not present.

In `@docs/configuration.md`:
- Around line 75-81: Update the features table row for "hooks" to hyphenate the
compound adjective by replacing "event driven functionality" with "event-driven
functionality" so the description reads correctly; locate the table entry for
the `hooks` feature and adjust the phrase "event driven" to "event-driven".

In `@proxy/process_timeout_test.go`:
- Line 56: The test call to NewProcess is missing the required shutdownCtx
argument causing a compile error; update the invocation of NewProcess in the
test (the call that currently reads NewProcess("test-timeout", 30, cfg,
processLogger, proxyLogger)) to pass the proper shutdown context as the sixth
parameter (e.g., the existing shutdownCtx test variable or a context created for
the test such as context.Background()/context.WithCancel(...) if a dedicated
shutdownCtx isn't already defined) so the function signature matches
NewProcess(..., shutdownCtx).

In `@ui/package.json`:
- Around line 13-16: Update the vulnerable dependency versions in package.json:
bump "js-yaml" to ">=4.1.1" (e.g., "4.1.1"), and also update "@codemirror/state"
to ">=6.5.4" and "@codemirror/lang-yaml" to ">=6.1.2" (e.g., "6.5.4" and
"6.1.2"); if you have "@types/js-yaml" in devDependencies, bump it to a
compatible newer version as well; after changing these dependency specifiers run
your package manager (npm install or yarn install) to update node_modules and
the lockfile, then run tests/lint to ensure nothing breaks and commit the
updated package.json and lockfile.
🧹 Nitpick comments (3)
proxy/config/config_test.go (1)

1313-1416: Rename new RPC endpoint tests to follow the project’s test naming convention.

🔧 Suggested renames
-func TestParseRPCEndpoints_ValidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_ValidFormats(t *testing.T) {

-func TestParseRPCEndpoints_NoRPCFlag(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_NoRPCFlag(t *testing.T) {

-func TestParseRPCEndpoints_InvalidFormats(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_InvalidFormats(t *testing.T) {

-func TestParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) {

-func TestParseRPCEndpoints_MultilineCommand(t *testing.T) {
+func TestConfig_ParseRPCEndpoints_MultilineCommand(t *testing.T) {

As per coding guidelines: Follow test naming conventions like TestProxyManager_<test name>, TestProcessGroup_<test name>, etc.

ui-svelte/src/routes/Config.svelte (1)

218-228: Consider cleaning up EditorView instances on component unmount.

The EditorView instances are created but not destroyed when the component unmounts. This could cause a memory leak if users navigate away and return to this route multiple times.

♻️ Proposed fix to add cleanup
+  import { onMount, onDestroy } from "svelte";
+
+  onDestroy(() => {
+    editorView?.destroy();
+    exampleView?.destroy();
+  });

Or alternatively, return cleanup functions from the $effect blocks by modifying the editor creation logic.

proxy/proxymanager_api.go (1)

291-318: Consider adding a request body size limit.

The io.ReadAll(c.Request.Body) on line 301 has no size limit. While the endpoint is protected by API key authentication, a malicious or buggy client could send an excessively large payload causing memory exhaustion.

♻️ Proposed fix to limit request body size
+const maxConfigSize = 1 << 20 // 1MB limit for config files
+
 func (pm *ProxyManager) apiUpdateConfig(c *gin.Context) {
 	pm.Lock()
 	configPath := pm.configPath
 	pm.Unlock()

 	if configPath == "" {
 		pm.sendErrorResponse(c, http.StatusBadRequest, "Config file path not set")
 		return
 	}

-	body, err := io.ReadAll(c.Request.Body)
+	body, err := io.ReadAll(io.LimitReader(c.Request.Body, maxConfigSize))
 	if err != nil {
 		pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("Failed to read request body: %v", err))
 		return
 	}
+	if len(body) >= maxConfigSize {
+		pm.sendErrorResponse(c, http.StatusRequestEntityTooLarge, "Config file too large")
+		return
+	}

Overcuriousity and others added 5 commits January 31, 2026 18:50
Adjust RPC health check parameters to reduce false positives when endpoints
are under load and fix multiple security/correctness issues.

- increase RPC health check timeout from 500ms to 3s to handle busy servers
- decrease check interval from 30s to 10s for faster detection
- fix process_timeout_test missing context parameter
- fix config_embed to return cloned byte slice preventing mutation
- update ui dependencies: js-yaml 4.1.1, @codemirror/state 6.5.4, @codemirror/lang-yaml 6.1.2

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix ParseRPCEndpoints to handle Windows shlex behavior where
single-quoted strings with spaces are split into multiple arguments.

- Collect all non-flag arguments after --rpc flag
- Join them with space before parsing endpoint list
- Fixes test failure: TestParseRPCEndpoints_ValidFormats/multiple_endpoints_with_spaces_trimmed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I/O timeout errors now don't mark RPC endpoints as unhealthy.
Timeouts are logged at debug level and health state is preserved.

- detect timeout errors using net.Error.Timeout()
- add test for timeout handling behavior

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@mostlygeek
Copy link
Owner

I think this feature would be best maintained in your own fork. :)

@mostlygeek mostlygeek closed this Feb 1, 2026
@overcuriousity overcuriousity deleted the feat--conditional-rpc-healthcheck branch March 5, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants