feat: expose llms with rpc only when rpc peers available#494
feat: expose llms with rpc only when rpc peers available#494overcuriousity wants to merge 18 commits intomostlygeek:mainfrom
Conversation
Fix parseEndpointList to handle single and double quotes that are treated as literal characters on Windows. - Strip surrounding quotes before parsing comma-separated endpoints - Fixes test failures on Windows CI Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
WalkthroughThis PR introduces RPC health checking for distributed inference, per-model request timeouts, and a configuration management UI. Changes span configuration schema updates, Go backend RPC health monitoring with TCP checks, process timeout enforcement, new API endpoints for runtime config updates, and a new Svelte UI component with CodeMirror-based YAML editing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
This is a neat feature, the implementation looks straight forward and it's not something I'm interested in maintaining.:) I suggest maintaining it in your own fork so people can find it if they stumble onto this issue. |
|
Nice to hear! The other option would be to integrate it in your upstream application anyways, as the code changes are minimal, the feature is completely optional and i think there won´t be much maintenance necessary going forward I suspect. Even if it is, I am here to support, of course. Let me know what you think! |
RPC health checking now runs continuously from process creation until proxy shutdown, completely independent of whether the model is loaded, starting, stopped, or in any other state. - Start health checker in NewProcess when rpcHealthCheck is enabled - Remove stopRPCHealthChecker - only stops on proxy shutdown - Remove state checks from health checker goroutine - Health status always reflects current RPC endpoint availability Previously, the health checker only ran while a process was in StateReady, causing stale health data when processes stopped. Now /v1/models always shows accurate RPC health regardless of model state. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
4cbe1e7 to
c17df42
Compare
Work in progress on web configuration feature. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The requestTimeout feature was not working because the timeout context was not connected to the HTTP request. When the timeout fired, it attempted to kill the process but the reverse proxy continued waiting for a response indefinitely. - Use context.WithTimeout() to create a timeout context for the HTTP request - Clone the request with the timeout context before proxying - When timeout fires, the HTTP request is immediately cancelled - Fix StopImmediately() to handle timeouts during model loading (StateStarting) - Add unit test to verify timeout behavior Before: requests would run for 60+ seconds despite requestTimeout: 20 After: requests terminate in exactly 20 seconds as configured Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add brief mention of requestTimeout feature in the customizable features section of README. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Feat web config
Feat timeout
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@config_embed.go`:
- Around line 7-12: GetConfigExampleYAML currently returns the package-level
slice configExampleYAML directly which allows callers to mutate the shared
backing array; import the bytes package and return a cloned copy using
bytes.Clone(configExampleYAML) from GetConfigExampleYAML so callers receive an
independent []byte, and add "bytes" to the imports if it's not present.
In `@docs/configuration.md`:
- Around line 75-81: Update the features table row for "hooks" to hyphenate the
compound adjective by replacing "event driven functionality" with "event-driven
functionality" so the description reads correctly; locate the table entry for
the `hooks` feature and adjust the phrase "event driven" to "event-driven".
In `@proxy/process_timeout_test.go`:
- Line 56: The test call to NewProcess is missing the required shutdownCtx
argument causing a compile error; update the invocation of NewProcess in the
test (the call that currently reads NewProcess("test-timeout", 30, cfg,
processLogger, proxyLogger)) to pass the proper shutdown context as the sixth
parameter (e.g., the existing shutdownCtx test variable or a context created for
the test such as context.Background()/context.WithCancel(...) if a dedicated
shutdownCtx isn't already defined) so the function signature matches
NewProcess(..., shutdownCtx).
In `@ui/package.json`:
- Around line 13-16: Update the vulnerable dependency versions in package.json:
bump "js-yaml" to ">=4.1.1" (e.g., "4.1.1"), and also update "@codemirror/state"
to ">=6.5.4" and "@codemirror/lang-yaml" to ">=6.1.2" (e.g., "6.5.4" and
"6.1.2"); if you have "@types/js-yaml" in devDependencies, bump it to a
compatible newer version as well; after changing these dependency specifiers run
your package manager (npm install or yarn install) to update node_modules and
the lockfile, then run tests/lint to ensure nothing breaks and commit the
updated package.json and lockfile.
🧹 Nitpick comments (3)
proxy/config/config_test.go (1)
1313-1416: Rename new RPC endpoint tests to follow the project’s test naming convention.🔧 Suggested renames
-func TestParseRPCEndpoints_ValidFormats(t *testing.T) { +func TestConfig_ParseRPCEndpoints_ValidFormats(t *testing.T) { -func TestParseRPCEndpoints_NoRPCFlag(t *testing.T) { +func TestConfig_ParseRPCEndpoints_NoRPCFlag(t *testing.T) { -func TestParseRPCEndpoints_InvalidFormats(t *testing.T) { +func TestConfig_ParseRPCEndpoints_InvalidFormats(t *testing.T) { -func TestParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) { +func TestConfig_ParseRPCEndpoints_EmptyEndpointsFiltered(t *testing.T) { -func TestParseRPCEndpoints_MultilineCommand(t *testing.T) { +func TestConfig_ParseRPCEndpoints_MultilineCommand(t *testing.T) {As per coding guidelines: Follow test naming conventions like
TestProxyManager_<test name>,TestProcessGroup_<test name>, etc.ui-svelte/src/routes/Config.svelte (1)
218-228: Consider cleaning up EditorView instances on component unmount.The EditorView instances are created but not destroyed when the component unmounts. This could cause a memory leak if users navigate away and return to this route multiple times.
♻️ Proposed fix to add cleanup
+ import { onMount, onDestroy } from "svelte"; + + onDestroy(() => { + editorView?.destroy(); + exampleView?.destroy(); + });Or alternatively, return cleanup functions from the
$effectblocks by modifying the editor creation logic.proxy/proxymanager_api.go (1)
291-318: Consider adding a request body size limit.The
io.ReadAll(c.Request.Body)on line 301 has no size limit. While the endpoint is protected by API key authentication, a malicious or buggy client could send an excessively large payload causing memory exhaustion.♻️ Proposed fix to limit request body size
+const maxConfigSize = 1 << 20 // 1MB limit for config files + func (pm *ProxyManager) apiUpdateConfig(c *gin.Context) { pm.Lock() configPath := pm.configPath pm.Unlock() if configPath == "" { pm.sendErrorResponse(c, http.StatusBadRequest, "Config file path not set") return } - body, err := io.ReadAll(c.Request.Body) + body, err := io.ReadAll(io.LimitReader(c.Request.Body, maxConfigSize)) if err != nil { pm.sendErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("Failed to read request body: %v", err)) return } + if len(body) >= maxConfigSize { + pm.sendErrorResponse(c, http.StatusRequestEntityTooLarge, "Config file too large") + return + }
Adjust RPC health check parameters to reduce false positives when endpoints are under load and fix multiple security/correctness issues. - increase RPC health check timeout from 500ms to 3s to handle busy servers - decrease check interval from 30s to 10s for faster detection - fix process_timeout_test missing context parameter - fix config_embed to return cloned byte slice preventing mutation - update ui dependencies: js-yaml 4.1.1, @codemirror/state 6.5.4, @codemirror/lang-yaml 6.1.2 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix ParseRPCEndpoints to handle Windows shlex behavior where single-quoted strings with spaces are split into multiple arguments. - Collect all non-flag arguments after --rpc flag - Join them with space before parsing endpoint list - Fixes test failure: TestParseRPCEndpoints_ValidFormats/multiple_endpoints_with_spaces_trimmed Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I/O timeout errors now don't mark RPC endpoints as unhealthy. Timeouts are logged at debug level and health state is preserved. - detect timeout errors using net.Error.Timeout() - add test for timeout handling behavior Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
I think this feature would be best maintained in your own fork. :) |
My use-case involves doing distributed inference across multiple nodes (native llama.cpp feature).
However, this might disrupt end user UX, as if not enough peers are available and the LLM does not load successfully, it is still displayed to users as available through the /models API endoint.
Practically this can be noticed in something like open-webui.
Solution:
Add a boolean option in the config, which when true and rpc is configured, sets up a watchdog which checks rpc availability. If unavailable, the model is not exposed through the /models API endpoint.
Summary by CodeRabbit
Release Notes
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.