in memory request logging and observability #503
khimaros wants to merge 1 commit into mostlygeek:main
Conversation
Walkthrough
Adds end-to-end request recording and streaming: new request monitor and events, captures request/response bodies in proxy handlers and metrics, exposes GET /api/requests and GET /api/requests/:id, and adds a Svelte Requests UI, types, and store integration.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@proxy/proxymanager.go`:
- Around line 739-740: The recorded request body can be arbitrarily large;
truncate the body to a safe cap (e.g. 1 MiB) before passing it to
pm.recordRequest to avoid unbounded in-memory growth—modify the call site around
recorder, done := pm.recordRequest(c, modelID, string(bodyBytes)) to pass a
capped/truncated string (or add truncation inside pm.recordRequest) so only the
first 1<<20 bytes are retained, and keep the recorder/done flow unchanged;
reference pm.recordRequest, recorder, done and the bodyBytes variable when
making the change.
- Around line 603-612: The read-failure branch must stop proxying and return an
error to the client: when io.ReadAll(c.Request.Body) returns an err, log the
error with pm.proxyLogger.Errorf and then abort the request handling (do not
continue to upstream) by sending an HTTP error response (e.g., 500) and
returning from the handler; do not attempt to proxy with a partially consumed
c.Request.Body or set requestBody in that case. Update the block around
io.ReadAll, c.Request.Body, requestBody and the surrounding handler logic to
perform this early return on error (use the framework's abort/return method such
as c.AbortWithStatus/AbortWithStatusJSON or equivalent).
- Around line 1055-1078: In recordRequest, the responseBodyCopier created by
newBodyCopier is never assigned to the Gin context writer so error responses
(e.g., those sent via sendErrorResponse) bypass it; fix by assigning the
recorder (responseBodyCopier) to c.Writer immediately after creation so it
implements gin.ResponseWriter and captures all writes (ensure recorder.onWrite
remains set and cleanup restores original writer if needed).
In `@ui-svelte/src/routes/Requests.svelte`:
- Around line 23-30: The current merge in selectedRequest ({ ...detailedRequest,
...fromList }) allows empty list fields to overwrite fetched detail bodies;
instead, keep detailedRequest as the source of truth for bodies and only pull
live status fields from the list. Update the selectedRequest derivation to merge
so detailedRequest properties win for request_body/response_body (e.g., merge
detailedRequest last) and, if fromList exists, copy only the live status fields
(like status, statusText or whatever live fields your app uses) from fromList
into the final object; reference the selectedRequest variable and the
detailedRequest/fromList identifiers when making this change.
- Around line 257-277: The clickable <tr> currently uses onclick with viewDetail
and blocks keyboard users; remove the row-level onclick and instead render a
native interactive element (a <button> or <a>) inside a <td> for each row entry
(e.g., wrap the row content in a full-width button inside the first or a
dedicated <td>), keep using viewDetail(req) as the click handler on that
element, style it with CSS (display:block; width:100%; padding:inherit) so it
visually spans the row, and preserve selection logic using selectedId and ARIA
attributes on the button (e.g., aria-pressed or aria-current) while keeping
existing helpers like formatRelativeTime and formatDuration unchanged.
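The body cap asked for in the first item above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `capBody` and `maxRecordedBody` are hypothetical names, and the 1 MiB figure is simply the `1<<20` value the comment mentions.

```go
package main

import "fmt"

// maxRecordedBody is an assumed cap of 1 MiB, taken from the "1<<20" figure
// in the review comment. It is not a llama-swap constant.
const maxRecordedBody = 1 << 20

// capBody returns at most limit bytes of body as a string.
// Hypothetical helper. Note that cutting at a byte offset can split a
// multi-byte UTF-8 character at the very end of the retained data.
func capBody(body []byte, limit int) string {
	if limit < 0 {
		limit = 0
	}
	if len(body) > limit {
		body = body[:limit]
	}
	return string(body)
}

func main() {
	small := []byte(`{"model":"m"}`)
	large := make([]byte, maxRecordedBody+10)

	// Small bodies pass through unchanged; large ones are cut at the cap.
	fmt.Println(len(capBody(small, maxRecordedBody)))
	fmt.Println(len(capBody(large, maxRecordedBody)))
}
```

Only the recorded copy would be capped this way; the bytes forwarded upstream would stay untouched.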
🧹 Nitpick comments (3)
ui-svelte/src/components/JsonView.svelte (1)
16-22: Consider adding clipboard error handling.
The navigator.clipboard.writeText call can fail (e.g., in non-HTTPS contexts or when clipboard permissions are denied). Consider adding user feedback for success/failure.
♻️ Optional: Add clipboard feedback
+<script lang="ts">
+  let { content = "" } = $props();
+  let copyStatus = $state<"idle" | "copied" | "error">("idle");
+
+  let formattedContent = $derived.by(() => {
+    try {
+      const obj = JSON.parse(content);
+      return JSON.stringify(obj, null, 2);
+    } catch (e) {
+      return content;
+    }
+  });
+
+  async function copyToClipboard() {
+    try {
+      await navigator.clipboard.writeText(formattedContent);
+      copyStatus = "copied";
+      setTimeout(() => copyStatus = "idle", 1500);
+    } catch {
+      copyStatus = "error";
+      setTimeout(() => copyStatus = "idle", 1500);
+    }
+  }
+</script>
Then update the button:
 <button
   class="absolute top-2 right-2 p-1 bg-white/10 hover:bg-white/20 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
-  onclick={() => navigator.clipboard.writeText(formattedContent)}
+  onclick={copyToClipboard}
   title="Copy to clipboard"
 >
-  Copy
+  {copyStatus === "copied" ? "Copied!" : copyStatus === "error" ? "Failed" : "Copy"}
 </button>
ui-svelte/src/stores/api.ts (1)
12-12: Consider adding a size limit to the requests store.
Unlike proxyLogs, which has LOG_LENGTH_LIMIT, the requests array can grow unbounded during long sessions. If the backend sends many request events, this could consume significant browser memory over time.
♻️ Suggested: Add requests limit
 const LOG_LENGTH_LIMIT = 1024 * 100; /* 100KB of log data */
+const MAX_REQUESTS = 1000; /* Maximum number of requests to keep in memory */

 // Stores
 export const models = writable<Model[]>([]);
Then in the request handler:
 case "request": {
   const req = JSON.parse(message.data) as RequestLog;
   requests.update((prev) => {
     const index = prev.findIndex((r) => r.id === req.id);
     if (index === -1) {
-      return [req, ...prev];
+      const updated = [req, ...prev];
+      return updated.length > MAX_REQUESTS ? updated.slice(0, MAX_REQUESTS) : updated;
     } else {
       const updated = [...prev];
       updated[index] = req;
       return updated;
     }
   });
   break;
 }
proxy/request_monitor.go (1)
66-84: Consider clearing respBuf after completion to reduce memory retention.
Once ResponseBody is set, keeping the buffer doubles memory for large responses.
♻️ Suggested tweak
 e.Pending = false
 event.Emit(RequestEvent{Entry: *e})
+e.respBuf = bytes.Buffer{}
 return
	var requestBody string
	if c.Request.ContentLength > 0 && c.Request.ContentLength < 1024*1024 { // Only capture small bodies
		bodyBytes, err := io.ReadAll(c.Request.Body)
		if err != nil {
			pm.proxyLogger.Errorf("Error reading request body for recording: %v", err)
		} else {
			c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
			requestBody = string(bodyBytes)
		}
	}
Don’t continue proxying after a failed body read.
If io.ReadAll fails, the request body may be partially consumed, and the upstream call will see a truncated or empty body. Bail out early and return an error response.
🔧 Suggested fix
- bodyBytes, err := io.ReadAll(c.Request.Body)
- if err != nil {
- pm.proxyLogger.Errorf("Error reading request body for recording: %v", err)
- } else {
- c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
- requestBody = string(bodyBytes)
- }
+ bodyBytes, err := io.ReadAll(c.Request.Body)
+ if err != nil {
+ pm.sendErrorResponse(c, http.StatusBadRequest, "failed to read request body")
+ pm.proxyLogger.Errorf("Error reading request body for recording: %v", err)
+ return
+ }
+ c.Request.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
+ requestBody = string(bodyBytes)
func (pm *ProxyManager) recordRequest(c *gin.Context, modelID string, requestBody string) (*responseBodyCopier, func()) {
	startTime := time.Now()
	requestID := pm.requestMonitor.Add(&RequestEntry{
		Timestamp:   startTime,
		Method:      c.Request.Method,
		Path:        c.Request.URL.Path,
		Model:       modelID,
		RequestBody: requestBody,
	})

	recorder := newBodyCopier(c.Writer)
	recorder.onWrite = func(b []byte) {
		pm.requestMonitor.AppendResponse(requestID, string(b))
	}

	return recorder, func() {
		duration := time.Since(startTime)
		respBody := ""
		isStreaming := strings.Contains(recorder.Header().Get("Content-Type"), "text/event-stream")
		if !isStreaming {
			respBody = recorder.body.String()
		}
		pm.requestMonitor.Update(requestID, recorder.Status(), duration, respBody)
	}
Assign the recorder to c.Writer to capture error responses.
recordRequest creates a recorder but doesn't attach it to c.Writer, so error responses written via sendErrorResponse bypass the recorder entirely. This leaves request logs with empty bodies on error paths.
The fix is valid: responseBodyCopier embeds gin.ResponseWriter, satisfying the interface for assignment to c.Writer.
Suggested fix
recorder := newBodyCopier(c.Writer)
+c.Writer = recorder
 recorder.onWrite = func(b []byte) {
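Why the assignment matters can be shown with a reduced model of the wrapper. The sketch below uses net/http rather than Gin, and `teeWriter`, `sendError`, and `capture` are hypothetical stand-ins for `responseBodyCopier`, `sendErrorResponse`, and the handler flow; it only illustrates that a wrapper records nothing unless writes actually go through it.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// teeWriter is a simplified stand-in for responseBodyCopier: it embeds the
// real writer and keeps a copy of every byte written through it.
type teeWriter struct {
	http.ResponseWriter
	body bytes.Buffer
}

// Write copies the bytes into the buffer, then forwards them to the real writer.
func (t *teeWriter) Write(b []byte) (int, error) {
	t.body.Write(b)
	return t.ResponseWriter.Write(b)
}

// sendError writes an error response to whichever writer it is handed.
func sendError(w http.ResponseWriter) {
	http.Error(w, "model not found", http.StatusNotFound)
}

// capture returns what the wrapper recorded when an error is written either
// through it (installed=true) or around it (installed=false).
func capture(installed bool) string {
	orig := httptest.NewRecorder()
	tee := &teeWriter{ResponseWriter: orig}

	var w http.ResponseWriter = orig
	if installed {
		w = tee // analogous to "c.Writer = recorder"
	}
	sendError(w)
	return tee.body.String()
}

func main() {
	fmt.Printf("not installed: %q\n", capture(false))
	fmt.Printf("installed:     %q\n", capture(true))
}
```

In the Gin case the same effect would come from assigning the recorder to `c.Writer`, provided the type satisfies `gin.ResponseWriter` as the review states.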
{#each sortedRequests as req (req.id)}
  <tr
    class="whitespace-nowrap text-sm cursor-pointer hover:bg-gray-100 dark:hover:bg-gray-800 {selectedId === req.id ? 'bg-gray-100 dark:bg-gray-800' : ''}"
    onclick={() => viewDetail(req)}
  >
    <td class="px-4 py-3">{req.id + 1}</td>
    <td class="px-4 py-3">{formatRelativeTime(req.timestamp)}</td>
    <td class="px-4 py-3 font-mono">{req.method}</td>
    <td class="px-4 py-3 font-mono text-xs">{req.path}</td>
    <td class="px-4 py-3">{req.model}</td>
    <td class="px-4 py-3">
      {#if req.pending}
        <span class="text-yellow-500">pending</span>
      {:else}
        <span class={req.status >= 200 && req.status < 300 ? 'text-green-500' : 'text-red-500'}>
          {req.status}
        </span>
      {/if}
    </td>
    <td class="px-4 py-3 text-gray-500">{req.pending ? "-" : formatDuration(req.duration)}</td>
  </tr>
Clickable rows lack keyboard accessibility. Users relying on keyboards cannot open a request, which blocks this workflow.
The suggested fix has two issues:
- Syntax error: Svelte uses on:keydown (not onkeydown). The directive syntax is on:eventname.
- Anti-pattern: Making <tr> act like a clickable button violates ARIA table semantics and accessibility best practices. ARIA table/row roles don't define keyboard interaction patterns, so assistive technology won't expect row-level interactions.
Recommended approach: Place a native <button> or <a> inside a <td>, and use CSS to style it to visually span the row (e.g., display: block; padding: inherit;). This gives keyboard support natively and maintains proper semantic HTML.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@proxy/proxymanager_api.go`:
- Around line 257-265: apiGetRequests is mutating the entries returned by
pm.requestMonitor.GetEntries() which may be references to stored data; instead
create a new slice of copied entries and clear RequestBody/ResponseBody on the
copies so the original data isn't modified. Locate apiGetRequests and GetEntries
usage, allocate a new slice with the same length, copy each entry (e.g., by
value or deep copy), set the RequestBody and ResponseBody on the copy to empty,
and return the new slice in c.JSON to avoid corrupting the stored entries.
🧹 Nitpick comments (4)
ui-svelte/src/lib/types.ts (1)
69-74: Type loosening: role is now string instead of a union.
Changing role from "user" | "assistant" | "system" to string removes compile-time safety for role values. If this is intentional to support additional roles (e.g., "tool"), consider documenting or using a broader union type for clarity.
ui-svelte/src/components/JsonView.svelte (1)
16-22: Add error handling for clipboard API.
The copy button directly calls navigator.clipboard.writeText() without error handling. This can throw if the clipboard API is unavailable (non-HTTPS contexts) or if the user denies permission. The ChatMessage.svelte component in this same codebase has a more robust implementation with fallback.
Suggested fix
+<script lang="ts">
+  let { content = "" } = $props();
+  let copied = $state(false);
+
+  let formattedContent = $derived.by(() => {
+    try {
+      const obj = JSON.parse(content);
+      return JSON.stringify(obj, null, 2);
+    } catch (e) {
+      return content;
+    }
+  });
+
+  async function copyToClipboard() {
+    try {
+      await navigator.clipboard.writeText(formattedContent);
+      copied = true;
+      setTimeout(() => (copied = false), 2000);
+    } catch (err) {
+      console.error("Failed to copy:", err);
+    }
+  }
+</script>
Then update the button:
 <button
   class="absolute top-2 right-2 p-1 bg-white/10 hover:bg-white/20 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
-  onclick={() => navigator.clipboard.writeText(formattedContent)}
+  onclick={copyToClipboard}
   title="Copy to clipboard"
 >
-  Copy
+  {copied ? "Copied!" : "Copy"}
 </button>
ui-svelte/src/routes/Requests.svelte (2)
32-42: Consider surfacing fetch errors to the user.
The error is logged to console but the user sees no indication that the detail fetch failed. For a debugging tool, this may cause confusion if the detail panel appears empty without explanation.
💡 Optional: Add error state feedback
 let isLoadingDetail = $state(false);
+let detailError = $state<string | null>(null);
 ...
 async function viewDetail(req: RequestLog) {
   selectedId = req.id;
   isLoadingDetail = true;
+  detailError = null;
   try {
     detailedRequest = await getRequestDetail(req.id);
   } catch (err) {
     console.error(err);
+    detailError = "Failed to load request details";
   } finally {
     isLoadingDetail = false;
   }
 }
450-484: Array tool arguments will display with numeric indices.
The typeof parsedArgs === 'object' check (line 457) includes arrays. If tool arguments are an array, Object.entries will show indices like "0", "1" as argument names. This may be confusing but won't break functionality.
💡 Optional: Add array check for cleaner display
-{#if parsedArgs && typeof parsedArgs === 'object'}
+{#if parsedArgs && typeof parsedArgs === 'object' && !Array.isArray(parsedArgs)}
   <!-- table rendering -->
+{:else if Array.isArray(parsedArgs)}
+  <JsonView content={JSON.stringify(parsedArgs)} />
func (pm *ProxyManager) apiGetRequests(c *gin.Context) {
	entries := pm.requestMonitor.GetEntries()
	// Strip bodies for list view
	for i := range entries {
		entries[i].RequestBody = ""
		entries[i].ResponseBody = ""
	}
	c.JSON(http.StatusOK, entries)
}
Same mutation issue — modifying returned slice elements.
Similar to the SSE initial sync, this modifies entries[i] directly. If GetEntries() returns references, this corrupts the stored data.
Suggested fix
func (pm *ProxyManager) apiGetRequests(c *gin.Context) {
entries := pm.requestMonitor.GetEntries()
- // Strip bodies for list view
- for i := range entries {
- entries[i].RequestBody = ""
- entries[i].ResponseBody = ""
- }
- c.JSON(http.StatusOK, entries)
+ // Strip bodies for list view - create copies to avoid mutating stored data
+ stripped := make([]RequestEntry, len(entries))
+ for i, e := range entries {
+ stripped[i] = e
+ stripped[i].RequestBody = ""
+ stripped[i].ResponseBody = ""
+ }
+ c.JSON(http.StatusOK, stripped)
 }
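The copy-before-strip pattern from this comment can be tried in isolation. The sketch below assumes a reduced `RequestEntry` with only three fields and a hypothetical `stripBodies` helper; it shows that clearing fields on copies leaves the stored slice intact when entries are held by value.

```go
package main

import "fmt"

// RequestEntry is a reduced, assumed shape of the stored entry; the real
// struct in the PR has more fields.
type RequestEntry struct {
	ID           int
	RequestBody  string
	ResponseBody string
}

// stripBodies returns a new slice whose elements are copies of the input
// entries with both bodies cleared. The input slice is not modified.
func stripBodies(entries []RequestEntry) []RequestEntry {
	out := make([]RequestEntry, len(entries))
	for i, e := range entries { // e is a copy of entries[i]
		e.RequestBody = ""
		e.ResponseBody = ""
		out[i] = e
	}
	return out
}

func main() {
	stored := []RequestEntry{{ID: 1, RequestBody: "req", ResponseBody: "resp"}}
	listView := stripBodies(stored)

	fmt.Printf("stored:    %+v\n", stored[0])
	fmt.Printf("list view: %+v\n", listView[0])
}
```

If `GetEntries()` returned a slice of pointers instead, each element would need to be dereferenced into a new value before clearing, otherwise the stored entries would still be modified.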
Hi, Thanks for submitting this PR. After building a few llm traffic capture tools I want this functionality to be independent of llama-swap. The main reason is so there is more room for it to develop its own unique feature set. That would be better for llama-swap and the inspector tool.

would you maybe consider exposing the request-response streams via an api, so one does not have to put another proxy in front of llama-swap? i'm also looking for a logging solution and came to the conclusion that it would be easiest to contribute it to llama-swap, landing me here. there is existing standalone software for this, for example llm-proxy - but it would require node, which i could do without... other solutions are heavy and need other supporting infrastructure.

That's a good suggestion to expose the data via some sort of API. @h3po what is the use case you have for req/resp in llama-swap?

i use llama-swap a lot for quickly trying out different models/sampling configs/quantizations and also frontend software like rag databases, chat ui etc. it would be useful to gather logs for debugging and auditing on the llama-swap side instead of coming up with a way to log each and every client software separately.

Thanks that's helpful context. I think perhaps there is a good middle ground here. A lightweight UI similar to what @khimaros created and also an API/hooks/plugin system for a deeper inspection. What is llama-swap anymore!? :D

this is my use case exactly as well. I've been running this PR on my Strix Halo and have found it incredibly useful for understanding how different clients behave. @mostlygeek if there is anything you need from me to make this more attractive for merge, please let me know. happy to iterate on design or technical approach.

FWIW, i did try out some other proxies like Bifrost and LiteLLM but it's quite annoying maintaining two identical sources of truth for model lists (with a lot of clicking around a web admin interface). useful if also working with remote models but not helpful for my case.

@khimaros thanks for implementing this. i'll suggest adding a filter for the request path; my log is full of /metrics requests because i use llama-server with --metrics (https://gist.github.com/h3po/f7703e7cc08cf7151b58820eaeccfbd9). also it would be nice if you would drop large components like type image_url from the body before the 1MB check
Add saving request and response headers and bodies that go through llama-swap in memory.
- captureBuffer added to configuration. Captures are enabled by default.
- 5MB of memory is allocated for req/response captures in a ring buffer. Setting captureBuffer to 0 will disable captures.
- UI elements to view captured data added to Activity page. Includes some QOL features like json formatting and recombining SSE chat streams
- capture saving is done at the byte level and has minimal impact on llama-swap performance
Fixes #464
Ref #503
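How a fixed memory budget for captures might behave can be sketched with a byte-budgeted FIFO, which is a simplification of a ring buffer and not the code from the referenced commit. The `capture` and `captureStore` types and the `add` method are hypothetical; only the ideas of a byte budget, oldest-first eviction, and "0 disables captures" come from the commit message above.

```go
package main

import "fmt"

// capture is a hypothetical record: one request/response pair as raw bytes.
type capture struct {
	id   int
	data []byte
}

// captureStore keeps the most recent captures within a byte budget and
// evicts the oldest entries first. A budget of 0 disables capturing.
type captureStore struct {
	budget int // maximum total bytes retained
	used   int // bytes currently retained
	items  []capture
}

// add stores c, evicting old captures until it fits. It reports false when
// capturing is disabled or when c alone exceeds the budget.
func (s *captureStore) add(c capture) bool {
	if s.budget <= 0 || len(c.data) > s.budget {
		return false
	}
	for s.used+len(c.data) > s.budget {
		s.used -= len(s.items[0].data)
		s.items = s.items[1:]
	}
	s.items = append(s.items, c)
	s.used += len(c.data)
	return true
}

func main() {
	store := &captureStore{budget: 10}
	store.add(capture{id: 1, data: []byte("aaaaaa")}) // 6 bytes
	store.add(capture{id: 2, data: []byte("bbbbbb")}) // evicts id 1 to fit

	for _, c := range store.items {
		fmt.Println(c.id, len(c.data))
	}
	fmt.Println("used:", store.used)
}
```

A real ring buffer would typically write into one preallocated byte array instead of a slice of records, which avoids per-capture allocations at the cost of more bookkeeping.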
closed by #508

awesome! a little harder to read than my PoC and doesn't show requests until they complete, but it will get the job done and save me from maintaining a fork 🤓

i tried to keep this as isolated as possible and minimize impact to the rest of the codebase. sorry i didn't run gofmt in a separate changelist before submitting this one so there is a bit of whitespace noise. fixes #464