Improve Responses API captures and metrics#577
Improve Responses API captures and metrics#577coder543 wants to merge 2 commits intomostlygeek:mainfrom
Conversation
WalkthroughReworks metrics collection to use a timing-aware internal type ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ui-svelte/src/components/CaptureDialog.svelte`:
- Around line 500-529: The "(empty)" fallback inside the respBody "pretty"
branch is unreachable because the outer condition checks respBodyTab ===
"pretty" && (renderedResponse.reasoning || renderedResponse.content); remove the
unreachable check and its <pre>(empty)</pre> fallback so the UI only renders
Reasoning and/or Response when renderedResponse.reasoning or
renderedResponse.content exist; locate the block guarded by respBodyTab ===
"pretty" and renderedResponse checks in CaptureDialog.svelte (use respBodyTab
and renderedResponse identifiers to find it) and delete the unreachable fallback
lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7fb713c5-90d2-467c-acfb-c3467215c3ff
📒 Files selected for processing (3)
proxy/metrics_monitor.goproxy/metrics_monitor_test.goui-svelte/src/components/CaptureDialog.svelte
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
ui-svelte/src/components/CaptureDialog.svelte (1)
471-484: Minor: Redundant conditions could be simplified.Lines 471 and 478 use equivalent conditions (
isResponseJson || isSSEvsisSSE || isResponseJson). Consider extracting to a derived variable for clarity, though this is a minor readability nit.♻️ Optional refactor
+ let showPrettyRawTabs = $derived(isResponseJson || isSSE); // Then in template: - {`#if` isResponseJson || isSSE} + {`#if` showPrettyRawTabs} <button ...>Pretty</button> {/if} - {`#if` isSSE || isResponseJson} + {`#if` showPrettyRawTabs} <button ...>Raw</button> {/if}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/CaptureDialog.svelte` around lines 471 - 484, Extract the repeated boolean expression (isResponseJson || isSSE) into a single derived variable (e.g., showRespTabs) and use that variable in both conditional blocks that render the "Pretty" and "Raw" buttons (where respBodyTab is toggled); update the conditions around the two buttons (currently using isResponseJson || isSSE and isSSE || isResponseJson) to use showRespTabs to remove redundancy and improve readability while keeping respBodyTab, isResponseJson and isSSE logic unchanged.proxy/metrics_monitor.go (1)
505-521:StartTime()is semantically misleading now.After this change it returns
firstWrite, while the actual request start lives inTimings().requestStart. Renaming or removingStartTime()would make future duration code much harder to misuse.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/metrics_monitor.go` around lines 505 - 521, StartTime() currently returns firstWrite which is misleading; change the API so StartTime() returns the actual request start (responseBodyCopier.requestStart) and add a new FirstWriteTime() (or FirstWrite()) that returns firstWrite, or alternatively rename StartTime() to FirstWriteTime() and update callers—ensure any code using StartTime() for request start now uses Timings().requestStart or the new StartTime() returning requestStart; update references to responseBodyCopier.StartTime(), responseBodyCopier.Timings(), and any callers accordingly to preserve correct semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/metrics_monitor.go`:
- Around line 428-431: The fallback TokensPerSecond estimation is being applied
for both streaming and non-streaming responses; restrict it so it only runs for
streaming usage-only streams by adding a boolean parameter (e.g., allowFallback
or isStreaming) to the function that computes TokensPerSecond where
timingInfo.generationDuration() is used, call it with true from
processStreamingResponse and false from the non-streaming response path, and
update the branch that sets tokensPerSecond (the block using generationDuration
and outputTokens) to only execute when that boolean is true; also add a
regression test that simulates a chunked non-streaming JSON response to ensure
no synthetic generation speed is recorded for non-streaming responses.
---
Nitpick comments:
In `@proxy/metrics_monitor.go`:
- Around line 505-521: StartTime() currently returns firstWrite which is
misleading; change the API so StartTime() returns the actual request start
(responseBodyCopier.requestStart) and add a new FirstWriteTime() (or
FirstWrite()) that returns firstWrite, or alternatively rename StartTime() to
FirstWriteTime() and update callers—ensure any code using StartTime() for
request start now uses Timings().requestStart or the new StartTime() returning
requestStart; update references to responseBodyCopier.StartTime(),
responseBodyCopier.Timings(), and any callers accordingly to preserve correct
semantics.
In `@ui-svelte/src/components/CaptureDialog.svelte`:
- Around line 471-484: Extract the repeated boolean expression (isResponseJson
|| isSSE) into a single derived variable (e.g., showRespTabs) and use that
variable in both conditional blocks that render the "Pretty" and "Raw" buttons
(where respBodyTab is toggled); update the conditions around the two buttons
(currently using isResponseJson || isSSE and isSSE || isResponseJson) to use
showRespTabs to remove redundancy and improve readability while keeping
respBodyTab, isResponseJson and isSSE logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a39fc5b-141c-4202-ae49-94fe6adca750
📒 Files selected for processing (3)
proxy/metrics_monitor.goproxy/metrics_monitor_test.goui-svelte/src/components/CaptureDialog.svelte
Handle llama-server Responses API captures and metrics more gracefully. - parse nested Responses usage payloads in streaming and non-streaming metrics - render Responses SSE captures with response and pretty JSON views - estimate only generation speed for usage-only streaming responses - leave prompt speed unknown without upstream timing/cache data - add Responses-specific coverage for metrics fallback behavior
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ui-svelte/src/components/CaptureDialog.svelte (1)
262-270: Minor: Potential double newlines when only reasoning exists.When
renderedResponse.reasoningexists butrenderedResponse.contentis empty, the copy text will end with"\n\n"followed by an empty string. This is cosmetic but may leave trailing whitespace.💡 Optional fix
function getCopyText(): string { if (respBodyTab === "pretty" && (renderedResponse.reasoning || renderedResponse.content)) { let text = ""; - if (renderedResponse.reasoning) text += renderedResponse.reasoning + "\n\n"; - text += renderedResponse.content; + if (renderedResponse.reasoning) { + text += renderedResponse.reasoning; + if (renderedResponse.content) text += "\n\n"; + } + if (renderedResponse.content) text += renderedResponse.content; return text; } return displayedResponseBody; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/CaptureDialog.svelte` around lines 262 - 270, The getCopyText function may append "\n\n" even when renderedResponse.content is empty, causing trailing whitespace; update getCopyText to only add the separator between reasoning and content when both are present by checking renderedResponse.content before concatenating the "\n\n", ensuring respBodyTab === "pretty" still returns the combined reasoning+content when both exist and otherwise returns the single non-empty piece or displayedResponseBody.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ui-svelte/src/components/CaptureDialog.svelte`:
- Around line 262-270: The getCopyText function may append "\n\n" even when
renderedResponse.content is empty, causing trailing whitespace; update
getCopyText to only add the separator between reasoning and content when both
are present by checking renderedResponse.content before concatenating the
"\n\n", ensuring respBodyTab === "pretty" still returns the combined
reasoning+content when both exist and otherwise returns the single non-empty
piece or displayedResponseBody.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 06e26352-638a-41ae-88b0-0aeec38925fc
📒 Files selected for processing (3)
proxy/metrics_monitor.goproxy/metrics_monitor_test.goui-svelte/src/components/CaptureDialog.svelte
🚧 Files skipped from review as they are similar to previous changes (1)
- proxy/metrics_monitor_test.go
Restore the existing capture dialog behavior and fix aggregate throughput calculations for generated token metrics. - restore Chat and Pretty tab semantics in the capture dialog - keep JSON Pretty views formatted instead of rendered - prefer backend tokens_per_second in the stats panel
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ui-svelte/src/components/StatsPanel.svelte (1)
57-66:⚠️ Potential issue | 🟡 MinorEdge case: histogram calculation fails when all values are identical.
When
min === max(all tokens-per-second values are the same),binSizebecomes0, causing(value - min) / binSizeto produceNaNorInfinity. This results in incorrect bin indexing.🛡️ Suggested fix
const min = Math.min(...tokensPerSecond); const max = Math.max(...tokensPerSecond); + + // Handle edge case where all values are identical + if (min === max) { + const histogramData: HistogramData = { + bins: [tokensPerSecond.length], + min, + max, + binSize: 1, + p99, + p95, + p50, + }; + return { + totalRequests, + totalInputTokens, + totalOutputTokens, + inFlightRequests: $inFlightRequests, + tokenStats: { + p99: p99.toFixed(2), + p95: p95.toFixed(2), + p50: p50.toFixed(2), + }, + histogramData, + }; + } + const binCount = Math.min(30, Math.max(10, Math.floor(tokensPerSecond.length / 5))); const binSize = (max - min) / binCount;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/StatsPanel.svelte` around lines 57 - 66, The histogram code in StatsPanel.svelte can divide by zero when min === max; modify the logic around min, max, binCount, and binSize so that when max === min you avoid binSize = 0: either set binCount = 1 (and populate bins[0] = tokensPerSecond.length) or set binSize = 1 and explicitly assign every value to a valid bin index (e.g., 0 or binCount-1); update the block that computes binSize and the tokensPerSecond.forEach bin assignment to handle this branch and ensure no NaN/Infinity binIndex is produced.
🧹 Nitpick comments (2)
ui-svelte/src/components/CaptureDialog.svelte (2)
34-38: Potential reactivity dependency issue:renderedResponseused in$effectbefore being declared.In Svelte 5,
$effectcallbacks capture reactive dependencies. Here,renderedResponse(derived at line 283) is used inside the effect at line 34, butrenderedResponsedepends onresponseBodyRawwhich depends oncapture. Sincecaptureis already a dependency of this effect block, the reactive graph should work correctly. However, accessing a derived value that transitively depends on the same trigger (capture) within the same effect can lead to subtle timing issues if the derivations haven't re-evaluated yet.Consider whether the intent is to react to
capturechanges or torenderedResponsechanges. If the latter, this could be clearer:$effect(() => { if (capture && renderedResponse) { // explicit dependency } });This appears to work correctly in practice, but worth verifying the tab selection behaves as expected when
capturechanges.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/CaptureDialog.svelte` around lines 34 - 38, The $effect uses renderedResponse before it's declared which can cause timing issues; change the effect so it explicitly depends on renderedResponse (not just capture) by referencing renderedResponse inside the effect (e.g., guard with if (capture && renderedResponse) ...) and compute respBodyTab using renderedResponse, isSSE and respCt so the reactive graph clearly tracks renderedResponse changes; update the effect containing respBodyTab to read renderedResponse directly to ensure proper re-evaluation when responseBodyRaw/capture change.
283-287: Consider supporting non-streaming chat response rendering.Currently,
renderedResponseis only populated for SSE responses. Non-streaming chat completion responses (which return JSON directly) won't have the Chat tab available even though they contain the same logical content structure.If this is intentional (SSE captures only), the current implementation is correct. If you'd like to support the Chat view for non-streaming responses as well:
♻️ Optional enhancement
let renderedResponse = $derived.by(() => { if (!responseBodyRaw) return { reasoning: "", content: "" } as RenderedResponse; if (isSSE) return parseSSEChat(responseBodyRaw); + if (isResponseJson) { + try { + const parsed = JSON.parse(responseBodyRaw); + return appendResponseOutput({ reasoning: "", content: "" }, parsed); + } catch { + return { reasoning: "", content: "" } as RenderedResponse; + } + } return { reasoning: "", content: "" } as RenderedResponse; });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ui-svelte/src/components/CaptureDialog.svelte` around lines 283 - 287, renderedResponse only handles SSE via parseSSEChat so non-streaming JSON chat completions (responseBodyRaw when isSSE is false) are ignored; update the $derived.by callback for renderedResponse to detect non-streaming chat JSON and parse it into the same RenderedResponse shape (either by calling a new helper like parseNonStreamingChat or by JSON.parse + mapping to { reasoning, content }) while keeping existing parseSSEChat for isSSE paths; reference variables/functions: renderedResponse, responseBodyRaw, isSSE, parseSSEChat, RenderedResponse, $derived.by.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@ui-svelte/src/components/StatsPanel.svelte`:
- Around line 57-66: The histogram code in StatsPanel.svelte can divide by zero
when min === max; modify the logic around min, max, binCount, and binSize so
that when max === min you avoid binSize = 0: either set binCount = 1 (and
populate bins[0] = tokensPerSecond.length) or set binSize = 1 and explicitly
assign every value to a valid bin index (e.g., 0 or binCount-1); update the
block that computes binSize and the tokensPerSecond.forEach bin assignment to
handle this branch and ensure no NaN/Infinity binIndex is produced.
---
Nitpick comments:
In `@ui-svelte/src/components/CaptureDialog.svelte`:
- Around line 34-38: The $effect uses renderedResponse before it's declared
which can cause timing issues; change the effect so it explicitly depends on
renderedResponse (not just capture) by referencing renderedResponse inside the
effect (e.g., guard with if (capture && renderedResponse) ...) and compute
respBodyTab using renderedResponse, isSSE and respCt so the reactive graph
clearly tracks renderedResponse changes; update the effect containing
respBodyTab to read renderedResponse directly to ensure proper re-evaluation
when responseBodyRaw/capture change.
- Around line 283-287: renderedResponse only handles SSE via parseSSEChat so
non-streaming JSON chat completions (responseBodyRaw when isSSE is false) are
ignored; update the $derived.by callback for renderedResponse to detect
non-streaming chat JSON and parse it into the same RenderedResponse shape
(either by calling a new helper like parseNonStreamingChat or by JSON.parse +
mapping to { reasoning, content }) while keeping existing parseSSEChat for isSSE
paths; reference variables/functions: renderedResponse, responseBodyRaw, isSSE,
parseSSEChat, RenderedResponse, $derived.by.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e14aef3c-b50b-48de-80cc-ec36af6bd864
📒 Files selected for processing (2)
ui-svelte/src/components/CaptureDialog.svelteui-svelte/src/components/StatsPanel.svelte
|
Mine is a local llama-server, but I have never seen llama-swap show any stats or captures for requests sent via the Responses API until the changes above were implemented. Are you using streaming Responses API queries? Or non-streaming? Codex only supports streaming, and that’s where I’ve only seen zeroes and missing captures. I have not tried non-streaming Responses API queries. |
properly parse anthropic compatible usage data from streaming responses. closes: #577


Right now, requests through the Responses API have no metrics at all, and the responses are not captured. Using
codexthroughllama-swapis a little annoying because of this, sincecodexrequires the Responses API.So, a little PR to parse those stats out.
Hopefully this PR is useful.
Written by
codexusinggpt-5.4. Tested by hand to make sure it works. Code reviewed to make sure it seems reasonable.