
proxy: extract metrics for v1/messages#419

Merged
mostlygeek merged 1 commit into main from anthropic-metrics
Nov 30, 2025
Conversation

@mostlygeek (Owner) commented Nov 30, 2025

Add missing metrics extraction for Anthropic-specific usage information.

See #417

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced metrics collection to support multiple data format standards for improved compatibility.
    • Improved error handling and logging when metrics data is incomplete or unavailable.


@coderabbitai

coderabbitai bot commented Nov 30, 2025

Walkthrough

Modified metrics extraction in proxy/metrics_monitor.go to parse nested usage and timings objects separately instead of a single metrics JSON root. Updated the parseMetrics function signature to accept separate usage and timings parameters, added support for multiple field naming conventions, and changed timings parsing to use llama-server style keys.

Changes

Cohort / File(s): Metrics parsing refactoring — proxy/metrics_monitor.go
Summary: Updated the parseMetrics function signature to accept separate usage and timings objects instead of a single jsonData parameter. Modified wrapHandler and the streaming response parsing to extract and pass usage and timings separately. Added support for multiple token field naming conventions (input_tokens/prompt_tokens, output_tokens/completion_tokens, cache_read_input_tokens). Updated timings parsing to read llama-server style keys (prompt_n, predicted_n, prompt_per_second, predicted_per_second, prompt_ms, predicted_ms, cache_n). Added conditional logic to emit metrics only when usage or timings data exists.
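The dual field-naming support described above can be sketched roughly as follows. This is a minimal stdlib illustration, not the actual gjson-based implementation in proxy/metrics_monitor.go; parseUsage is a hypothetical name for the fallback logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseUsage is a hypothetical sketch of the dual-convention token
// extraction: try Anthropic field names first, then OpenAI names.
// The real parseMetrics does equivalent lookups with gjson.
func parseUsage(raw string) (in, out, cached int) {
	var body map[string]any
	if err := json.Unmarshal([]byte(raw), &body); err != nil {
		return
	}
	usage, _ := body["usage"].(map[string]any)

	get := func(keys ...string) int {
		for _, k := range keys {
			if f, ok := usage[k].(float64); ok { // encoding/json decodes numbers as float64
				return int(f)
			}
		}
		return 0
	}
	in = get("input_tokens", "prompt_tokens")       // Anthropic, then OpenAI
	out = get("output_tokens", "completion_tokens") // Anthropic, then OpenAI
	cached = get("cache_read_input_tokens")         // Anthropic prompt-cache reads
	return
}

func main() {
	// Anthropic v1/messages style usage object
	i, o, c := parseUsage(`{"usage":{"input_tokens":12,"output_tokens":34,"cache_read_input_tokens":5}}`)
	fmt.Println(i, o, c) // 12 34 5
}
```

The same call handles an OpenAI-style body (`prompt_tokens`/`completion_tokens`) through the fallback keys, which is what keeps v1/chat/completions responses working unchanged.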

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify all call sites of parseMetrics are updated with the new function signature (separate usage and timings parameters)
  • Validate that multiple field naming conventions (input_tokens vs. prompt_tokens, output_tokens vs. completion_tokens) are handled correctly for backward compatibility
  • Confirm timings field parsing matches the expected llama-server schema
  • Review conditional logic to ensure metrics are only emitted when valid data exists and proper logging occurs for skipped metrics

Possibly related PRs

  • Fix token metrics parsing #199: Modifies token-metrics parsing to support a separate "timings" object and adjusts streaming/parse logic with matching function signature and timing field changes
  • Include metrics from upstream chat requests #361: Updates the same metrics parsing code in proxy/metrics_monitor.go to accept separate usage and timings objects with parallel streaming/non-streaming parsing adjustments

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title 'proxy: extract metrics for v1/messages' directly and specifically describes the main change: adding metrics extraction for the Anthropic v1/messages endpoint, which aligns with the PR objectives and the raw summary.


@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bccce5f and 25ad459.

📒 Files selected for processing (1)
  • proxy/metrics_monitor.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
proxy/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Run make test-dev when making iterative changes to code under the proxy/ directory - this runs go test and staticcheck, and all static checking errors must be fixed

Files:

  • proxy/metrics_monitor.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (3)
proxy/metrics_monitor.go (3)

125-135: LGTM!

Clean extraction of nested usage and timings objects. The existence check ensures metrics are only processed when meaningful data is present.


184-190: LGTM!

Consistent with the non-streaming path. Correctly extracts metrics from the final SSE chunk containing usage data.
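The final-SSE-chunk extraction mentioned above can be sketched like this. It is an illustrative stdlib sketch under assumed names (lastUsageChunk is hypothetical); the actual streaming parser in proxy/metrics_monitor.go differs.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// lastUsageChunk scans an SSE body and returns the usage object from
// the last data chunk that carries one — in both OpenAI and Anthropic
// streams, usage totals arrive near the end of the stream.
func lastUsageChunk(body string) map[string]any {
	var usage map[string]any
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" { // OpenAI-style stream terminator, not JSON
			continue
		}
		var chunk map[string]any
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			continue // ignore malformed chunks
		}
		if u, ok := chunk["usage"].(map[string]any); ok {
			usage = u // keep the latest chunk that has usage data
		}
	}
	return usage
}

func main() {
	stream := "data: {\"delta\":\"hi\"}\n" +
		"data: {\"usage\":{\"input_tokens\":3,\"output_tokens\":8}}\n" +
		"data: [DONE]\n"
	u := lastUsageChunk(stream)
	fmt.Println(u["input_tokens"], u["output_tokens"]) // 3 8
}
```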


208-227: LGTM!

Good defensive coding with existence checks. The dual-format support (OpenAI's prompt_tokens/completion_tokens vs Anthropic's input_tokens/output_tokens) cleanly addresses the v1/messages endpoint requirements.

Comment on lines 230 to 240

 	if timings.Exists() {
-		inputTokens = int(jsonData.Get("timings.prompt_n").Int())
-		outputTokens = int(jsonData.Get("timings.predicted_n").Int())
-		promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
-		tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
-		durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
+		inputTokens = int(timings.Get("prompt_n").Int())
+		outputTokens = int(timings.Get("predicted_n").Int())
+		promptPerSecond = timings.Get("prompt_per_second").Float()
+		tokensPerSecond = timings.Get("predicted_per_second").Float()
+		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())

-		if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
+		if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
 			cachedTokens = int(cachedValue.Int())
 		}
 	}

⚠️ Potential issue | 🟠 Major

Add existence checks before overwriting token counts from timings.

When timings exists but lacks specific fields (e.g., prompt_n), gjson.Result.Int() returns 0, which overwrites potentially valid values from usage. This is inconsistent with line 237 which correctly checks cachedValue.Exists() before using it.

 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings.Exists() {
-		inputTokens = int(timings.Get("prompt_n").Int())
-		outputTokens = int(timings.Get("predicted_n").Int())
-		promptPerSecond = timings.Get("prompt_per_second").Float()
-		tokensPerSecond = timings.Get("predicted_per_second").Float()
-		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
+		if pn := timings.Get("prompt_n"); pn.Exists() {
+			inputTokens = int(pn.Int())
+		}
+		if predn := timings.Get("predicted_n"); predn.Exists() {
+			outputTokens = int(predn.Int())
+		}
+		if pps := timings.Get("prompt_per_second"); pps.Exists() {
+			promptPerSecond = pps.Float()
+		}
+		if tps := timings.Get("predicted_per_second"); tps.Exists() {
+			tokensPerSecond = tps.Float()
+		}
+		if pm := timings.Get("prompt_ms"); pm.Exists() {
+			if predm := timings.Get("predicted_ms"); predm.Exists() {
+				durationMs = int(pm.Float() + predm.Float())
+			}
+		}
 
 		if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
 			cachedTokens = int(cachedValue.Int())
 		}
 	}
🤖 Prompt for AI Agents
In proxy/metrics_monitor.go around lines 230 to 240, timings is checked for
existence but individual timing fields (prompt_n, predicted_n,
prompt_per_second, predicted_per_second, prompt_ms, predicted_ms) are read
unconditionally, causing zero-valued gjson results to overwrite valid
usage-derived token counts; change each assignment to first check that the
specific timings.Get("<field>").Exists() before converting and assigning
(similar to the existing cachedValue check) so you only overwrite values when
that timing field is actually present.
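The overwrite hazard the review flags can be shown with a plain map, where a missing key also yields a zero value (gjson's missing-path reads behave the same way). tokensFromTimings is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// tokensFromTimings shows the guarded overwrite the review asks for:
// keep the usage-derived fallback unless the timings field actually
// exists. The real fix does the same check via gjson's Result.Exists().
func tokensFromTimings(timings map[string]float64, fallback int) int {
	if v, ok := timings["prompt_n"]; ok {
		return int(v)
	}
	return fallback
}

func main() {
	usageDerived := 42 // already populated from the usage object

	// Unguarded read of a missing key yields the zero value and would
	// clobber the valid usage-derived count:
	empty := map[string]float64{}
	fmt.Println(int(empty["prompt_n"])) // 0

	// Guarded reads preserve it, and still prefer timings when present:
	fmt.Println(tokensFromTimings(empty, usageDerived))                          // 42
	fmt.Println(tokensFromTimings(map[string]float64{"prompt_n": 7}, usageDerived)) // 7
}
```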

mostlygeek merged commit dea9873 into main on Nov 30, 2025
3 checks passed
mostlygeek deleted the anthropic-metrics branch on January 31, 2026 at 04:16
