
proxy: extract metrics for v1/messages#419

Merged
mostlygeek merged 1 commit into main from anthropic-metrics
Nov 30, 2025
Conversation

@mostlygeek (Owner) commented Nov 30, 2025

Add missing metrics extraction for Anthropic-specific usage information.

See #417

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced metrics collection to support multiple data format standards for improved compatibility.
    • Improved error handling and logging when metrics data is incomplete or unavailable.


@coderabbitai

coderabbitai bot commented Nov 30, 2025

Walkthrough

Modified metrics extraction in proxy/metrics_monitor.go to parse nested usage and timings objects separately instead of a single metrics JSON root. Updated the parseMetrics function signature to accept separate usage and timings parameters, added support for multiple field naming conventions, and changed timings parsing to use llama-server style keys.

Changes

Cohort / File(s): Metrics parsing refactoring — proxy/metrics_monitor.go
Summary: Updated the parseMetrics function signature to accept separate usage and timings objects instead of a single jsonData parameter. Modified wrapHandler and the streaming response parsing to extract and pass usage and timings separately. Added support for multiple token field naming conventions (input_tokens/prompt_tokens, output_tokens/completion_tokens, cache_read_input_tokens). Updated timings parsing to read llama-server style keys (prompt_n, predicted_n, prompt_per_second, predicted_per_second, prompt_ms, predicted_ms, cache_n). Added conditional logic to emit metrics only when usage or timings data exists.
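The dual field-naming support described above can be sketched roughly as follows. This is a minimal stdlib illustration, not the actual gjson-based implementation in proxy/metrics_monitor.go; parseUsage is a hypothetical name for the fallback logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseUsage is a hypothetical sketch of the dual-convention token
// extraction: try Anthropic field names first, then OpenAI names.
// The real parseMetrics does equivalent lookups with gjson.
func parseUsage(raw string) (in, out, cached int) {
	var body map[string]any
	if err := json.Unmarshal([]byte(raw), &body); err != nil {
		return
	}
	usage, _ := body["usage"].(map[string]any)

	get := func(keys ...string) int {
		for _, k := range keys {
			if f, ok := usage[k].(float64); ok { // encoding/json decodes numbers as float64
				return int(f)
			}
		}
		return 0
	}
	in = get("input_tokens", "prompt_tokens")       // Anthropic, then OpenAI
	out = get("output_tokens", "completion_tokens") // Anthropic, then OpenAI
	cached = get("cache_read_input_tokens")         // Anthropic prompt-cache reads
	return
}

func main() {
	// Anthropic v1/messages style usage object
	i, o, c := parseUsage(`{"usage":{"input_tokens":12,"output_tokens":34,"cache_read_input_tokens":5}}`)
	fmt.Println(i, o, c) // 12 34 5
}
```

The same call handles an OpenAI-style body (`prompt_tokens`/`completion_tokens`) through the fallback keys, which is what keeps v1/chat/completions responses working unchanged.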

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify all call sites of parseMetrics are updated with the new function signature (separate usage and timings parameters)
  • Validate that multiple field naming conventions (input_tokens vs. prompt_tokens, output_tokens vs. completion_tokens) are handled correctly for backward compatibility
  • Confirm timings field parsing matches the expected llama-server schema
  • Review conditional logic to ensure metrics are only emitted when valid data exists and proper logging occurs for skipped metrics

Possibly related PRs

  • Fix token metrics parsing #199: Modifies token-metrics parsing to support a separate "timings" object and adjusts streaming/parse logic with matching function signature and timing field changes
  • Include metrics from upstream chat requests #361: Updates the same metrics parsing code in proxy/metrics_monitor.go to accept separate usage and timings objects with parallel streaming/non-streaming parsing adjustments

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title 'proxy: extract metrics for v1/messages' directly and specifically describes the main change: adding metrics extraction for the Anthropic v1/messages endpoint, which aligns with the PR objectives and the raw summary.


@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bccce5f and 25ad459.

📒 Files selected for processing (1)
  • proxy/metrics_monitor.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
proxy/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Run make test-dev when making iterative changes to code under the proxy/ directory - this runs go test and staticcheck, and all static checking errors must be fixed

Files:

  • proxy/metrics_monitor.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (3)
proxy/metrics_monitor.go (3)

125-135: LGTM!

Clean extraction of nested usage and timings objects. The existence check ensures metrics are only processed when meaningful data is present.


184-190: LGTM!

Consistent with the non-streaming path. Correctly extracts metrics from the final SSE chunk containing usage data.
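The final-SSE-chunk extraction mentioned above can be sketched like this. It is an illustrative stdlib sketch under assumed names (lastUsageChunk is hypothetical); the actual streaming parser in proxy/metrics_monitor.go differs.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// lastUsageChunk scans an SSE body and returns the usage object from
// the last data chunk that carries one — in both OpenAI and Anthropic
// streams, usage totals arrive near the end of the stream.
func lastUsageChunk(body string) map[string]any {
	var usage map[string]any
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" { // OpenAI-style stream terminator, not JSON
			continue
		}
		var chunk map[string]any
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			continue // ignore malformed chunks
		}
		if u, ok := chunk["usage"].(map[string]any); ok {
			usage = u // keep the latest chunk that has usage data
		}
	}
	return usage
}

func main() {
	stream := "data: {\"delta\":\"hi\"}\n" +
		"data: {\"usage\":{\"input_tokens\":3,\"output_tokens\":8}}\n" +
		"data: [DONE]\n"
	u := lastUsageChunk(stream)
	fmt.Println(u["input_tokens"], u["output_tokens"]) // 3 8
}
```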


208-227: LGTM!

Good defensive coding with existence checks. The dual-format support (OpenAI's prompt_tokens/completion_tokens vs Anthropic's input_tokens/output_tokens) cleanly addresses the v1/messages endpoint requirements.

Comment on lines 230 to 240

 	if timings.Exists() {
-		inputTokens = int(jsonData.Get("timings.prompt_n").Int())
-		outputTokens = int(jsonData.Get("timings.predicted_n").Int())
-		promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
-		tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
-		durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
+		inputTokens = int(timings.Get("prompt_n").Int())
+		outputTokens = int(timings.Get("predicted_n").Int())
+		promptPerSecond = timings.Get("prompt_per_second").Float()
+		tokensPerSecond = timings.Get("predicted_per_second").Float()
+		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())

-		if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
+		if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
 			cachedTokens = int(cachedValue.Int())
 		}
 	}

⚠️ Potential issue | 🟠 Major

Add existence checks before overwriting token counts from timings.

When timings exists but lacks specific fields (e.g., prompt_n), gjson.Result.Int() returns 0, which overwrites potentially valid values from usage. This is inconsistent with line 237 which correctly checks cachedValue.Exists() before using it.

 	// use llama-server's timing data for tok/sec and duration as it is more accurate
 	if timings.Exists() {
-		inputTokens = int(timings.Get("prompt_n").Int())
-		outputTokens = int(timings.Get("predicted_n").Int())
-		promptPerSecond = timings.Get("prompt_per_second").Float()
-		tokensPerSecond = timings.Get("predicted_per_second").Float()
-		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
+		if pn := timings.Get("prompt_n"); pn.Exists() {
+			inputTokens = int(pn.Int())
+		}
+		if predn := timings.Get("predicted_n"); predn.Exists() {
+			outputTokens = int(predn.Int())
+		}
+		if pps := timings.Get("prompt_per_second"); pps.Exists() {
+			promptPerSecond = pps.Float()
+		}
+		if tps := timings.Get("predicted_per_second"); tps.Exists() {
+			tokensPerSecond = tps.Float()
+		}
+		if pm := timings.Get("prompt_ms"); pm.Exists() {
+			if predm := timings.Get("predicted_ms"); predm.Exists() {
+				durationMs = int(pm.Float() + predm.Float())
+			}
+		}
 
 		if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
 			cachedTokens = int(cachedValue.Int())
 		}
 	}
🤖 Prompt for AI Agents
In proxy/metrics_monitor.go around lines 230 to 240, timings is checked for
existence but individual timing fields (prompt_n, predicted_n,
prompt_per_second, predicted_per_second, prompt_ms, predicted_ms) are read
unconditionally, causing zero-valued gjson results to overwrite valid
usage-derived token counts; change each assignment to first check that the
specific timings.Get("<field>").Exists() before converting and assigning
(similar to the existing cachedValue check) so you only overwrite values when
that timing field is actually present.
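The overwrite hazard the review flags can be shown with a plain map, where a missing key also yields a zero value (gjson's missing-path reads behave the same way). tokensFromTimings is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// tokensFromTimings shows the guarded overwrite the review asks for:
// keep the usage-derived fallback unless the timings field actually
// exists. The real fix does the same check via gjson's Result.Exists().
func tokensFromTimings(timings map[string]float64, fallback int) int {
	if v, ok := timings["prompt_n"]; ok {
		return int(v)
	}
	return fallback
}

func main() {
	usageDerived := 42 // already populated from the usage object

	// Unguarded read of a missing key yields the zero value and would
	// clobber the valid usage-derived count:
	empty := map[string]float64{}
	fmt.Println(int(empty["prompt_n"])) // 0

	// Guarded reads preserve it, and still prefer timings when present:
	fmt.Println(tokensFromTimings(empty, usageDerived))                          // 42
	fmt.Println(tokensFromTimings(map[string]float64{"prompt_n": 7}, usageDerived)) // 7
}
```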

mostlygeek merged commit dea9873 into main on Nov 30, 2025
3 checks passed
mostlygeek deleted the anthropic-metrics branch on January 31, 2026 at 04:16
