proxy: extract metrics for v1/messages #419
Walkthrough: Modified metrics extraction in `proxy/metrics_monitor.go`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
proxy/metrics_monitor.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
proxy/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Run `make test-dev` when making iterative changes to code under the `proxy/` directory - this runs `go test` and `staticcheck`, and all static checking errors must be fixed
Files:
proxy/metrics_monitor.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (3)
proxy/metrics_monitor.go (3)
125-135: LGTM! Clean extraction of the nested `usage` and `timings` objects. The existence check ensures metrics are only processed when meaningful data is present.
184-190: LGTM! Consistent with the non-streaming path. Correctly extracts metrics from the final SSE chunk containing usage data.
208-227: LGTM! Good defensive coding with existence checks. The dual-format support (OpenAI's `prompt_tokens`/`completion_tokens` vs Anthropic's `input_tokens`/`output_tokens`) cleanly addresses the v1/messages endpoint requirements.
 if timings.Exists() {
-    inputTokens = int(jsonData.Get("timings.prompt_n").Int())
-    outputTokens = int(jsonData.Get("timings.predicted_n").Int())
-    promptPerSecond = jsonData.Get("timings.prompt_per_second").Float()
-    tokensPerSecond = jsonData.Get("timings.predicted_per_second").Float()
-    durationMs = int(jsonData.Get("timings.prompt_ms").Float() + jsonData.Get("timings.predicted_ms").Float())
+    inputTokens = int(timings.Get("prompt_n").Int())
+    outputTokens = int(timings.Get("predicted_n").Int())
+    promptPerSecond = timings.Get("prompt_per_second").Float()
+    tokensPerSecond = timings.Get("predicted_per_second").Float()
+    durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())

-    if cachedValue := jsonData.Get("timings.cache_n"); cachedValue.Exists() {
+    if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
         cachedTokens = int(cachedValue.Int())
     }
 }
Add existence checks before overwriting token counts from timings.
When `timings` exists but lacks specific fields (e.g., `prompt_n`), `gjson.Result.Int()` returns 0, which overwrites potentially valid values from `usage`. This is inconsistent with line 237, which correctly checks `cachedValue.Exists()` before using it.
// use llama-server's timing data for tok/sec and duration as it is more accurate
if timings.Exists() {
- inputTokens = int(timings.Get("prompt_n").Int())
- outputTokens = int(timings.Get("predicted_n").Int())
- promptPerSecond = timings.Get("prompt_per_second").Float()
- tokensPerSecond = timings.Get("predicted_per_second").Float()
- durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
+ if pn := timings.Get("prompt_n"); pn.Exists() {
+ inputTokens = int(pn.Int())
+ }
+ if predn := timings.Get("predicted_n"); predn.Exists() {
+ outputTokens = int(predn.Int())
+ }
+ if pps := timings.Get("prompt_per_second"); pps.Exists() {
+ promptPerSecond = pps.Float()
+ }
+ if tps := timings.Get("predicted_per_second"); tps.Exists() {
+ tokensPerSecond = tps.Float()
+ }
+ if pm := timings.Get("prompt_ms"); pm.Exists() {
+ if predm := timings.Get("predicted_ms"); predm.Exists() {
+ durationMs = int(pm.Float() + predm.Float())
+ }
+ }
if cachedValue := timings.Get("cache_n"); cachedValue.Exists() {
cachedTokens = int(cachedValue.Int())
}
}
🤖 Prompt for AI Agents
In proxy/metrics_monitor.go around lines 230 to 240, timings is checked for
existence but individual timing fields (prompt_n, predicted_n,
prompt_per_second, predicted_per_second, prompt_ms, predicted_ms) are read
unconditionally, causing zero-valued gjson results to overwrite valid
usage-derived token counts; change each assignment to first check that the
specific timings.Get("<field>").Exists() before converting and assigning
(similar to the existing cachedValue check) so you only overwrite values when
that timing field is actually present.
Add missing metrics extraction for Anthropic-specific usage information.
See #417