proxy: fix metrics for non-llama.cpp backends (vLLM) and correct wall-clock timing#701
efschu wants to merge 2 commits into
…timing
- Capture `requestStart` before the `next()` call instead of using `recorder.StartTime()`, which is only set on the first `Write()`; this caused a ~0 ms duration for non-streaming requests.
- Add a fallback in `parseMetrics`: when the `timings` field is absent (vLLM, etc.), estimate `prompt_per_second` and `tokens_per_second` from the wall-clock duration.
- Fixes the -1 values shown on the Activity page for vLLM models.
Walkthrough: Metrics monitor now accepts a …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/metrics_monitor.go`:
- Around lines 487-496: The fallback calculation for promptPerSecond and
tokensPerSecond (when timings.Exists() is false) uses wallDurationMs for both
rates, conflating prompt evaluation time and decode time; update the code around
timings, wallDurationMs, promptPerSecond, tokensPerSecond, inputTokens and
outputTokens to make this explicit: either expand the comment to state these are
approximate rates computed over total wall time (including both prompt and
decode phases) or add a boolean/flag (e.g., estimatedRates or
ratesAreApproximate) or separate field to indicate “estimated vs measured” so
downstream consumers know these values are biased and not directly comparable to
phase-specific metrics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 99d4370b-6c08-4ce2-ba2d-149811e994c7
📒 Files selected for processing (1)
proxy/metrics_monitor.go
```go
// Fallback: estimate speeds from wall clock when timings unavailable (e.g., vLLM)
if !timings.Exists() && wallDurationMs > 0 {
	durationSec := float64(wallDurationMs) / 1000.0
	if inputTokens > 0 {
		promptPerSecond = float64(inputTokens) / durationSec
	}
	if outputTokens > 0 {
		tokensPerSecond = float64(outputTokens) / durationSec
	}
}
```
Fallback speeds conflate prompt eval with token generation.
When timings is absent, both rates are divided by the same wall-clock duration that encompasses prompt processing and decode. This produces:
- `promptPerSecond` that is far lower than the true prompt-eval rate (prompt eval usually occupies a small fraction of the total request).
- `tokensPerSecond` that is biased low by the prompt time (i.e. effectively `outputTokens / (prompt_ms + decode_ms)` rather than `outputTokens / decode_ms`).
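A worked example makes the size of both biases concrete. The numbers below are illustrative assumptions, not measurements from vLLM or llama.cpp: a 1,000-token prompt evaluated in 0.5 s, then 200 output tokens decoded in 4 s, for a wall-clock duration of 4.5 s. `rates` is a hypothetical helper that mirrors the fallback's arithmetic, dividing both token counts by the same duration.

```go
package main

import "fmt"

// rates returns (prompt tokens/s, output tokens/s) by dividing both counts by
// one shared duration, as the wall-clock fallback does.
func rates(inputTokens, outputTokens int, durationSec float64) (float64, float64) {
	return float64(inputTokens) / durationSec, float64(outputTokens) / durationSec
}

func main() {
	const (
		inputTokens  = 1000
		outputTokens = 200
		promptSec    = 0.5 // assumed prompt-eval time
		decodeSec    = 4.0 // assumed decode time
	)
	// Per-phase rates, as a backend that exposes phase timings could report them.
	phasePrompt := float64(inputTokens) / promptSec  // 2000 tok/s
	phaseDecode := float64(outputTokens) / decodeSec // 50 tok/s
	// Wall-clock fallback: both rates share the prompt + decode time.
	estPrompt, estDecode := rates(inputTokens, outputTokens, promptSec+decodeSec)
	fmt.Printf("prompt: per-phase %.0f tok/s, estimated %.1f tok/s\n", phasePrompt, estPrompt)
	fmt.Printf("decode: per-phase %.0f tok/s, estimated %.1f tok/s\n", phaseDecode, estDecode)
}
```

With these assumed numbers the fallback reports about 222 tok/s for the prompt (versus 2,000 per phase) and about 44 tok/s for decode (versus 50): the prompt rate is off by roughly an order of magnitude, while the decode rate is only modestly low.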
Given the backend doesn't expose per-phase timings, this is an acceptable approximation — but worth making the caveat explicit so downstream dashboards don't treat these as directly comparable to llama.cpp's prompt_per_second / predicted_per_second. Consider expanding the comment (e.g., "approximate rates over total wall time; includes both prompt and decode phases") or emitting a separate flag/field to distinguish estimated vs. measured rates.
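The second option, emitting a flag alongside the rates, could look roughly like the sketch below. The `TokenRates` type, its `Estimated` field, and `fallbackRates` are hypothetical names for illustration; only the division mirrors the fallback quoted above. It also assumes -1 is the "unavailable" sentinel, matching the -1 values the PR description mentions on the Activity page. Keeping the flag next to the numbers lets a dashboard label or filter wall-clock estimates instead of comparing them directly with llama.cpp's `prompt_per_second` / `predicted_per_second`.

```go
package main

import "fmt"

// TokenRates is a hypothetical metrics payload. Estimated is true when the
// rates were derived from total wall-clock time rather than per-phase timings.
type TokenRates struct {
	PromptPerSecond float64 `json:"prompt_per_second"`
	TokensPerSecond float64 `json:"tokens_per_second"`
	Estimated       bool    `json:"estimated"`
}

// fallbackRates divides token counts by the whole request duration, which
// spans both prompt evaluation and decode, and marks computed rates as
// estimated. Rates stay at -1 (assumed "unavailable" sentinel) otherwise.
func fallbackRates(inputTokens, outputTokens int, wallDurationMs int64) TokenRates {
	r := TokenRates{PromptPerSecond: -1, TokensPerSecond: -1}
	if wallDurationMs <= 0 {
		return r
	}
	durationSec := float64(wallDurationMs) / 1000.0
	if inputTokens > 0 {
		r.PromptPerSecond = float64(inputTokens) / durationSec
		r.Estimated = true
	}
	if outputTokens > 0 {
		r.TokensPerSecond = float64(outputTokens) / durationSec
		r.Estimated = true
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", fallbackRates(1000, 200, 4500))
	fmt.Printf("%+v\n", fallbackRates(1000, 200, 0))
}
```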
The activity table now displays the model alias (e.g., qwen3.6-35b-a3b-uncensored) instead of the internal model ID (e.g., llama-cpp-5090) when aliases are configured.
Actionable comments posted: 1
🧹 Nitpick comments (1)
proxy/metrics_monitor.go (1)
136-139: Consider guarding against empty/whitespace alias strings.
`modelConfig.Aliases[0]` is used as the display name unconditionally. If a user configures an empty or whitespace-only alias entry in YAML, the metric's `Model` field will silently become empty, which is worse than falling back to the model ID. Other parts of the codebase (e.g., `listModelsHandler` at `proxy/proxymanager.go` lines 573–577) already guard with `strings.TrimSpace` before using aliases.
🔧 Suggested fix
```diff
-	// Resolve modelID to display name (first alias or modelID itself)
-	if modelConfig, exists := mp.config.Models[metric.Model]; exists && len(modelConfig.Aliases) > 0 {
-		metric.Model = modelConfig.Aliases[0]
-	}
+	// Resolve modelID to display name (first non-empty alias, falling back to the model ID)
+	if modelConfig, exists := mp.config.Models[metric.Model]; exists {
+		for _, alias := range modelConfig.Aliases {
+			if a := strings.TrimSpace(alias); a != "" {
+				metric.Model = a
+				break
+			}
+		}
+	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/metrics_monitor.go` around lines 136 - 139, The current resolution sets metric.Model to modelConfig.Aliases[0] unconditionally which can assign an empty/whitespace string; change the logic in the block that reads mp.config.Models and modelConfig.Aliases so you pick the first alias whose strings.TrimSpace(alias) != "" (or if none found, leave metric.Model as the original metric.Model/modelID) — i.e., iterate modelConfig.Aliases, use the first non-empty trimmed alias as the display name, otherwise fall back to the existing metric.Model; reference mp.config.Models, metric.Model and modelConfig.Aliases when locating the code to update.
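The suggested fix can also be pulled out into a small pure function, which makes the fallback behaviour easy to unit-test. `displayName` is a hypothetical helper, not an existing function in the proxy: it takes a model ID and its configured aliases and returns the first alias that is non-empty after `strings.TrimSpace`, or the model ID itself when no usable alias exists.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName returns the first alias that is non-empty after trimming
// whitespace, falling back to modelID when no usable alias exists.
func displayName(modelID string, aliases []string) string {
	for _, alias := range aliases {
		if a := strings.TrimSpace(alias); a != "" {
			return a
		}
	}
	return modelID
}

func main() {
	// A whitespace-only alias is skipped in favour of the next usable one.
	fmt.Println(displayName("llama-cpp-5090", []string{"  ", "qwen3.6-35b-a3b-uncensored"}))
	// An empty alias list entry falls back to the model ID.
	fmt.Println(displayName("llama-cpp-5090", []string{""}))
	// No aliases at all also falls back to the model ID.
	fmt.Println(displayName("llama-cpp-5090", nil))
}
```

This prints `qwen3.6-35b-a3b-uncensored`, then `llama-cpp-5090` twice. With such a helper, the block in `metrics_monitor.go` would reduce to assigning `displayName(metric.Model, modelConfig.Aliases)` to `metric.Model` inside the existing `exists` check.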
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/metrics_monitor.go`:
- Around lines 16-19: The file fails gofmt due to import grouping/spacing and
misaligned struct/map fields: reorder the imports in metrics_monitor.go to
standard grouping (stdlib, external, then internal) and ensure the inserted
config import (github.com/mostlygeek/llama-swap/proxy/config) sits with other
external/internal imports consistent with the file's style; fix the tab
alignment of the config field in the metricsMonitor struct (symbol:
metricsMonitor) and the corresponding config: key in the NewMetricsMonitor (or
equivalent initializer) so field alignment matches mu, metrics, logger, etc.;
then run gofmt -w proxy/metrics_monitor.go and verify with gofmt -l . before
committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 807032b6-fbc1-4a85-bdcd-d419a19ed7a1
📒 Files selected for processing (3)
- proxy/metrics_monitor.go
- proxy/metrics_monitor_test.go
- proxy/proxymanager.go
✅ Files skipped from review due to trivial changes (1)
- proxy/metrics_monitor_test.go
```go
	"github.com/mostlygeek/llama-swap/proxy/config"
	"github.com/klauspost/compress/zstd"
	"github.com/mostlygeek/llama-swap/event"
	"github.com/tidwall/gjson"
```
Fix gofmt formatting failures.
CI's gofmt -l . step failed on this file. Two likely culprits in the changed lines:
- Import block ordering/spacing at lines 16–19 mixes external modules (`config` inserted between `gin-gonic/gin` and `klauspost/compress/zstd`) in a way that appears to disagree with the existing formatting of this file.
- The `config` field added to `metricsMonitor` at line 99 and the `config:` key at line 118 don't match the tab alignment used for the surrounding fields (`mu`, `metrics`, `logger`, etc.), which `gofmt` will re-flow.
Please run gofmt -w proxy/metrics_monitor.go (and double-check with gofmt -l .) before pushing.
As per coding guidelines: "Run gofmt -l . before committing to verify formatting. Fix any reported files with gofmt -w <file>."
🔧 Verification
```bash
#!/bin/bash
gofmt -l proxy/metrics_monitor.go proxy/proxymanager.go
# Show the diff gofmt would apply:
gofmt -d proxy/metrics_monitor.go
```
Closing in favor of #702, which supersedes this PR with a unified description covering all fixes.