proxy: fix metrics for non-llama.cpp backends (vLLM) and correct wall-clock timing by efschu · Pull Request #701 · mostlygeek/llama-swap

efschu · 2026-04-24T12:31:46Z

Problem

recorder.StartTime() is only set on the first Write() call, which happens after response headers are sent. For non-streaming JSON responses this means wallDurationMs is ~0, resulting in meaningless speed metrics.
vLLM (and other non-llama.cpp backends) do not return a timings field in their response, so prompt_per_second and tokens_per_second stay at -1.0.

Fix

Capture requestStart := time.Now() before calling next() to get accurate wall-clock duration for all request types.
Add fallback in parseMetrics: when timings is absent, estimate speeds from inputTokens/durationSec and outputTokens/durationSec.

Testing

vLLM 35B TurboQuant: before fix prompt_per_second: -1, after fix prompt_per_second: 53.6 (warm cache, 224ms duration)
llama.cpp metrics unchanged (still uses precise timings field when available)

…timing - Capture requestStart before next() call instead of recorder.StartTime() which is only set on first Write(), causing ~0ms duration for non-streaming - Add fallback in parseMetrics: when timings field is absent (vLLM, etc.), estimate prompt_per_second and tokens_per_second from wall clock duration - Fixes -1 values shown in Activity page for vLLM models

coderabbitai · 2026-04-24T12:31:57Z

Walkthrough

Metrics monitor now accepts a config.Config, maps model IDs to display names via configured aliases, captures a wall-clock requestStart before proxying and uses it for token duration and parsing, and falls back to estimating per-second rates from wall-clock duration when timings are absent.

Changes

Cohort / File(s)	Summary
Metrics monitor core `proxy/metrics_monitor.go`	Constructor now takes `config.Config`; `metricsMonitor` maps model IDs to display names via first alias. `wrapHandler` snapshots wall-clock `requestStart` and forwards it to streaming and non-streaming parsing. `parseMetrics` uses the wall-clock duration and falls back to estimating `prompt_per_second` and `tokens_per_second` from usage token counts when `timings` are missing.
Tests updated `proxy/metrics_monitor_test.go`	All `newMetricsMonitor` test/benchmark call sites updated to pass a `config.Config{}` argument; no test logic changed beyond constructor usage.
Proxy initialization `proxy/proxymanager.go`	`ProxyManager.New` now constructs `metricsMonitor` by passing the full `proxyConfig` into `newMetricsMonitor` (changed constructor parameterization).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

proxy/config: create config package and migrate configuration #329: Changes metrics_monitor constructor to accept config.Config and updates usages in proxymanager.New and tests.
proxy: extract metrics for v1/messages #419: Modifies parsing flow in metrics_monitor.go, including how timestamps/timings are forwarded between handlers and parsing.
Include metrics from upstream chat requests #361: Updates streaming and non-streaming metrics parsing in proxy/metrics_monitor.go, overlapping with wall-clock and fallback logic changes.

Suggested reviewers

mostlygeek

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main changes: fixing metrics for non-llama.cpp backends (vLLM) and correcting wall-clock timing issues.
Description check	✅ Passed	The description provides clear context about the problem (recorder.StartTime() timing issues and missing timings field in vLLM responses), the fix (capturing requestStart before proxying and adding fallback calculations), and testing evidence.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@proxy/metrics_monitor.go`:
- Around line 487-496: The fallback calculation for promptPerSecond and
tokensPerSecond (when timings.Exists() is false) uses wallDurationMs for both
rates, conflating prompt evaluation time and decode time; update the code around
timings, wallDurationMs, promptPerSecond, tokensPerSecond, inputTokens and
outputTokens to make this explicit: either expand the comment to state these are
approximate rates computed over total wall time (including both prompt and
decode phases) or add a boolean/flag (e.g., estimatedRates or
ratesAreApproximate) or separate field to indicate “estimated vs measured” so
downstream consumers know these values are biased and not directly comparable to
phase-specific metrics.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99d4370b-6c08-4ce2-ba2d-149811e994c7

📥 Commits

Reviewing files that changed from the base of the PR and between 3cd7837 and e29a602.

📒 Files selected for processing (1)

proxy/metrics_monitor.go

coderabbitai · 2026-04-24T12:35:17Z

+	// Fallback: estimate speeds from wall clock when timings unavailable (e.g., vLLM)
+	if !timings.Exists() && wallDurationMs > 0 {
+		durationSec := float64(wallDurationMs) / 1000.0
+		if inputTokens > 0 {
+			promptPerSecond = float64(inputTokens) / durationSec
+		}
+		if outputTokens > 0 {
+			tokensPerSecond = float64(outputTokens) / durationSec
+		}
+	}


⚠️ Potential issue | 🟡 Minor

Fallback speeds conflate prompt eval with token generation.

When timings is absent, both rates are divided by the same wall-clock duration that encompasses prompt processing and decode. This produces:

promptPerSecond that is far lower than the true prompt-eval rate (prompt eval usually occupies a small fraction of the total request).

tokensPerSecond that is biased low by the prompt time (i.e. effectively outputTokens / (prompt_ms + decode_ms) rather than outputTokens / decode_ms).

Given the backend doesn't expose per-phase timings, this is an acceptable approximation — but worth making the caveat explicit so downstream dashboards don't treat these as directly comparable to llama.cpp's prompt_per_second / predicted_per_second. Consider expanding the comment (e.g., "approximate rates over total wall time; includes both prompt and decode phases") or emitting a separate flag/field to distinguish estimated vs. measured rates.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@proxy/metrics_monitor.go` around lines 487 - 496, The fallback calculation for promptPerSecond and tokensPerSecond (when timings.Exists() is false) uses wallDurationMs for both rates, conflating prompt evaluation time and decode time; update the code around timings, wallDurationMs, promptPerSecond, tokensPerSecond, inputTokens and outputTokens to make this explicit: either expand the comment to state these are approximate rates computed over total wall time (including both prompt and decode phases) or add a boolean/flag (e.g., estimatedRates or ratesAreApproximate) or separate field to indicate “estimated vs measured” so downstream consumers know these values are biased and not directly comparable to phase-specific metrics.

The activity table now displays the model alias (e.g., qwen3.6-35b-a3b-uncensored) instead of the internal model ID (e.g., llama-cpp-5090) when aliases are configured.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

proxy/metrics_monitor.go (1)

136-139: Consider guarding against empty/whitespace alias strings.

modelConfig.Aliases[0] is used as the display name unconditionally. If a user configures an empty or whitespace-only alias entry in YAML, the metric's Model field will silently become empty, which is worse than falling back to the model ID. Other parts of the codebase (e.g., listModelsHandler at proxy/proxymanager.go lines 573–577) already guard with strings.TrimSpace before using aliases.

🔧 Suggested fix

-	// Resolve modelID to display name (first alias or modelID itself)
-	if modelConfig, exists := mp.config.Models[metric.Model]; exists && len(modelConfig.Aliases) > 0 {
-		metric.Model = modelConfig.Aliases[0]
-	}
+	// Resolve modelID to display name (first non-empty alias, falling back to the model ID)
+	if modelConfig, exists := mp.config.Models[metric.Model]; exists {
+		for _, alias := range modelConfig.Aliases {
+			if a := strings.TrimSpace(alias); a != "" {
+				metric.Model = a
+				break
+			}
+		}
+	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@proxy/metrics_monitor.go` around lines 136 - 139, The current resolution sets
metric.Model to modelConfig.Aliases[0] unconditionally which can assign an
empty/whitespace string; change the logic in the block that reads
mp.config.Models and modelConfig.Aliases so you pick the first alias whose
strings.TrimSpace(alias) != "" (or if none found, leave metric.Model as the
original metric.Model/modelID) — i.e., iterate modelConfig.Aliases, use the
first non-empty trimmed alias as the display name, otherwise fall back to the
existing metric.Model; reference mp.config.Models, metric.Model and
modelConfig.Aliases when locating the code to update.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@proxy/metrics_monitor.go`:
- Around line 16-19: The file fails gofmt due to import grouping/spacing and
misaligned struct/map fields: reorder the imports in metrics_monitor.go to
standard grouping (stdlib, external, then internal) and ensure the inserted
config import (github.com/mostlygeek/llama-swap/proxy/config) sits with other
external/internal imports consistent with the file's style; fix the tab
alignment of the config field in the metricsMonitor struct (symbol:
metricsMonitor) and the corresponding config: key in the NewMetricsMonitor (or
equivalent initializer) so field alignment matches mu, metrics, logger, etc.;
then run gofmt -w proxy/metrics_monitor.go and verify with gofmt -l . before
committing.

---

Nitpick comments:
In `@proxy/metrics_monitor.go`:
- Around line 136-139: The current resolution sets metric.Model to
modelConfig.Aliases[0] unconditionally which can assign an empty/whitespace
string; change the logic in the block that reads mp.config.Models and
modelConfig.Aliases so you pick the first alias whose strings.TrimSpace(alias)
!= "" (or if none found, leave metric.Model as the original
metric.Model/modelID) — i.e., iterate modelConfig.Aliases, use the first
non-empty trimmed alias as the display name, otherwise fall back to the existing
metric.Model; reference mp.config.Models, metric.Model and modelConfig.Aliases
when locating the code to update.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 807032b6-fbc1-4a85-bdcd-d419a19ed7a1

📥 Commits

Reviewing files that changed from the base of the PR and between e29a602 and 3f58ca3.

📒 Files selected for processing (3)

proxy/metrics_monitor.go
proxy/metrics_monitor_test.go
proxy/proxymanager.go

✅ Files skipped from review due to trivial changes (1)

proxy/metrics_monitor_test.go

coderabbitai · 2026-04-24T13:09:28Z

+	"github.com/mostlygeek/llama-swap/proxy/config"
 	"github.com/klauspost/compress/zstd"
 	"github.com/mostlygeek/llama-swap/event"
 	"github.com/tidwall/gjson"


⚠️ Potential issue | 🟠 Major

Fix gofmt formatting failures.

CI's gofmt -l . step failed on this file. Two likely culprits in the changed lines:

Import block ordering/spacing at lines 16–19 mixes external modules (config inserted between gin-gonic/gin and klauspost/compress/zstd) in a way that appears to disagree with the existing formatting of this file.

The config field added to metricsMonitor at line 99 and the config: key at line 118 don't match the tab alignment used for the surrounding fields (mu, metrics, logger, etc.), which gofmt will re-flow.

Please run gofmt -w proxy/metrics_monitor.go (and double-check with gofmt -l .) before pushing.

As per coding guidelines: "Run gofmt -l . before committing to verify formatting. Fix any reported files with gofmt -w <file>."

🔧 Verification

#!/bin/bash gofmt -l proxy/metrics_monitor.go proxy/proxymanager.go # Show the diff gofmt would apply: gofmt -d proxy/metrics_monitor.go

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@proxy/metrics_monitor.go` around lines 16 - 19, The file fails gofmt due to import grouping/spacing and misaligned struct/map fields: reorder the imports in metrics_monitor.go to standard grouping (stdlib, external, then internal) and ensure the inserted config import (github.com/mostlygeek/llama-swap/proxy/config) sits with other external/internal imports consistent with the file's style; fix the tab alignment of the config field in the metricsMonitor struct (symbol: metricsMonitor) and the corresponding config: key in the NewMetricsMonitor (or equivalent initializer) so field alignment matches mu, metrics, logger, etc.; then run gofmt -w proxy/metrics_monitor.go and verify with gofmt -l . before committing.

efschu · 2026-04-24T13:20:22Z

Closing in favor of #702 which supersedes this PR with a unified description covering all fixes.

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

feat: show alias name instead of model ID in activity metrics

3f58ca3

The activity table now displays the model alias (e.g., qwen3.6-35b-a3b-uncensored) instead of the internal model ID (e.g., llama-cpp-5090) when aliases are configured.

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

efschu closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proxy: fix metrics for non-llama.cpp backends (vLLM) and correct wall-clock timing#701

proxy: fix metrics for non-llama.cpp backends (vLLM) and correct wall-clock timing#701
efschu wants to merge 2 commits into
mostlygeek:mainfrom
efschu:fix-vllm-metrics

efschu commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

efschu commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

efschu commented Apr 24, 2026

Problem

Fix

Testing

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

efschu commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading