
fix: activity metrics wall-clock timing, vLLM speed fallback, and alias display #702

Open: efschu wants to merge 2 commits into mostlygeek:main from efschu:fix-activity-model-alias


efschu commented Apr 24, 2026

Summary

This PR fixes three issues with the Activity page metrics:

1. Wall-clock timing for non-streaming responses

  • responseBodyCopier.StartTime() was only set on the first Write() call, which happens after headers are sent. For non-streaming JSON responses wallDurationMs was ~0, producing meaningless speed metrics.
  • Fix: Capture requestStart before next() for accurate wall-clock duration on all request types.

2. Speed metrics for non-llama.cpp backends (vLLM, etc.)

  • vLLM and other non-llama.cpp backends do not return a timings field in their response, so prompt_per_second and tokens_per_second stayed at -1.0.
  • Fix: Add fallback in parseMetrics: when timings is absent, estimate speeds from wall clock duration.

3. Display model alias instead of internal model ID

  • When a model has aliases (e.g. qwen3.6-35b-a3b-uncensored maps to llama-cpp-5090), the Activity page showed the internal model ID instead of the user-facing alias.
  • Fix: In addMetrics(), resolve the model ID to its first alias before recording metrics. Falls back to model ID when no aliases exist.
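The alias lookup described above might be sketched as follows. `modelConfig` and `displayName` are hypothetical stand-ins; the real config types and the body of addMetrics() are not shown here, so treat this as an illustration of the "first alias, else model ID" rule only.

```go
package main

import "fmt"

// modelConfig is a hypothetical, reduced stand-in for the project's
// per-model configuration; only the alias list matters for this sketch.
type modelConfig struct {
	Aliases []string
}

// displayName returns the first configured alias for modelID, and falls
// back to modelID itself when the model is unknown or has no aliases.
func displayName(models map[string]modelConfig, modelID string) string {
	if m, ok := models[modelID]; ok && len(m.Aliases) > 0 {
		return m.Aliases[0]
	}
	return modelID
}

func main() {
	models := map[string]modelConfig{
		"llama-cpp-5090": {Aliases: []string{"qwen3.6-35b-a3b-uncensored"}},
		"plain-model":    {},
	}
	fmt.Println(displayName(models, "llama-cpp-5090")) // prints qwen3.6-35b-a3b-uncensored
	fmt.Println(displayName(models, "plain-model"))    // prints plain-model
}
```

Choosing the first alias is deterministic only if the alias list keeps its configured order, which is an assumption of this sketch.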

Changes

  • proxy/metrics_monitor.go - Wall-clock timing fix, vLLM speed fallback, config field and alias resolution
  • proxy/metrics_monitor_test.go - Updated newMetricsMonitor calls to include config.Config{}
  • proxy/proxymanager.go - Pass proxyConfig to newMetricsMonitor(), capture requestStart before proxying

Testing

  • vLLM 35B TurboQuant: prompt_per_second goes from -1 to 53.6 (warm cache, 224ms duration)
  • llama.cpp metrics unchanged (still uses precise timings field when available)
  • Activity table now shows qwen3.6-35b-a3b-uncensored instead of llama-cpp-5090 for aliased models

When aliases are configured for a model (e.g., qwen3.6-35b-a3b-uncensored -> llama-cpp-5090),
the Activity table now shows the user-facing alias instead of the internal model ID.
coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 463e0abc-d555-4652-8de6-e9524f07123f

📥 Commits

Reviewing files that changed from the base of the PR and between 3566b86 and f5123e4.

📒 Files selected for processing (4)
  • event/default_test.go
  • event/event_test.go
  • proxy/metrics_monitor.go
  • proxy/metrics_monitor_test.go
✅ Files skipped from review due to trivial changes (2)
  • event/default_test.go
  • event/event_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • proxy/metrics_monitor.go
  • proxy/metrics_monitor_test.go

Walkthrough

metricsMonitor now stores a config.Config and its constructor accepts it. Model identifiers in TokenMetrics are mapped to configured aliases. Request timing is measured from a wall-clock start propagated into streaming and non-streaming parsing; parseMetrics falls back to wall-clock-derived rates when internal timings are absent. Tests and ProxyManager updated accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metrics Monitor Implementation: proxy/metrics_monitor.go | metricsMonitor holds a config.Config; newMetricsMonitor signature changed to accept the config first. Model names in TokenMetrics are replaced by configured aliases (first alias). Request start time is captured via wall-clock and passed into streaming/non-streaming parsing; DurationMs computed from wall-clock. parseMetrics computes prompt/tokens-per-second from wall-clock when timings missing. |
| Metrics Monitor Tests: proxy/metrics_monitor_test.go | Unit tests and benchmarks updated to call newMetricsMonitor with config.Config{} as the leading argument; added proxy/config import. |
| Proxy Initialization: proxy/proxymanager.go | ProxyManager.New updated to pass proxyConfig into newMetricsMonitor (constructor call site adjusted). |
| Event Tests Formatting: event/default_test.go, event/event_test.go | Non-functional formatting/line-ending normalizations; test logic unchanged. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mostlygeek
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.71% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the three main fixes: wall-clock timing, vLLM speed fallback, and alias display, directly matching the changeset objectives. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing three specific issues fixed with clear explanations and testing results. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
proxy/metrics_monitor.go (1)

98-127: ⚠️ Potential issue | 🟠 Major

Fix gofmt alignment (CI blocker).

The Linux CI gofmt check is failing. The struct field alignment at line 99 and the composite literal alignment at line 118 are off. Run gofmt -w proxy/metrics_monitor.go to fix.

🛠 Proposed alignment fix
 type metricsMonitor struct {
-	config   config.Config
-	mu         sync.RWMutex
+	config     config.Config
+	mu         sync.RWMutex
 	metrics    []TokenMetrics
 	maxMetrics int
 	nextID     int
 	logger     *LogMonitor
 ...
 }

 func newMetricsMonitor(cfg config.Config, logger *LogMonitor, maxMetrics int, captureBufferMB int) *metricsMonitor {
 	return &metricsMonitor{
-		config:     cfg,
-		logger:         logger,
-		maxMetrics:     maxMetrics,
+		config:         cfg,
+		logger:         logger,
+		maxMetrics:     maxMetrics,
 		enableCaptures: captureBufferMB > 0,
 		captures:       make(map[int][]byte),
 		captureOrder:   make([]int, 0),
 		captureSize:    0,
 		maxCaptureSize: captureBufferMB * 1024 * 1024,
 	}
 }

As per coding guidelines: "Run gofmt -l . before committing to verify formatting. Fix any reported files with gofmt -w <file>."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/metrics_monitor.go` around lines 98 - 127, The struct field and
composite literal alignment in metricsMonitor and its constructor
newMetricsMonitor are misformatted causing gofmt CI failure; run `gofmt -w
proxy/metrics_monitor.go` (or manually align the struct fields like mu, metrics,
maxMetrics, nextID, logger and align the return composite literal fields) to fix
formatting so the metricsMonitor type declaration and the newMetricsMonitor
return struct use proper gofmt spacing and tabs.
🧹 Nitpick comments (2)
proxy/metrics_monitor.go (1)

99-99: Consider storing *config.Config instead of a copy.

config.Config is a large struct (many fields, several maps). Storing it by value copies the struct on every newMetricsMonitor call. Since the inner maps/slices share backing storage this is correct, but a pointer would avoid the copy and keep the monitor in sync if the config is ever mutated elsewhere. Other components in this package (e.g. ProxyManager.config) also hold a value copy, so this is consistent but worth flagging for consideration.
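The value-versus-pointer trade-off in this comment can be demonstrated with a small sketch. `cfg`, `byValue`, and `byPointer` are hypothetical, heavily reduced stand-ins for config.Config and the monitor; the behavior shown is standard Go semantics, not anything specific to this codebase.

```go
package main

import "fmt"

// cfg is a hypothetical, much smaller stand-in for config.Config.
type cfg struct {
	LogLevel string
	Aliases  map[string]string
}

// byValue holds a copy of the config; byPointer shares the original.
type byValue struct{ config cfg }
type byPointer struct{ config *cfg }

// demo mutates the original config after both holders are built and
// returns what each holder observes.
func demo() (string, string, string, string) {
	shared := cfg{LogLevel: "info", Aliases: map[string]string{}}
	v := byValue{config: shared}
	p := byPointer{config: &shared}

	shared.Aliases["a"] = "model-1" // map write: visible through both holders
	shared.LogLevel = "debug"       // scalar write: visible only through the pointer

	return v.config.Aliases["a"], p.config.Aliases["a"], v.config.LogLevel, p.config.LogLevel
}

func main() {
	fmt.Println(demo()) // prints model-1 model-1 info debug
}
```

The copy keeps seeing map updates because the map header in the copied struct refers to the same underlying storage, while plain fields such as LogLevel are frozen at copy time.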

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/metrics_monitor.go` at line 99, The metricsMonitor currently stores a
config.Config by value (field named "config"), causing unnecessary copies and
preventing live updates; change the field to store a pointer (*config.Config)
and update the newMetricsMonitor constructor and any references that access
the monitor's config field to use the pointer (dereference where needed) so the
monitor keeps a single shared config instance and avoids struct copying; also
adjust any call sites or assignments that pass the config to accept or provide
*config.Config to maintain type consistency.
proxy/metrics_monitor_test.go (1)

15-20: Import grouping nit.

The new config import is placed between gin-gonic and event, splitting the github.com/mostlygeek/llama-swap/* block. goimports would group and sort these together. Consider:

♻️ Proposed reordering
 	"github.com/gin-gonic/gin"
-	"github.com/mostlygeek/llama-swap/proxy/config"
 	"github.com/mostlygeek/llama-swap/event"
+	"github.com/mostlygeek/llama-swap/proxy/config"
 	"github.com/stretchr/testify/assert"
 	"github.com/tidwall/gjson"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/metrics_monitor_test.go` around lines 15 - 20, Import ordering is
inconsistent: move the "github.com/mostlygeek/llama-swap/proxy/config" import
into the existing github.com/mostlygeek/llama-swap block so all same-module
imports are contiguous and then run goimports/gofmt to sort and format the
import block (i.e., reorder imports in the imports list to group
github.com/mostlygeek/llama-swap/* together with event and config adjacent).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dfb012ba-71dd-40f7-9173-e9701b0d26f9

📥 Commits

Reviewing files that changed from the base of the PR and between 3cd7837 and 84875ee.

📒 Files selected for processing (3)
  • proxy/metrics_monitor.go
  • proxy/metrics_monitor_test.go
  • proxy/proxymanager.go

@efschu changed the title from "fix: display model alias instead of model ID in activity metrics" to "fix: activity metrics wall-clock timing, vLLM speed fallback, and alias display" on Apr 24, 2026
@efschu force-pushed the fix-activity-model-alias branch from 3566b86 to f5123e4 on April 24, 2026 13:26
aw1597 commented May 11, 2026

Thumbs up! Today, with a vLLM backend (Docker), the duration has a value, but prompt speed and generation speed are either unknown or "0.00 t/s" on the Activity page. I'd appreciate it if this could be merged.
