
proxy: expose metrics via endpoint for visualization in Grafana#509

Open
acon96 wants to merge 1 commit into mostlygeek:main from acon96:main


@acon96 (Contributor) commented Feb 5, 2026

Expose the in-memory metrics via a /metrics endpoint in the Prometheus text exposition format so they can be scraped easily. This lets Prometheus-compatible collectors aggregate and visualize the metrics.

Here's an example of using it with Prometheus to scrape the metrics and visualize them in Grafana:
[Screenshot: Grafana dashboard]

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a /metrics endpoint for Prometheus scraping with per-model token metrics and performance statistics
    • Token aggregation and per-second rate calculations are now available for monitoring
  • Documentation

    • Included example Grafana dashboard configuration for visualizing system metrics and token usage

@coderabbitai (bot) commented Feb 5, 2026

Walkthrough

This PR introduces Prometheus metrics export capabilities to the Llama Swap proxy system. It implements per-model token metric aggregation and rollups, adds a /metrics endpoint that exports metrics in Prometheus text format, and includes an example Grafana dashboard configuration for visualizing these metrics.
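The per-model and global rollup can be illustrated with a small sketch. Only the name metricsRollup comes from the PR summary; its fields, the requestMetric type, and the rollup function are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// requestMetric is a hypothetical per-request record; the PR's real type
// and field names may differ.
type requestMetric struct {
	Model        string
	InputTokens  int
	OutputTokens int
	CachedTokens int
}

// metricsRollup reuses the name from the PR summary, but these fields are
// an illustrative assumption.
type metricsRollup struct {
	Requests     int
	InputTokens  int
	OutputTokens int
	CachedTokens int
}

// add folds one request into the rollup.
func (r *metricsRollup) add(m requestMetric) {
	r.Requests++
	r.InputTokens += m.InputTokens
	r.OutputTokens += m.OutputTokens
	r.CachedTokens += m.CachedTokens
}

// rollup aggregates records into a per-model map plus a global total.
func rollup(records []requestMetric) (map[string]*metricsRollup, metricsRollup) {
	perModel := make(map[string]*metricsRollup)
	var global metricsRollup
	for _, m := range records {
		r, ok := perModel[m.Model]
		if !ok {
			r = &metricsRollup{}
			perModel[m.Model] = r
		}
		r.add(m)
		global.add(m)
	}
	return perModel, global
}

func main() {
	perModel, global := rollup([]requestMetric{
		{Model: "model-a", InputTokens: 120, OutputTokens: 40, CachedTokens: 100},
		{Model: "model-b", InputTokens: 80, OutputTokens: 20},
		{Model: "model-a", InputTokens: 60, OutputTokens: 30, CachedTokens: 50},
	})
	// Sort model names so output order is deterministic (map order is not).
	names := make([]string, 0, len(perModel))
	for name := range perModel {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %+v\n", name, *perModel[name])
	}
	fmt.Printf("global: %+v\n", global)
}
```

Keeping the global total alongside the per-model map means a single pass over the records yields both the per-model series and the aggregate series.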

Changes

Cohort: Prometheus Metrics Export
File(s): proxy/metrics_monitor.go, proxy/proxymanager_api.go, proxy/proxymanager.go
Summary: Introduces per-model and global token rollup aggregation via a metricsRollup struct. Implements getPrometheusText() to generate Prometheus-formatted output with a per-model breakdown, helper functions for label formatting and escaping, and tokens-per-second statistics computed over configurable windows. Registers a new GET /metrics endpoint handler and its route.

Cohort: Grafana Dashboard Configuration
File(s): docs/examples/grafana-dashboard.json
Summary: Adds an example Grafana dashboard JSON with stat, gauge, and timeseries panels for monitoring token metrics (Prompt Processing Speed, Generation Speed, Input/Output/Cached Tokens, Cache Efficiency), using a Prometheus data source referenced by UID.
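The label formatting and escaping helpers mentioned above have to follow the Prometheus text exposition format, in which a backslash, double quote, or line feed inside a label value is written as \\, \", or \n. The sketch below shows one way to do this; formatLabels, sampleLine, and the metric name llamaswap_output_tokens_total are hypothetical and are not the PR's actual helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper applies the escapes the Prometheus text format requires in
// label values: backslash, double quote, and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatLabels renders labels as {k1="v1",k2="v2"} with keys sorted so the
// output is deterministic; it returns "" when there are no labels.
// Label names are assumed to be valid already and are not escaped.
func formatLabels(labels map[string]string) string {
	if len(labels) == 0 {
		return ""
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(labels[k])))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// sampleLine formats one metric sample line: name{labels} value.
func sampleLine(name string, labels map[string]string, value float64) string {
	return fmt.Sprintf("%s%s %g\n", name, formatLabels(labels), value)
}

func main() {
	fmt.Print(sampleLine("llamaswap_output_tokens_total",
		map[string]string{"model": `qwen "coder"`}, 90))
}
```

Running it prints llamaswap_output_tokens_total{model="qwen \"coder\""} 90. Model names come from user configuration, so escaping them is what keeps an unusual name from producing an unparseable scrape.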

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • mostlygeek
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: exposing metrics via an endpoint for Grafana visualization. It directly reflects the core purpose of the PR.


@cyberfox

This would be amazing; I use Prometheus for all my systems, including my ML server, and this would help a lot.


Labels

None yet

Projects

None yet


2 participants