fix(chart+metrics): ServiceMonitor path /lobu/metrics + rename label job→task #1053
… /metrics
The Prometheus exporter lives on the gateway, which the unified server mounts
under /lobu. The ServiceMonitor scraped /metrics (root), which 302-redirects to
the SPA — so Prometheus marked the target DOWN ('received unsupported
Content-Type') and never ingested the watcher/scheduler metrics. Point it at
/lobu/metrics (configurable via metrics.serviceMonitor.path). Verified live:
/lobu/metrics serves lobu_scheduled_job_runs_total etc.; /metrics redirects.
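The failure mode above can be sketched as a small check on what a scrape sees. This is an illustration only: `classifyScrape`, `ScrapeResponse`, and the accepted media-type list are hypothetical names and simplifications, not code from this PR or from Prometheus. It assumes the redirect ends at an HTML page served by the SPA.

```typescript
// Hypothetical sketch: classify what a scraper would see at a metrics path.
// None of these names come from the PR; they only illustrate the failure.
type ScrapeResponse = { status: number; contentType: string | null };

type ScrapeVerdict = { ok: true } | { ok: false; reason: string };

// Simplified assumption: only these text media types count as scrapeable here.
const ACCEPTED_TYPES = ["text/plain", "application/openmetrics-text"];

function classifyScrape(res: ScrapeResponse): ScrapeVerdict {
  // A redirect means the path is not the exporter itself.
  if (res.status >= 300 && res.status < 400) {
    return { ok: false, reason: `redirect (${res.status})` };
  }
  if (res.status !== 200) {
    return { ok: false, reason: `unexpected status ${res.status}` };
  }
  // Drop parameters such as "; version=0.0.4; charset=utf-8".
  const mediaType = (res.contentType || "")
    .replace(/;.*$/, "")
    .trim()
    .toLowerCase();
  if (ACCEPTED_TYPES.indexOf(mediaType) === -1) {
    return { ok: false, reason: `unsupported Content-Type "${mediaType}"` };
  }
  return { ok: true };
}

// /lobu/metrics: exporter output is accepted.
const exporter = classifyScrape({
  status: 200,
  contentType: "text/plain; version=0.0.4; charset=utf-8",
});

// /metrics: the 302 itself, and the HTML page a followed redirect lands on.
const redirect = classifyScrape({ status: 302, contentType: null });
const spaPage = classifyScrape({ status: 200, contentType: "text/html; charset=utf-8" });
```

Feeding this from a real request (for example `fetch` with `redirect: "manual"`) would show which of the two rejected cases a given path falls into.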
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…lision) Prometheus reserves the `job` label for the scrape target and overwrites any same-named metric label, collapsing per-task series — so the WatcherAutomationSilent alert (filtering job="watcher-automation") matched nothing and read as a false 'silent'. Rename the label to `task` (app + rule queries). Avoids relying on honorLabels. Found via end-to-end verification in prod.
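A minimal simulation of the collision, for illustration only. With `honor_labels` at its default of false, Prometheus's documented behavior is to rename a conflicting scraped label to `exported_<name>` and attach the target's own value under the original name. The function names and the target label values (`lobu`, `10.0.0.1:8080`) are hypothetical, not taken from this PR.

```typescript
type Labels = Record<string, string>;

// Sketch of how target labels are attached to a scraped sample.
// honorLabels=false mirrors the default: on conflict, the scraped value
// moves to exported_<name> and the target's value takes the original name.
function attachTargetLabels(
  scraped: Labels,
  target: Labels,
  honorLabels: boolean,
): Labels {
  const out: Labels = { ...scraped };
  for (const name of Object.keys(target)) {
    const targetValue = target[name];
    if (targetValue === undefined) continue;
    const scrapedValue = scraped[name];
    if (scrapedValue !== undefined) {
      if (honorLabels) continue; // keep the scraped value untouched
      out[`exported_${name}`] = scrapedValue;
    }
    out[name] = targetValue;
  }
  return out;
}

// True when every selector label equals the series' label (equality matchers only).
function matches(series: Labels, selector: Labels): boolean {
  return Object.keys(selector).every((k) => series[k] === selector[k]);
}

// Hypothetical target labels for the scrape.
const target: Labels = { job: "lobu", instance: "10.0.0.1:8080" };

// Before the rename: the metric's own `job` label collides with the target's.
const before = attachTargetLabels(
  { job: "watcher-automation", outcome: "success" },
  target,
  false,
);

// After the rename: `task` does not collide, so it survives as written.
const after = attachTargetLabels(
  { task: "watcher-automation", outcome: "success" },
  target,
  false,
);

const oldAlertMatches = matches(before, { job: "watcher-automation" });
const newAlertMatches = matches(after, { task: "watcher-automation" });
```

In this sketch the old selector finds nothing because `job` now holds the target's value, while the `task` selector matches without needing honorLabels.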
Two alerting-correctness bugs found by end-to-end verification in prod (rules loaded but ingesting nothing / false-positive):
1. The exporter is served at `/lobu/metrics` (gateway is mounted under `/lobu`); the monitor scraped `/metrics`, which 302-redirects → target DOWN (unsupported Content-Type). Now `/lobu/metrics` (configurable via `metrics.serviceMonitor.path`).
2. `lobu_scheduled_job_runs_total` used label `job`, which Prometheus reserves for the scrape target and overwrites → per-task series collapsed → `WatcherAutomationSilent` (filtering `job="watcher-automation"`) matched nothing and read as a false 'silent'. Renamed label to `task` (app + rule queries); avoids relying on honorLabels.

Verified live (via temporary ServiceMonitor patches as a bridge): target UP, `lobu_scheduled_job_runs_total{task="watcher-automation",outcome="success"}` ingesting, all 3 alerts `inactive` (healthy). This PR makes that durable so the bridge patches can drop. tsc clean; helm lint/template OK.

Ships on the next release.