
fix(chart+metrics): ServiceMonitor path /lobu/metrics + rename label job→task #1053

Merged: buremba merged 2 commits into main from fix/servicemonitor-metrics-path on May 25, 2026

Conversation

Member

@buremba buremba commented May 25, 2026

Two alerting-correctness bugs found by end-to-end verification in prod (rules loaded but ingesting nothing / false-positive):

  1. ServiceMonitor path: metrics live at /lobu/metrics (the gateway is mounted under /lobu), but the monitor scraped /metrics, which 302-redirects, so the target was DOWN (unsupported Content-Type). The path is now /lobu/metrics (configurable via metrics.serviceMonitor.path).
  2. Reserved-label collision: lobu_scheduled_job_runs_total used the label job, which Prometheus reserves for the scrape target and overwrites. The per-task series collapsed, so WatcherAutomationSilent (filtering job="watcher-automation") matched nothing and read as a false 'silent'. Renamed the label to task (app + rule queries); this avoids relying on honorLabels.
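
To make the collision in item 2 concrete, here is a minimal sketch of how target labels are attached to a scraped sample. It is a model for illustration, not Prometheus code: the function names and the target job value `lobu-gateway` are hypothetical. It follows the documented `honor_labels: false` behaviour, where a conflicting scraped label is renamed to `exported_<name>` and the target's value wins.

```typescript
// Illustrative model only; names and the "lobu-gateway" value are assumptions.
type Labels = Record<string, string>;

const has = (obj: Labels, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Merge scraped labels with the scrape target's labels.
// honorLabels=false: a conflicting scraped label becomes exported_<name>.
// honorLabels=true: the scraped value is kept and the target's is dropped.
function attachTargetLabels(scraped: Labels, target: Labels, honorLabels = false): Labels {
  const out: Labels = {};
  for (const [name, value] of Object.entries(scraped)) {
    const conflicts = has(target, name);
    out[conflicts && !honorLabels ? `exported_${name}` : name] = value;
  }
  for (const [name, value] of Object.entries(target)) {
    if (!has(out, name)) out[name] = value;
  }
  return out;
}

// Equality-only label selector, like {job="watcher-automation"}.
function matches(series: Labels, selector: Labels): boolean {
  return Object.entries(selector).every(([k, v]) => series[k] === v);
}

const target: Labels = { job: "lobu-gateway" };

// Old labelling: the app's `job` label loses to the target's `job`.
const before = attachTargetLabels({ job: "watcher-automation", outcome: "success" }, target);
console.log(matches(before, { job: "watcher-automation" })); // false

// New labelling: `task` does not collide with any target label.
const after = attachTargetLabels({ task: "watcher-automation", outcome: "success" }, target);
console.log(matches(after, { task: "watcher-automation" })); // true
```

In this model the `job` selector can only ever see the target's value, which is why the alert query matched nothing until the label was renamed.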

Verified live (via temporary ServiceMonitor patches as a bridge): target UP, lobu_scheduled_job_runs_total{task="watcher-automation",outcome="success"} ingesting, all 3 alerts inactive (healthy). This PR makes that durable so the bridge patches can drop. tsc clean; helm lint/template OK.

Ships on the next release.

Summary by CodeRabbit

  • Chores
    • Made Prometheus metrics scrape path configurable with default /lobu/metrics
    • Added clarifying documentation for the metrics endpoint and scrape path
  • Bug Fixes
    • Fixed metric labeling so task-specific counts are reported separately
    • Corrected alerting rules to group and reference alerts by task (improves watcher automation detection)


… /metrics

The Prometheus exporter lives on the gateway, which the unified server mounts
under /lobu. The ServiceMonitor scraped /metrics (root), which 302-redirects to
the SPA — so Prometheus marked the target DOWN ('received unsupported
Content-Type') and never ingested the watcher/scheduler metrics. Point it at
/lobu/metrics (configurable via metrics.serviceMonitor.path). Verified live:
/lobu/metrics serves lobu_scheduled_job_runs_total etc.; /metrics redirects.
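
The failure mode described above can be sketched as a small classifier over the response a scraper ends up with. This is a hypothetical helper, not part of the PR: the type names, the function name, and the list of accepted content types (the Prometheus text format and OpenMetrics text) are assumptions for illustration.

```typescript
// Hypothetical smoke-check helper; names and accepted types are assumptions.
interface ScrapeResponse {
  status: number;
  contentType: string | null;
}

type Verdict = { up: true } | { up: false; reason: string };

const METRICS_TYPES = ["text/plain", "application/openmetrics-text"];

function classifyScrape(res: ScrapeResponse): Verdict {
  // A redirect that is not followed is never a metrics payload.
  if (res.status >= 300 && res.status < 400) {
    return { up: false, reason: `redirect (${res.status})` };
  }
  if (res.status !== 200) {
    return { up: false, reason: `unexpected status ${res.status}` };
  }
  // Drop parameters such as "; version=0.0.4; charset=utf-8".
  const mediaType = ((res.contentType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  if (!METRICS_TYPES.includes(mediaType)) {
    return { up: false, reason: `unsupported Content-Type: ${mediaType || "none"}` };
  }
  return { up: true };
}

// /metrics after following the redirect: the SPA's HTML page.
console.log(classifyScrape({ status: 200, contentType: "text/html; charset=utf-8" }));
// /lobu/metrics: the exporter's text format.
console.log(classifyScrape({ status: 200, contentType: "text/plain; version=0.0.4" }));
```

A check along these lines could be run against both paths before and after a chart upgrade to confirm which one serves the exporter.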

coderabbitai Bot commented May 25, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


This PR makes the ServiceMonitor scrape path configurable (default /lobu/metrics) to match the gateway-mounted exporter and changes scheduled-job metrics and PrometheusRule alerts to use a task label instead of job, updating metric help text and alert templates accordingly.

Changes

Metrics configuration and scheduled-job metrics

| Layer | File(s) | Summary |
|---|---|---|
| ServiceMonitor path configuration and default | charts/lobu/templates/servicemonitor.yaml, charts/lobu/values.yaml | ServiceMonitor template now uses {{ .Values.metrics.serviceMonitor.path }} for scrape path; values.yaml documents gateway-mounted exporter and sets default /lobu/metrics. |
| PrometheusRule alert label changes | charts/lobu/templates/prometheusrule.yaml | WatcherAutomationSilent selector filters lobu_scheduled_job_runs_total by task="watcher-automation"; LobuScheduledJobFailing now groups by task and templates use $labels.task. |
| Server metric help and label usage | packages/server/src/gateway/metrics/prometheus.ts, packages/server/src/scheduled/task-scheduler.ts | Metric help for lobu_scheduled_job_runs_total updated to reference "task name"; TaskScheduler increments the counter with label task: data.name for success and error paths. |
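
The counter change summarized above can be sketched roughly as follows. This is not the code from task-scheduler.ts: `LabeledCounter` and `runTask` are hypothetical stand-ins for the real metrics client and scheduler, the handler is synchronous for brevity, and the `"error"` outcome value is an assumption (the PR only confirms `outcome="success"`).

```typescript
// Hypothetical stand-ins; the real scheduler and metrics client may differ.
type Outcome = "success" | "error"; // "error" is an assumed label value

interface RunLabels {
  task: string; // previously `job`, which collided with the scrape target label
  outcome: Outcome;
}

// Tiny in-memory counter keyed by its label set.
class LabeledCounter {
  private readonly counts = new Map<string, number>();

  private static key(labels: RunLabels): string {
    return JSON.stringify([labels.task, labels.outcome]);
  }

  inc(labels: RunLabels): void {
    const k = LabeledCounter.key(labels);
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }

  get(labels: RunLabels): number {
    return this.counts.get(LabeledCounter.key(labels)) ?? 0;
  }
}

// Run one scheduled task and record the outcome under `task: data.name`.
function runTask(counter: LabeledCounter, data: { name: string }, handler: () => void): void {
  try {
    handler();
    counter.inc({ task: data.name, outcome: "success" });
  } catch {
    // The sketch swallows the error; a real scheduler would likely log it.
    counter.inc({ task: data.name, outcome: "error" });
  }
}
```

Because every increment carries the task name, an aggregation grouped by `task` keeps one series per scheduled task, which is what the updated alert rules rely on.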

Possibly Related PRs

  • lobu-ai/lobu#1047: Modifies related watcher/scheduler Prometheus alerting and metric wiring; overlaps with metric labeling and alert rule changes.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through charts and code today,
Tuned paths where metrics come to play,
Switched labels to task, alerts align,
/lobu/metrics sings in Prometheus time.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The pull request description provides comprehensive context about the bugs, fixes, and verification, but is missing explicit test plan checkboxes as required by the template. | Add a 'Test plan' section with checkboxes showing which validation steps (bun run check:fix, typecheck, etc.) were performed, as required by the repository template. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The pull request title clearly and specifically describes both main fixes: ServiceMonitor path correction and label rename from job to task. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


…lision)

Prometheus reserves the `job` label for the scrape target and overwrites any
same-named metric label, collapsing per-task series — so the WatcherAutomationSilent
alert (filtering job="watcher-automation") matched nothing and read as a false
'silent'. Rename the label to `task` (app + rule queries). Avoids relying on
honorLabels. Found via end-to-end verification in prod.
@buremba changed the title from "fix(chart): ServiceMonitor scrapes /lobu/metrics (not /metrics)" to "fix(chart+metrics): ServiceMonitor path /lobu/metrics + rename label job→task" on May 25, 2026
@buremba merged commit a5c3de6 into main on May 25, 2026
18 of 20 checks passed
@buremba deleted the fix/servicemonitor-metrics-path branch on May 25, 2026 22:51