Skip to content
This repository was archived by the owner on Jan 16, 2026. It is now read-only.

Comments

metrics (supervisor): add reorg metrics to grafana#2727

Merged
itschaindev merged 6 commits intomainfrom
jk/grafana-dashboard
Aug 20, 2025
Merged

metrics (supervisor): add reorg metrics to grafana#2727
itschaindev merged 6 commits intomainfrom
jk/grafana-dashboard

Conversation

@itschaindev
Copy link
Collaborator

Closes part of #2713

@claude
Copy link
Contributor

claude bot commented Aug 19, 2025

Claude finished @itschaindev's task —— View job


PR Review: Add Reorg Metrics to Grafana Dashboard

Review Checklist

  • Read and analyze the modified Grafana dashboard JSON file
  • Check JSON structure and syntax validity
  • Review dashboard configuration for best practices
  • Analyze metrics definitions and queries
  • Check for security concerns
  • Assess performance considerations
  • Provide comprehensive feedback

Code Quality & Best Practices ✅

Positive aspects:

  • JSON structure is valid and properly formatted
  • Dashboard follows Grafana's standard schema (v40)
  • Consistent panel configuration with existing dashboard structure
  • New panels properly integrate with existing datasource configuration (${DS_PROMETHEUS})
  • Appropriate use of Grafana's templating system with $chain_id variable

The L1 Reorg section adds two well-designed panels:

  1. L1 Reorg Success (docker/recipes/kona-supervisor/grafana/dashboard/kona-supervisor.json:159) - tracks kona_supervisor_reorg_success{service="kona-supervisor-supervisor-superchain"}
  2. L1 Reorg Duration (docker/recipes/kona-supervisor/grafana/dashboard/kona-supervisor.json:261) - tracks kona_supervisor_reorg_duration_seconds{quantile="1.0"}

Potential Issues ⚠️

1. Metric Query Concerns:

# Line 261: Using quantile="1.0" (maximum) instead of more typical percentiles
kona_supervisor_reorg_duration_seconds{quantile="1.0"}

Issue: The quantile="1.0" represents the maximum duration, which may be noisy and less useful for monitoring than p95/p99.
Suggestion: Consider using quantile="0.95" or quantile="0.99" for more actionable alerting thresholds.

2. Service Filter Inconsistency:

# Line 159: Success metric has service filter
kona_supervisor_reorg_success{service="kona-supervisor-supervisor-superchain"}

# Line 261: Duration metric has no service filter
kona_supervisor_reorg_duration_seconds{quantile="1.0"}

Issue: Inconsistent filtering may lead to mismatched metrics if multiple services emit these metrics.
Suggestion: Apply consistent filtering across both queries.

3. Dashboard Layout:

  • New "L1 Reorg" section is positioned at the top (y=0), pushing existing content down
  • This is appropriate for high-priority monitoring metrics

Performance Considerations ⚠️

Query Efficiency:

  • Queries are simple and efficient for Prometheus
  • Service filtering on success metric is good for performance
  • Range queries use default interval (likely based on dashboard refresh rate)

Potential Improvement:

  • Consider adding rate() function if tracking reorg frequency is important:
rate(kona_supervisor_reorg_success{service="kona-supervisor-supervisor-superchain"}[5m])

Security Concerns ✅

No security issues identified:

  • No hardcoded credentials or sensitive data
  • Uses proper datasource templating (${DS_PROMETHEUS})
  • No external URL references or unsafe configurations
  • Follows principle of least exposure

Test Coverage 📋

Testing Recommendations:

  • Verify the new metrics are actually emitted by the supervisor service
  • Test dashboard renders correctly with and without data
  • Validate metric cardinality doesn't cause performance issues
  • Test variable interpolation works correctly with different chain IDs

Overall Assessment ✅

This is a solid addition that follows Grafana best practices and integrates well with the existing dashboard. The reorg monitoring capability addresses an important operational concern for L1 chain reorganizations.

Minor improvements suggested but not blocking:

  1. Consider using p95/p99 quantiles instead of max (1.0) for duration metrics
  2. Ensure consistent service filtering across related metrics
  3. Consider adding rate() functions if frequency monitoring is needed

The massive diff size (+2522/-307) appears to be primarily due to JSON reformatting rather than substantial new content - the actual additions are the two new panels in the L1 Reorg section.


@codecov
Copy link

codecov bot commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.2%. Comparing base (06cdba9) to head (93530d7).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
crates/supervisor/service/src/service.rs 0.0% 2 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@emhane emhane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice with a screenshot of the new panels in PR description

@itschaindev itschaindev self-assigned this Aug 19, 2025
@itschaindev
Copy link
Collaborator Author

Abs number view

Screenshot 2025-08-20 at 12 06 14

L1-reorg metrics

Screenshot 2025-08-20 at 12 06 25

cc: @emhane

@itschaindev itschaindev added this pull request to the merge queue Aug 20, 2025
Merged via the queue into main with commit b39c7b3 Aug 20, 2025
33 of 34 checks passed
@itschaindev itschaindev deleted the jk/grafana-dashboard branch August 20, 2025 11:47
theochap pushed a commit to ethereum-optimism/optimism that referenced this pull request Dec 10, 2025
theochap pushed a commit to ethereum-optimism/optimism that referenced this pull request Jan 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants