chore: add metrics and logs to debug db replica usage #3606
Conversation
📝 Walkthrough
The changes introduce Prometheus-based metrics for database operations, including latency and operation counts, by instrumenting the replica's database methods, and add startup logging of the replica configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Replica
    participant DB
    participant PrometheusMetrics
    Client->>Replica: ExecContext / QueryContext / etc.
    Replica->>PrometheusMetrics: Record start time
    Replica->>DB: Perform DB operation
    DB-->>Replica: Return result/error
    Replica->>PrometheusMetrics: Observe latency, increment count (with labels)
    Replica-->>Client: Return result/error
```

```mermaid
sequenceDiagram
    participant App
    participant DatabaseInit
    participant Logger
    App->>DatabaseInit: Initialize database (New)
    DatabaseInit->>Logger: Log replica configuration info
    DatabaseInit-->>App: Return database instance
```
Thank you for following the naming conventions for pull request titles! 🙏
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- go/pkg/db/database.go (1 hunks)
- go/pkg/db/replica.go (6 hunks)
- go/pkg/prometheus/metrics/database.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3420
File: go/pkg/hydra/store/gorm/gorm.go:486-498
Timestamp: 2025-07-02T11:51:58.572Z
Learning: The Hydra package (go/pkg/hydra) is planned to be migrated from GORM to sqlc for database operations, which explains why raw SQL queries are acceptable in the current implementation.
🧬 Code Graph Analysis (2)
go/pkg/db/database.go (1)
go/pkg/otel/logging/interface.go (1)
Logger (11-116)
go/pkg/db/replica.go (1)
go/pkg/prometheus/metrics/database.go (2)
DatabaseOperationLatency (24-33), DatabaseOperationTotal (42-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Build / Build
- GitHub Check: Test Agent Local / test_agent_local
- GitHub Check: Test Packages / Test ./packages/nextjs
- GitHub Check: Test Packages / Test ./packages/hono
- GitHub Check: Test Packages / Test ./apps/dashboard
- GitHub Check: Test Packages / Test ./internal/clickhouse
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
go/pkg/prometheus/metrics/database.go (1)
14-51: Well-designed metrics structure for database monitoring. The metrics definitions follow Prometheus best practices with appropriate labels and metric types. The histogram for latency and counter for operations will provide valuable insights into database performance patterns.
go/pkg/db/database.go (1)
77-80: Helpful operational logging for database configuration. These log messages provide clear visibility into the database configuration, which is valuable operational context for understanding the metrics that will be collected from read-write vs read-only replicas.
go/pkg/db/replica.go (5)
35-48: Consistent and well-implemented metrics instrumentation. The metrics instrumentation follows a clean pattern: capture start time, execute operation, determine status, and record metrics. This provides accurate latency measurements and operation counts.
113-124: Consider potential limitation with QueryRowContext error tracking. The QueryRowContext method always records operations as "success" since it doesn't return an error. However, SQL errors can still occur but are only discovered when scanning the row later. This could lead to misleading success rates in your metrics. Consider whether this is acceptable for your monitoring needs, or if you need a different approach to track actual query failures for QueryRowContext operations. You might want to:
- Accept this limitation and document it
- Implement a wrapper that tracks scan errors
- Use a different status like "completed" instead of "success" for QueryRowContext
What's your preference for handling this scenario?
61-76: Metrics instrumentation follows consistent pattern. The implementation correctly measures operation latency and tracks success/error status for the prepare operation.
87-102: Proper metrics tracking for query operations. The query operation metrics are implemented correctly with accurate timing and status determination.
135-150: Transaction begin operations properly instrumented. The Begin method correctly tracks transaction start latency and success/error status.


What does this PR do?
This PR adds Prometheus metrics for database operations to track performance and usage patterns. It instruments all database operations (exec, prepare, query, query_row, begin) to measure latency and count total operations with labels for replica type (rw/ro), operation type, and status (success/error). The PR also adds logging to indicate whether the database is configured with a separate read replica.
Fixes #
Type of change
How should this be tested?
Checklist
Required
- pnpm build
- pnpm fmt
- console.logs
- git pull origin main

Appreciated
Summary by CodeRabbit
New Features
Other Improvements