fix: suppress connection_info metric when DB is unreachable using component-level ping #5708
Closed
gaantunes wants to merge 2 commits into
…ponent-level ping

The connection_info metric was emitted once on Start and never removed, even when the DB became unreachable, so stale metrics were served for disconnected instances.

Instead of starting a separate goroutine per collector, the existing 30-second ticker in each component's Run() loop now performs a single DB ping per tick. It calls Unregister on the ConnectionInfo collector after connectionInfoPingThreshold (3) consecutive failures, and Reregister after the same number of consecutive successes. This keeps the fix self-contained in the component, avoids multiplying DB pings, and applies uniformly to both postgres and mysql.

ConnectionInfo gains Unregister(), Reregister(), and IsRegistered() methods that safely toggle registration state under a mutex. The label values resolved in Start() are stored on the struct so Reregister() can restore the metric with the correct labels.

Made-with: Cursor
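To illustrate the registration toggle described in the first commit, here is a minimal Go sketch; it is not the PR's code. It assumes a hypothetical `Registry` interface and an in-memory `mapRegistry` in place of the real Prometheus registerer, models `connection_info` as a name plus stored label values, and uses a hypothetical `Start(labelValues)` signature. The point it shows is that the label values captured at start-up are kept on the struct, so re-registration restores the same series, and that both toggles are idempotent under the mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical stand-in for the metrics registry; the real
// component presumably registers a Prometheus collector instead.
type Registry interface {
	Register(name string, labels []string) error
	Unregister(name string) bool
}

// ConnectionInfo is a simplified, hypothetical model of the collector that
// emits connection_info. It keeps the label values resolved in Start so that
// Reregister can restore the metric with the same labels.
type ConnectionInfo struct {
	mu          sync.Mutex
	registry    Registry
	labelValues []string
	registered  bool
}

// Start stores a copy of the label values and registers the metric once.
func (c *ConnectionInfo) Start(labelValues []string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.labelValues = append([]string(nil), labelValues...)
	if err := c.registry.Register("connection_info", c.labelValues); err != nil {
		return err
	}
	c.registered = true
	return nil
}

// Unregister removes the metric; calling it while unregistered is a no-op.
func (c *ConnectionInfo) Unregister() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.registered {
		return
	}
	c.registry.Unregister("connection_info")
	c.registered = false
}

// Reregister restores the metric with the stored labels; no-op if registered.
func (c *ConnectionInfo) Reregister() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.registered {
		return nil
	}
	if err := c.registry.Register("connection_info", c.labelValues); err != nil {
		return err
	}
	c.registered = true
	return nil
}

// IsRegistered reports the current registration state.
func (c *ConnectionInfo) IsRegistered() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.registered
}

// mapRegistry is an in-memory Registry used only for this example.
type mapRegistry map[string][]string

func (m mapRegistry) Register(name string, labels []string) error {
	if _, ok := m[name]; ok {
		return fmt.Errorf("%s already registered", name)
	}
	m[name] = labels
	return nil
}

func (m mapRegistry) Unregister(name string) bool {
	_, ok := m[name]
	delete(m, name)
	return ok
}

func main() {
	reg := mapRegistry{}
	ci := &ConnectionInfo{registry: reg}
	_ = ci.Start([]string{"postgres", "db-1"})

	ci.Unregister()
	fmt.Println(ci.IsRegistered(), len(reg)) // prints: false 0

	_ = ci.Reregister()
	fmt.Println(ci.IsRegistered(), reg["connection_info"]) // prints: true [postgres db-1]
}
```

Guarding both methods with the `registered` flag means the ticker loop can call them on every threshold crossing without tracking state itself.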
…ility package

Eliminates the duplicated pingConnectionInfo method, ciPingState struct, and connectionInfoPingThreshold constant from both the postgres and mysql components by extracting them into a single PingConnectionInfo function in the shared database_observability package. The function is tested once using a lightweight mock; component-level tests that duplicated this coverage are removed.

Both components now call database_observability.PingConnectionInfo directly from their existing ticker loop with a goroutine-local CIPingState.

Made-with: Cursor
Summary
The `connection_info` metric was emitted permanently after `Start()` regardless of DB reachability, so stale metrics persisted for unreachable instances (both postgres and mysql).

Compared with #5707 (fix(database_observability): Ensure that `connection_info` metric is only emitted for a given DB instance when it is available), this fix consolidates the DB health check into the 30-second ticker loop already present in each component's `Run()`. A single `db.PingContext()` call per tick drives the `ConnectionInfo` collector: `Unregister()` after `connectionInfoPingThreshold` (3) consecutive failures, `Reregister()` after 3 consecutive successes. No new goroutines and no new package-level types are introduced.

`ConnectionInfo` gains three new methods, `Unregister()`, `Reregister()`, and `IsRegistered()`, that safely manage registry state under a mutex. Label values resolved in `Start()` are stored on the struct so `Reregister()` can restore the metric with the correct labels. This approach adds ~60 lines of net new code vs ~360 in #5707.

Test plan
- `TestConnectionInfo_Unregister`: metric disappears from the registry after `Unregister()`
- `TestConnectionInfo_Reregister`: metric reappears with the correct label values after `Reregister()`
- `TestComponent_PingConnectionInfo_UnregistersAfterThresholdFailures`: metric is unregistered after 3 consecutive ping failures (postgres + mysql)
- `TestComponent_PingConnectionInfo_ReregistersAfterThresholdSuccesses`: metric is re-registered after 3 consecutive ping successes (postgres + mysql)
- `TestComponent_PingConnectionInfo_RemainsRegisteredWhilePingsSucceed`: metric stays registered while pings succeed (postgres + mysql)

Made with Cursor