fix: healthcheck not responding #6519
Walkthrough
Refactors the sync-status update to an immutable read-clone-compute-write pattern, adds a 2s timeout to health probe operations to avoid stalled checks, and short-circuits peer-failure logging via a pre-check against
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/chain_sync/sync_status.rs`:
- Around line 162-168: The trace log is printing the old field
self.epochs_behind instead of the freshly computed local epochs_behind; update
the log::trace! call in the SyncStatus reporting code to use the local variable
epochs_behind (computed from
network_head_epoch.saturating_sub(current_head_epoch)) so the logged "epochs
behind" reflects the newly computed value rather than the stale
self.epochs_behind.
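The stale-versus-fresh distinction in this comment can be sketched as follows. This is a minimal illustration, not the code from `src/chain_sync/sync_status.rs`: the `SyncStatus` struct here is a simplified stand-in, the `updated` method and the `u64` epoch type are assumptions, and the trace line is returned as a `String` instead of being passed to `log::trace!` so the sketch needs only the standard library.

```rust
/// Hypothetical, simplified stand-in for the real `SyncStatus` report type.
#[derive(Debug, Clone, PartialEq)]
struct SyncStatus {
    current_head_epoch: u64,
    network_head_epoch: u64,
    epochs_behind: u64,
}

impl SyncStatus {
    /// Builds the next status from fresh head epochs and returns it together
    /// with the line that would be handed to `log::trace!`.
    /// (`updated` is an illustrative name, not taken from the PR.)
    fn updated(&self, current_head_epoch: u64, network_head_epoch: u64) -> (SyncStatus, String) {
        // Freshly computed value; saturates at 0 if the node is ahead of the network head.
        let epochs_behind = network_head_epoch.saturating_sub(current_head_epoch);

        // Format from the local variable. `self.epochs_behind` still holds the
        // previous reading at this point, so logging it would report stale data.
        let trace_line = format!("epochs behind: {epochs_behind}");

        let next = SyncStatus {
            current_head_epoch,
            network_head_epoch,
            epochs_behind,
        };
        (next, trace_line)
    }
}
```

With a previous status that was 10 epochs behind, calling `updated(98, 100)` yields a trace line reporting 2, while the old struct keeps its value of 10 until the new status is written back.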
In `@src/health/endpoints.rs`:
- Around line 174-180: The current timeout check only tests that the timeout did
not elapse, not that F3 actually returned Ok(true); change the condition that
currently uses tokio::time::timeout(...).await.is_ok() to explicitly inspect the
nested Result from tokio::time::timeout and F3IsRunning::is_f3_running() (e.g.,
match or if let to require Ok(Ok(true))) so the branch only runs when the
timeout completed and the inner call returned Ok(true); update the code
surrounding F3IsRunning::is_f3_running() and MAX_REQ_DURATION_SECS accordingly.
- Around line 144-150: The current check uses
tokio::time::timeout(...).await.is_ok(), which only verifies the timeout didn't
elapse and not whether tokio::net::TcpStream::connect actually succeeded; change
the logic around tokio::time::timeout, MAX_REQ_DURATION_SECS, and
state.config.client.rpc_address to await the timeout result and explicitly
handle its nested Result: treat Ok(Ok(_stream)) as a successful connection, and
treat Ok(Err(_err)) or Err(_elapsed) as failures (log/return the RPC-down path
accordingly), so the endpoint only reports "rpc server running" when the connect
call truly succeeded.
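Both comments on `src/health/endpoints.rs` come down to the same point: `tokio::time::timeout(...).await` yields a nested `Result`, and `.is_ok()` inspects only the outer layer. A minimal sketch of the explicit handling follows, under stated assumptions: `Elapsed` is a local stand-in for `tokio::time::error::Elapsed` so the block compiles without tokio, and `Probe`, `classify`, and `f3_running` are hypothetical names, not identifiers from the PR.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for `tokio::time::error::Elapsed`, so the sketch
/// compiles with the standard library alone.
#[derive(Debug)]
struct Elapsed;

/// Outcome of a single health probe.
#[derive(Debug, PartialEq)]
enum Probe {
    Up,
    Failed(String),
    TimedOut,
}

/// Classifies a value shaped like the output of `tokio::time::timeout(dur, fut).await`:
/// the outer `Result` says whether the deadline held, the inner one whether
/// the wrapped operation (for example a TCP connect) itself succeeded.
fn classify<T, E: Display>(outcome: Result<Result<T, E>, Elapsed>) -> Probe {
    match outcome {
        // Finished in time and the operation succeeded.
        Ok(Ok(_)) => Probe::Up,
        // Finished in time, but the operation returned an error.
        Ok(Err(err)) => Probe::Failed(err.to_string()),
        // The deadline elapsed before the operation finished.
        Err(Elapsed) => Probe::TimedOut,
    }
}

/// F3 variant: healthy only when the call finished in time and returned `Ok(true)`.
fn f3_running<E>(outcome: Result<Result<bool, E>, Elapsed>) -> bool {
    matches!(outcome, Ok(Ok(true)))
}
```

A refused connection arrives as `Ok(Err(_))`, so `.is_ok()` is true for it even though the probe failed; `classify` reports it as `Probe::Failed`, and `f3_running` likewise rejects `Ok(Ok(false))`.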
a7c7a3d to 43be80a
43be80a to a6bddf0
Codecov Report
❌ Patch coverage is
... and 6 files with indirect coverage changes
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes #6515
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit
Bug Fixes
Refactor