fix(l1): clearer sync diagnostics during catch-up #6733
- eth_syncing: report the real highestBlock number instead of the truncated forkchoice head hash (was emitting garbage like 0x3abc579fc0fc31b8)
- full sync: log the consensus-provided sync target and how far behind the local head is, and warn (when stale > 30m) that the consensus client may be catching up to chain head
- full sync: log when a behind cycle reaches the consensus-provided head so idle-waiting is distinguishable from a hang
🤖 Kimi Code Review
The changes look solid overall. The RPC fix is a clear bugfix, and the sync logging improvements are well-structured. A few specific points:
🤖 Claude Code ReviewNow I have enough context. Let me write up the review. Code Review: PR #6733 —
🤖 Codex Code Review
I didn’t see EVM/consensus-critical or memory-safety issues in this diff beyond those RPC/observability regressions. I did not run tests.
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
Greptile Summary
Improves execution-layer diagnostics during catch-up sync: fixes
Confidence Score: 4/5
Safe to merge; changes are diagnostic-only with no effect on block execution or consensus logic.
Both changed files touch only logging and the eth_syncing RPC response; no block execution, storage writes, or consensus paths are modified. The current_block fallback in eth_syncing is a deliberate trade-off that improves on the previous garbage value, though it may cause monitoring tools to show a zero sync gap during snap-sync. The humanize_secs edge case for sub-60 s inputs is unreachable today but could surface if the helper is reused.
crates/networking/rpc/eth/client.rs: the highest_block fallback behaviour during snap-sync is worth a second look if any external monitoring relies on the eth_syncing gap.
| Filename | Overview |
|---|---|
| crates/networking/p2p/sync/full.rs | Adds per-cycle sync-target logging and a "caught up" message; timestamp-based staleness warning is correct but humanize_secs has an unreachable "0m" edge case for sub-60 s inputs. |
| crates/networking/rpc/eth/client.rs | Fixes eth_syncing.highestBlock by resolving the FCU head hash to a block number; the current_block fallback during snap-sync silences the previous garbage value but makes progress monitoring report 0 gap while syncing. |
Sequence Diagram
sequenceDiagram
participant CL as Consensus Client
participant EL as Execution (sync_cycle_full)
participant Store as Storage
participant RPC as eth_syncing RPC
CL->>EL: forkchoiceUpdated(head_hash)
EL->>Store: get_latest_block_number()
Store-->>EL: local_head
EL->>EL: "behind = target - local_head"
alt "behind > FOLLOW_DISTANCE (8)"
EL->>EL: log sync target + distance
EL->>EL: "age = now - first_header.timestamp"
alt "age > 1800s"
EL->>EL: warn consensus may be catching up
end
end
EL->>Store: "fetch & execute headers/bodies"
EL->>Store: get_latest_block_number()
alt started_behind
EL->>EL: info Reached consensus-provided head
end
Note over RPC: eth_syncing
RPC->>Store: get_latest_block_number() → current_block
RPC->>EL: get_last_fcu_head() → head_hash
RPC->>Store: get_block_number(head_hash)
alt header in storage
Store-->>RPC: block number → highest_block
else header not yet stored (snap-sync)
Store-->>RPC: None → fallback to current_block
end
RPC-->>CL: "{startingBlock, currentBlock, highestBlock}"
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
crates/networking/p2p/sync/full.rs:40-51
**`humanize_secs` returns `"0m"` for sub-minute inputs**
If `secs` is 0–59, `days`, `hours`, and `mins` are all 0, so the function returns `"0m"`. In the current call site this is unreachable (the guard is `age > STALE_FORKCHOICE_HEAD_SECS` = 1 800 s, so `mins ≥ 30`), but if the function is ever called with a smaller value — e.g. for a future diagnostic or unit test — it will silently produce a misleading output instead of e.g. `"< 1m"` or a seconds fallback.
### Issue 2 of 2
crates/networking/rpc/eth/client.rs:68-72
**`highest_block` falls back to `current_block`, hiding sync progress from consumers**
When the FCU head hash isn't in storage yet (e.g. during snap-sync), `get_block_number` returns `None` and `highest_block` is set to `current_block`. The `eth_syncing` response then shows `highestBlock == currentBlock` while the node is still actively syncing. Monitoring tools and dashboards that use the `highestBlock − currentBlock` gap to render a progress bar will show 0 % remaining and may treat the node as caught-up. Returning `0` (or any sentinel that is clearly ≤ `current_block`) would at least not falsely indicate full sync, though the PR description explicitly acknowledges this as an intentional trade-off over the previous garbage value.
Reviews (1): Last reviewed commit: "fix(l1): drop color handling from stale-..."
    fn humanize_secs(secs: u64) -> String {
        let days = secs / 86_400;
        let hours = (secs % 86_400) / 3_600;
        let mins = (secs % 3_600) / 60;
        if days > 0 {
            format!("{days}d {hours}h")
        } else if hours > 0 {
            format!("{hours}h {mins}m")
        } else {
            format!("{mins}m")
        }
    }
humanize_secs returns "0m" for sub-minute inputs
If secs is 0–59, days, hours, and mins are all 0, so the function returns "0m". In the current call site this is unreachable (the guard is age > STALE_FORKCHOICE_HEAD_SECS = 1 800 s, so mins ≥ 30), but if the function is ever called with a smaller value — e.g. for a future diagnostic or unit test — it will silently produce a misleading output instead of e.g. "< 1m" or a seconds fallback.
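One way to close this edge case is sketched below. It is not the change made in the PR: the extra `mins > 0` branch and the seconds fallback are assumptions, chosen only to show that a sub-minute input can produce an unambiguous string.

```rust
/// Hypothetical variant of `humanize_secs` that adds a sub-minute branch.
/// The first three branches mirror the reviewed helper; only the last
/// branch is new.
fn humanize_secs(secs: u64) -> String {
    let days = secs / 86_400;
    let hours = (secs % 86_400) / 3_600;
    let mins = (secs % 3_600) / 60;
    if days > 0 {
        format!("{days}d {hours}h")
    } else if hours > 0 {
        format!("{hours}h {mins}m")
    } else if mins > 0 {
        format!("{mins}m")
    } else {
        // Assumed fallback: report raw seconds instead of a misleading "0m".
        format!("{secs}s")
    }
}
```

The existing call site is unaffected, since it only passes values above 1 800 s.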
    let highest_block = context
        .storage
        .get_block_number(head_hash)
        .await?
        .unwrap_or(current_block);
highest_block falls back to current_block, hiding sync progress from consumers
When the FCU head hash isn't in storage yet (e.g. during snap-sync), get_block_number returns None and highest_block is set to current_block. The eth_syncing response then shows highestBlock == currentBlock while the node is still actively syncing. Monitoring tools and dashboards that use the highestBlock − currentBlock gap to render a progress bar will show 0 % remaining and may treat the node as caught-up. Returning 0 (or any sentinel that is clearly ≤ current_block) would at least not falsely indicate full sync, though the PR description explicitly acknowledges this as an intentional trade-off over the previous garbage value.
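The effect on a consumer can be illustrated with a small model. This is a sketch, not ethrex code: `SyncStatus`, `status_with_fallback`, and `blocks_remaining` are hypothetical names, and the struct carries only the two fields relevant to the gap.

```rust
/// Simplified stand-in for the two `eth_syncing` fields discussed above.
struct SyncStatus {
    current_block: u64,
    highest_block: u64,
}

/// Mirrors the `unwrap_or(current_block)` fallback from the diff:
/// `resolved_head` is `None` when the head hash is not in storage yet.
fn status_with_fallback(current_block: u64, resolved_head: Option<u64>) -> SyncStatus {
    SyncStatus {
        current_block,
        highest_block: resolved_head.unwrap_or(current_block),
    }
}

/// Gap as a dashboard might compute it (hypothetical consumer logic).
fn blocks_remaining(status: &SyncStatus) -> u64 {
    status.highest_block.saturating_sub(status.current_block)
}
```

With a resolved head the gap is the true distance; with an unresolved head the gap is zero, which a consumer cannot tell apart from a fully synced node.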
ElFantasma left a comment
highestBlock fallback is 2-tier, but the PR description claims 3-tier. The body says it resolves the head hash "via storage, then the pending-block store, falling back to current_block" — but the code is get_block_number(head_hash).unwrap_or(current_block), and get_block_number only reads the BLOCK_NUMBERS table (verified — no pending/fork-choice store lookup). The missing middle tier matters precisely for the case this PR targets: a forkchoice head that's been received but not yet executed (mid snap-sync) won't be in BLOCK_NUMBERS, so highestBlock silently collapses to current_block and eth_syncing reports distance 0 — which is the same class of "looks synced when it isn't" the PR set out to fix. Either (a) add the pending-store lookup the body promises, or (b) correct the description to match the honest 2-tier code + the inline comment (which already says only "if the head's header isn't stored yet … fall back to the current block"). See inline.
        .storage
        .get_block_number(head_hash)
        .await?
        .unwrap_or(current_block);
The PR description says this resolves the head "via storage, then the pending-block store, falling back to current_block" — three tiers — but this is two: get_block_number(head_hash) (which only reads the BLOCK_NUMBERS table; I checked) then current_block. There's no pending-block-store tier.
That middle tier is exactly the interesting case: a forkchoice head that's been received but not yet executed (the mid-snap-sync scenario this PR is about) isn't in BLOCK_NUMBERS yet, so highest_block falls straight to current_block and eth_syncing reports highestBlock == currentBlock — distance 0, i.e. "looks synced" while still catching up. That's a milder version of the garbage you're fixing, but the same failure mode.
Two options:
- If the pending/fork-choice store can resolve a received-but-unexecuted head to a number, add that lookup between the two (matches the body).
- Otherwise, reword the PR body to the honest 2-tier behavior — the inline comment above already describes it correctly, so it's just the description that overstates.
Non-blocking, but worth closing the gap since accurate eth_syncing during catch-up is the whole point.
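The first option could look roughly like the sketch below. Everything here is illustrative: `Stores`, `pending_blocks`, and `resolve_highest_block` are hypothetical, the two maps only model the `BLOCK_NUMBERS` table and a pending-block store, and whether ethrex's pending store can actually map a head hash to a number is exactly the open question in this comment.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Hypothetical in-memory stand-ins for the two lookup tiers.
struct Stores {
    /// Models the BLOCK_NUMBERS table (executed/stored blocks).
    block_numbers: HashMap<Hash, u64>,
    /// Models a pending-block store (assumed to be keyed by hash).
    pending_blocks: HashMap<Hash, u64>,
}

/// Three-tier resolution: stored block, then pending block, then the
/// local head as a last resort.
fn resolve_highest_block(stores: &Stores, head_hash: &Hash, current_block: u64) -> u64 {
    stores
        .block_numbers
        .get(head_hash)
        .or_else(|| stores.pending_blocks.get(head_hash))
        .copied()
        .unwrap_or(current_block)
}
```

If no such middle tier exists, the second option (rewording the description to the two-tier behavior) is the smaller change.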
    // warn if that head is stale (a strong signal the consensus client is behind).
    if !sync_target_logged {
        sync_target_logged = true;
        let target = first_header.number;
nit: target = first_header.number is the first header of the first fetched batch, which is a good-enough proxy for the diagnostic. The PR body says the target is read from "the pre-rewind forkchoice head, not the rewound ancestor we request headers from" — but first_header is the header we request from, so after a rewind to a common ancestor this target (and therefore the behind distance) understates the true forkchoice head the consensus client advertised.
For a log line that's fine — it errs toward under-warning, not false alarms. But if you want the distance to reflect the actual FCU head (matching the body's claim), resolve sync_head to its number once at the top of the cycle and use that. Low priority.
Motivation
A home validator reported their node "not finalizing" after 2 days of snap sync. ethrex was healthy and keeping pace; the consensus client was ~2 weeks behind and only feeding forkchoice heads a few blocks ahead of execution. The execution logs made this hard to see, and eth_syncing actively misled them.
Changes
- eth_syncing.highestBlock returned garbage. It reported last_fcu_head().to_low_u64_be(), the low 64 bits of the forkchoice head hash, not a block number. Now resolves the head hash to a block number via storage, then the pending-block store, falling back to current_block only when the target isn't downloaded yet (mid snap-sync).
- Sync target + staleness hint. On the first header batch of a cycle, log the consensus-provided forkchoice head and how far behind the local head is. If the head is more than 8 blocks ahead and older than 30 min, hint that the consensus client may still be catching up. Gated on distance so a synced node stays quiet. The target is read from the pre-rewind forkchoice head, not the rewound ancestor we request headers from.
- "Caught up" log. When a cycle that started behind finishes, log that it reached the consensus-provided head and is waiting for the next forkchoice update, so idle-waiting is distinguishable from a hang.
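The gating described under "Sync target + staleness hint" can be sketched as a pure function. This is a simplified model, not the PR's code: the constant names and values follow the review thread (`FOLLOW_DISTANCE` = 8, `STALE_FORKCHOICE_HEAD_SECS` = 1 800), while `SyncHint` and `sync_hint` are hypothetical names introduced for illustration.

```rust
/// Values as quoted in the review thread above.
const FOLLOW_DISTANCE: u64 = 8;
const STALE_FORKCHOICE_HEAD_SECS: u64 = 1_800;

/// Hypothetical result type standing in for the actual log calls.
#[derive(Debug, PartialEq)]
enum SyncHint {
    /// Within follow distance: a synced node stays quiet.
    Quiet,
    /// Behind, but the head is recent: log target and distance only.
    LogTarget { behind: u64 },
    /// Behind and the head is old: also hint that the consensus
    /// client may still be catching up.
    WarnStale { behind: u64, age_secs: u64 },
}

/// Decide what to log for a cycle, given block numbers and Unix timestamps.
fn sync_hint(target: u64, local_head: u64, head_timestamp: u64, now: u64) -> SyncHint {
    let behind = target.saturating_sub(local_head);
    if behind <= FOLLOW_DISTANCE {
        return SyncHint::Quiet;
    }
    let age_secs = now.saturating_sub(head_timestamp);
    if age_secs > STALE_FORKCHOICE_HEAD_SECS {
        SyncHint::WarnStale { behind, age_secs }
    } else {
        SyncHint::LogTarget { behind }
    }
}
```

Keeping the decision separate from the logging call makes the distance and staleness thresholds straightforward to unit test.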