feat: gate mining on local tip catching up to peers' best head by constwz · Pull Request #356 · bnb-chain/reth-bsc

constwz · 2026-05-20T07:29:22Z

Description

Add a peer-head-vs-local-tip lag gate in the BSC miner so a validator refuses to extend a stale tip while it is still behind the network, preventing the validator from persisting a divergent fork to disk when other sync paths (fork_recover, staged-sync pipeline) are temporarily unavailable.

Rationale

cef78d6 deliberately removed the is_syncing() mining gate that main carries, on the grounds that network.is_syncing() never clears in the all-validators-restart cold start and would deadlock mining. The replacement gate (is_network_ready_to_mine) only checks num_connected_peers() > 0, so the miner now produces blocks the instant any peer is connected — without ever checking whether the local tip has caught up to that peer.

In the same commit, the Syncing handling on new_payload was redirected from "synthesize an FCU, let engine-tree's optimistic-sync trigger the staged-sync pipeline" to "spawn fork_recover over bsc/2". This means small-gap recovery (< MAX_FORK_DEPTH) no longer flips network.is_syncing() to true, because the staged-sync pipeline never runs for those gaps — fork_recover runs in its own task.

Combined, the two changes leave a window where:

the node has connected peers (peer-ready gate passes),
the node is genuinely behind by a small gap (under PIPELINE_TRIGGER_DELTA = 2048),
fork_recover is the only active recovery and it is silently failing (bsc/2 stale tx, peer timeout, packet loss, etc.),
and the miner sees no gating signal at all and starts producing blocks on top of the stale local tip.

Each mined block is immediately FCU'd to canonical, written to MDBX, and accumulated into TrieDB pathdb difflayers. Periodic pathdb flush pushes the disk layer onto that divergent chain. After ~20 minutes a 3-validator qanet validator can carry a >1700-block divergent fork on disk that cannot self-heal — decide_startup_alignment (reth #175) re-aligns MDBX to TrieDB consistently on the wrong chain, and fork_recover cannot bridge across pathdb-gap'd ancestors.

This PR adds an independent gate that does not depend on is_syncing() or on any specific recovery path: ask peers directly what they think the head is, and refuse to mine if the local tip lags beyond a threshold.

Example

Without the gate (today on develop), a validator that fell behind during a bsc/2 startup race logs:

INFO  bsc::miner: Try new work tip_block=20004789 ...
DEBUG bsc::block_import: Sending new block to import service: number=20004790 ...
INFO  bsc::block_import: New payload returned Syncing - spawning fork recovery block_number=20004790
WARN  bsc::block_import: Fork recovery failed ... err=failed to send GetBlocksByRange command
                                       (block 20004790 NOT imported — local tip stays at 20004789)
INFO  bsc::miner: Sealing local block on parent 20004789 ...
                                       (mined on stale tip → divergent fork written to disk)

With the gate, the same race instead logs:

DEBUG bsc::miner: Skip mining: local tip lags peers' best head beyond threshold local_tip=20004789 peer_best=20006498 lag=1709 threshold=5

Local tip stays put. No fork is persisted. Once fork_recover / pipeline catches up (or operator intervenes), mining resumes automatically the next time the lag closes under threshold.

Changes

src/node/miner/bsc_miner.rs — one new async gate is_caught_up_to_peers(local_tip) invoked from try_new_work right after is_network_ready_to_mine:

network.get_all_peers().await → take max(p.best_number) across connected peers.
If at least one peer has a head: compute lag = peer_best.saturating_sub(local_tip); skip mining when lag > MINING_LAG_THRESHOLD (default 5, loose enough to absorb the one-block-in-flight window between an inbound NewBlock and our canonical-head update).
If no peer has published a head yet (cold start, all best_number = None): grant a PEER_HEAD_WAIT_TIMEOUT_SECS (5 s) grace window via a OnceLock<Instant>, then fall through and permit mining. This preserves cef78d6's all-validators-restart deadlock-break intent without using the is_syncing() signal that no longer fires for small gaps. The fallthrough emits a WARN so it is visible in operator logs.
On get_all_peers() error or missing network handle, skip mining (fail-closed).

Potential Impacts

Self-recovery is automatic: once the local tip is within 5 blocks of the highest peer head, mining resumes on the next canonical-state notification — no operator action required for routine sync delays.
Diagnostics improved: when the gate fires, the log line carries local_tip, peer_best, lag, threshold. Previously a behind-network miner produced silent divergent forks; this turns silent failure into a visible, structured WARN/DEBUG signal.
No new dependencies and no protocol-wire changes: uses reth_network::Peers::get_all_peers, which is already used elsewhere in the import service.
Compatibility with cef78d6's deadlock-break: the grace window covers the exact "no peer ever reported a head" cold start that motivated removing is_syncing(); after grace, mining proceeds.
Compatibility with fix: bsc protocol stale registry tx #355: the two PRs are complementary, not overlapping. fix: bsc protocol stale registry tx #355 lets bsc/2 self-heal when its sender channel goes stale; this PR ensures that during the window where bsc/2 (or any other recovery path) is unhealthy, the miner does not extend the chain locally. Both should land together to fully close the qanet failure mode that motivated them.
Threshold (5) deliberately loose: BSC's 0.75 s blocks mean a 5-block lag is ~3.75 s of head drift. Tighter would risk false negatives during normal NewBlock propagation; looser would risk missing the multi-second divergence window that produces deep forks.

chore: add Osaka and Mendel hardfork timestamps for testnet (#280)

hashdit-bot · 2026-05-20T07:29:31Z

Pull Request Review

This PR adds a new mining safety gate in src/node/miner/bsc_miner.rs that prevents mining when the local tip is lagging behind peers’ reported best head by more than a fixed threshold. It introduces is_caught_up_to_peers(local_tip) with fail-closed behavior on missing network handle or peer-query errors, a lag threshold (MINING_LAG_THRESHOLD = 5), and a bounded cold-start fallback window (PEER_HEAD_WAIT_TIMEOUT_SECS = 5) using OnceLock<Instant>. The gate is invoked in try_new_work immediately after the existing network-readiness check.

Sensitive Content

No sensitive content detected.

Security Issues

No serious security issues detected.

Generated by Hashdit Bot. This tool can absolutely NOT replace manual audits.

hashdit-bot · 2026-05-20T08:28:43Z

Pull Request Review

This PR adds a new mining safety gate in bsc_miner.rs that suppresses mining when the local tip lags behind peers’ reported best head by more than a threshold, with a bounded cold-start grace period when no peer head is yet available. It also introduces fail-closed behavior for missing network handle / peer query errors and wires this gate into try_new_work before block production. Additionally, it increases BSC protocol handshake timeout in stream.rs from 5s to 30s and expands documentation/comments explaining startup race conditions and rationale.

Sensitive Content

No sensitive content detected.

Security Issues

No serious security issues detected.

Generated by Hashdit Bot. This tool can absolutely NOT replace manual audits.

joey0612 and others added 9 commits March 13, 2026 14:56

Merge pull request #281 from bnb-chain/develop

dedf722

chore: add Osaka and Mendel hardfork timestamps for testnet (#280)

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

c954935

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

b1dbf04

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

041bf41

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

440994d

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

a4a8c17

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

3e2b006

Merge branch 'develop' of github.com:bnb-chain/reth-bsc into develop

778d46f

feat: mining anti fork gate

ca67d36

constwz requested a review from joey0612 as a code owner May 20, 2026 07:29

chore: relax-bsc-handshake-timeout

dc60571

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: gate mining on local tip catching up to peers' best head#356

feat: gate mining on local tip catching up to peers' best head#356
constwz wants to merge 10 commits into
developfrom
feat/mining-anti-fork-gate

constwz commented May 20, 2026

Uh oh!

hashdit-bot Bot commented May 20, 2026

Uh oh!

hashdit-bot Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

constwz commented May 20, 2026

Description

Rationale

Example

Changes

Potential Impacts

Uh oh!

hashdit-bot Bot commented May 20, 2026

Pull Request Review

Sensitive Content

Security Issues

Uh oh!

hashdit-bot Bot commented May 20, 2026

Pull Request Review

Sensitive Content

Security Issues

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants