Skip to content

Canonical fix refactor#10972

Merged
svlachakis merged 3 commits intocanonical-fixfrom
canonical-fix-refactor
Mar 27, 2026
Merged

Canonical fix refactor#10972
svlachakis merged 3 commits intocanonical-fixfrom
canonical-fix-refactor

Conversation

@LukaszRozmej
Copy link
Copy Markdown
Member

De-duplicatetests + small changes in BlockTree

@svlachakis svlachakis merged commit e2a9322 into canonical-fix Mar 27, 2026
390 of 392 checks passed
@svlachakis svlachakis deleted the canonical-fix-refactor branch March 27, 2026 12:44
svlachakis added a commit that referenced this pull request Mar 27, 2026
…ealing) (#10876)

* FCU to canonical ancestor silently ignored, leaving descendants canonical

* fix comments

* fix comment

* change approach

* remove empty line

* HealCanonicalChain implementation & tests

* fix build

* ePBS FCU fix

* geth parity with https://github.com/ethereum/go-ethereum/blob/745b0a8c09ad9d0866da67403ffa99d11ba70ec3/core/rawdb/accessors_chain.go#L47

return null for orphaned heights post-merge in GetBlockHashOnMainOrBestDifficultyHash

* GetBlockHashOnMainOrBestDifficultyHash now returns null when
HasBlockOnMainChain=true but WasProcessed=false and blockNumber > Head.Number.

Beacon sync calls UpdateMainChain(wereProcessed=false), setting
HasBlockOnMainChain without advancing Head. If this races with a
cleanup FCU where previousHeadNumber == lastNumber, the upward scan
runs before the marker is set and cannot clear it — leaving a stale
marker that eth_getBlockByNumber would return as canonical.

The write-time scan cannot close this window; a read-time guard can.
WasProcessed=false precisely identifies beacon-sync markers: processed
canonical blocks always have WasProcessed=true, so startup/reload
paths are unaffected.

* revert change for WasProcessed=false

* test: add beacon sync + reorg stale marker reproduction test

Reproduction of the stale canonical markers bug from the Engine API
test generator: beacon sync marks H+1, H+2, H+3 canonical without
advancing Head, then FCU reorgs to a sibling at the same height as
Head. Verifies all orphaned levels are de-canonicalized.

* test: add failing gap test for beacon sync race condition

Adds UpdateMainChain_beacon_sync_gap_in_stale_markers_leaves_orphan_after_reorg
which reproduces the scenario where a concurrent MoveToMain creates a gap in
stale canonical markers during beacon sync. The break-on-first-gap upward scan
stops at the gap and leaves d3 orphaned as stale canonical.

This test FAILS on canonical-fix (expected) and PASSES on bounded-scan.

* fix: skip gaps in upward scan instead of breaking

Change Phase 2 upward scan and ClearStaleMarkersAbove to continue past
levels where HasBlockOnMainChain is false, breaking only when the level
does not exist. This handles gaps left by concurrent MoveToMain without
needing a BestKnownNumber bound.

* PR cleanup

* remove misleading comment

* minor copilot comments

* review comments

* Canonical fix refactor (#10972)

* small refactors in BlockTree

* de-duplicate tests

* Lukasz review

* fixes

* Revert "fixes"

This reverts commit b7a54d1.

* Lukasz review - Refactoring

* fixes

* revert blocktree registration

---------

Co-authored-by: Kamil Chodoła <43241881+kamilchodola@users.noreply.github.com>
Co-authored-by: Kamil Chodoła <kamil.chodola@gmail.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
stdevMac pushed a commit that referenced this pull request Mar 27, 2026
…ealing) (#10876)

* FCU to canonical ancestor silently ignored, leaving descendants canonical

* fix comments

* fix comment

* change approach

* remove empty line

* HealCanonicalChain implementation & tests

* fix build

* ePBS FCU fix

* geth parity with https://github.com/ethereum/go-ethereum/blob/745b0a8c09ad9d0866da67403ffa99d11ba70ec3/core/rawdb/accessors_chain.go#L47

return null for orphaned heights post-merge in GetBlockHashOnMainOrBestDifficultyHash

* GetBlockHashOnMainOrBestDifficultyHash now returns null when
HasBlockOnMainChain=true but WasProcessed=false and blockNumber > Head.Number.

Beacon sync calls UpdateMainChain(wereProcessed=false), setting
HasBlockOnMainChain without advancing Head. If this races with a
cleanup FCU where previousHeadNumber == lastNumber, the upward scan
runs before the marker is set and cannot clear it — leaving a stale
marker that eth_getBlockByNumber would return as canonical.

The write-time scan cannot close this window; a read-time guard can.
WasProcessed=false precisely identifies beacon-sync markers: processed
canonical blocks always have WasProcessed=true, so startup/reload
paths are unaffected.

* revert change for WasProcessed=false

* test: add beacon sync + reorg stale marker reproduction test

Reproduction of the stale canonical markers bug from the Engine API
test generator: beacon sync marks H+1, H+2, H+3 canonical without
advancing Head, then FCU reorgs to a sibling at the same height as
Head. Verifies all orphaned levels are de-canonicalized.

* test: add failing gap test for beacon sync race condition

Adds UpdateMainChain_beacon_sync_gap_in_stale_markers_leaves_orphan_after_reorg
which reproduces the scenario where a concurrent MoveToMain creates a gap in
stale canonical markers during beacon sync. The break-on-first-gap upward scan
stops at the gap and leaves d3 orphaned as stale canonical.

This test FAILS on canonical-fix (expected) and PASSES on bounded-scan.

* fix: skip gaps in upward scan instead of breaking

Change Phase 2 upward scan and ClearStaleMarkersAbove to continue past
levels where HasBlockOnMainChain is false, breaking only when the level
does not exist. This handles gaps left by concurrent MoveToMain without
needing a BestKnownNumber bound.

* PR cleanup

* remove misleading comment

* minor copilot comments

* review comments

* Canonical fix refactor (#10972)

* small refactors in BlockTree

* de-duplicate tests

* Lukasz review

* fixes

* Revert "fixes"

This reverts commit b7a54d1.

* Lukasz review - Refactoring

* fixes

* revert blocktree registration

---------

Co-authored-by: Kamil Chodoła <43241881+kamilchodola@users.noreply.github.com>
Co-authored-by: Kamil Chodoła <kamil.chodola@gmail.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
svlachakis added a commit that referenced this pull request Apr 3, 2026
…ealing) (#10876)

* FCU to canonical ancestor silently ignored, leaving descendants canonical

* fix comments

* fix comment

* change approach

* remove empty line

* HealCanonicalChain implementation & tests

* fix build

* ePBS FCU fix

* geth parity with https://github.com/ethereum/go-ethereum/blob/745b0a8c09ad9d0866da67403ffa99d11ba70ec3/core/rawdb/accessors_chain.go#L47

return null for orphaned heights post-merge in GetBlockHashOnMainOrBestDifficultyHash

* GetBlockHashOnMainOrBestDifficultyHash now returns null when
HasBlockOnMainChain=true but WasProcessed=false and blockNumber > Head.Number.

Beacon sync calls UpdateMainChain(wereProcessed=false), setting
HasBlockOnMainChain without advancing Head. If this races with a
cleanup FCU where previousHeadNumber == lastNumber, the upward scan
runs before the marker is set and cannot clear it — leaving a stale
marker that eth_getBlockByNumber would return as canonical.

The write-time scan cannot close this window; a read-time guard can.
WasProcessed=false precisely identifies beacon-sync markers: processed
canonical blocks always have WasProcessed=true, so startup/reload
paths are unaffected.

* revert change for WasProcessed=false

* test: add beacon sync + reorg stale marker reproduction test

Reproduction of the stale canonical markers bug from the Engine API
test generator: beacon sync marks H+1, H+2, H+3 canonical without
advancing Head, then FCU reorgs to a sibling at the same height as
Head. Verifies all orphaned levels are de-canonicalized.

* test: add failing gap test for beacon sync race condition

Adds UpdateMainChain_beacon_sync_gap_in_stale_markers_leaves_orphan_after_reorg
which reproduces the scenario where a concurrent MoveToMain creates a gap in
stale canonical markers during beacon sync. The break-on-first-gap upward scan
stops at the gap and leaves d3 orphaned as stale canonical.

This test FAILS on canonical-fix (expected) and PASSES on bounded-scan.

* fix: skip gaps in upward scan instead of breaking

Change Phase 2 upward scan and ClearStaleMarkersAbove to continue past
levels where HasBlockOnMainChain is false, breaking only when the level
does not exist. This handles gaps left by concurrent MoveToMain without
needing a BestKnownNumber bound.

* PR cleanup

* remove misleading comment

* minor copilot comments

* review comments

* Canonical fix refactor (#10972)

* small refactors in BlockTree

* de-duplicate tests

* Lukasz review

* fixes

* Revert "fixes"

This reverts commit b7a54d1.

* Lukasz review - Refactoring

* fixes

* revert blocktree registration

---------

Co-authored-by: Kamil Chodoła <43241881+kamilchodola@users.noreply.github.com>
Co-authored-by: Kamil Chodoła <kamil.chodola@gmail.com>
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
# Conflicts:
#	src/Nethermind/Nethermind.Init/Modules/BlockTreeModule.cs
#	src/Nethermind/Nethermind.TxPool.Test/TestBlockTree.cs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants