Skip to content

snapshot, blockchain, pruner: fix ZK/MPT state root compatibility for prune-state#303

Closed
curryxbo wants to merge 5 commits intomainfrom
fix/snapshot-zk-mpt-compat
Closed

snapshot, blockchain, pruner: fix ZK/MPT state root compatibility for prune-state#303
curryxbo wants to merge 5 commits intomainfrom
fix/snapshot-zk-mpt-compat

Conversation

@curryxbo
Copy link
Copy Markdown
Contributor

@curryxbo curryxbo commented Mar 19, 2026

Summary

Changes

snapshot/generate.go: Resolve zkRoot → mptRoot before snapshot generation so trie walk uses correct on-disk root

snapshot/journal.go: Accept snapshot whose disk root equals MPT translation of requested zkRoot, avoiding unnecessary rebuild

snapshot/snapshot.go: Use translated base.root as map key in Rebuild so subsequent lookups work

blockchain.go: Translate zkRoot → mptRoot in repair backward-walk, head rewind boundary search, and shutdown journal path

pruner.go:

  • Translate head zkRoot → mptRoot before Snapshots() lookup
  • Fall back to DiskRoot() when < 128 diff layers available (e.g. after Rebuild)
  • Guard Cap(root, 0) when root is already the disk layer
  • Translate genesis zkRoot → mptRoot in extractGenesis()

Test plan

  • Tested prune-state on node with ZK-era blocks — completes successfully
  • Run existing snapshot tests: go test ./core/state/snapshot/...
  • New ZK-MPT compatibility tests pass: go test -run TestZkMpt ./core/state/snapshot/...

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Snapshot recovery, journaling, generation and rebuild now consistently resolve and use on-disk state roots so on-disk layers are compared and persisted correctly.
    • Pruning selects correct disk-layer targets, retries gracefully when needed, and handles cases with insufficient diff layers more robustly.
    • Snapshot loading now accepts translated-root matches to avoid false mismatch errors; genesis reconstruction uses on-disk roots for trie lookups.
  • Tests

    • Added compatibility tests covering snapshot generation, loading, rebuilds, and root-translation scenarios.

corey and others added 2 commits March 20, 2026 00:19
PR #300 added a fallback path in NewPruner that retried snapshot
initialisation with the persisted disk snapshot root when the journal
was missing, and bypassed the 128-layer requirement in Prune(). This
was a workaround for symptoms caused by ZK/MPT root mismatches and
missing diff layers, not a fix for the root cause.

Revert pruner.go to upstream geth behavior. The actual root causes are
addressed in the following commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… prune-state

MPT nodes syncing ZK-era blocks store a zkStateRoot (Poseidon hash) in
the block header while the on-disk trie and snapshot layers use the
locally-computed mptStateRoot (Keccak256). This mismatch causes several
failures in the snapshot and pruning paths.

snapshot/generate.go:
  Resolve zkStateRoot → mptStateRoot before generateSnapshot() so the
  trie walk uses the correct on-disk root and WriteSnapshotRoot records
  the right key.

snapshot/journal.go (loadSnapshot):
  Accept an existing snapshot whose disk root equals the MPT translation
  of the requested zkStateRoot, avoiding an unnecessary full rebuild.

snapshot/snapshot.go (Rebuild):
  Use base.root (post-translation) as the map key so subsequent
  Snapshot()/Update() lookups find the layer correctly.

blockchain.go (NewBlockChain):
  Translate a persisted zkSnapshotRoot to mptStateRoot before the
  backward-walk repair loop so apples-to-apples root comparison works.

blockchain.go (setHeadBeyondRoot):
  Translate each block's zkRoot to mptRoot when searching for the
  beyondRoot boundary during head rewind.

blockchain.go (Stop):
  Translate CurrentBlock().Root() to mptRoot before calling
  snaps.Journal() so the correct snapshot layer is found and journalled.

pruner.go (Prune):
  Translate head zkRoot → mptRoot before Snapshots() lookup. When fewer
  than 128 diff layers are available (e.g. after a Rebuild or fresh
  snapshot), fall back to DiskRoot() as the pruning target.

pruner.go (prune):
  Guard Cap(root, 0) so it is skipped when root is already the disk
  layer — Cap requires a diffLayer and returns an error otherwise.

pruner.go (extractGenesis):
  Translate genesis zkRoot → mptRoot so trie.NewSecure can find nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@curryxbo curryxbo requested a review from a team as a code owner March 19, 2026 16:23
@curryxbo curryxbo requested review from Web3Jumb0 and removed request for a team March 19, 2026 16:23
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

Resolve ZK-era (Poseidon) state roots to on-disk MPT (Keccak) state roots via rawdb.ReadDiskStateRoot across startup recovery, snapshot generation/loading/journaling, pruning, trie construction, and snapshot layer keying.

Changes

Cohort / File(s) Summary
Blockchain root handling
core/blockchain.go
Translate block/state roots via rawdb.ReadDiskStateRoot during startup snapshot recovery, setHeadBeyondRoot checks, and when journaling snapshot state so comparisons and journal keys use on-disk MPT roots.
Pruner & genesis trie
core/state/pruner/pruner.go
Removed persisted snapDiskRoot; retry snapshot initialization from rawdb.ReadSnapshotRoot on failure; translate HEAD/root via rawdb.ReadDiskStateRoot before snapshot lookups and when selecting prune targets; translate genesis root before constructing secure trie; updated prune Cap behavior and recovery checks.
Snapshot generation
core/state/snapshot/generate.go
Translate provided root (zk → mpt) via rawdb.ReadDiskStateRoot before initializing snapshot; log resolved roots and persist the translated disk root for generation and background work.
Snapshot journal loading
core/state/snapshot/journal.go
Accept a snapshot whose stored head equals the translated disk root of the requested root (rawdb.ReadDiskStateRoot) — log and proceed instead of erroring on mismatch when translation accounts for it.
Snapshot layer keying
core/state/snapshot/snapshot.go
Tree.Rebuild keys new disk layer by the actual root returned from generateSnapshot (the translated disk root) rather than the originally requested root.
Compatibility tests
core/state/snapshot/zk_mpt_compat_test.go
Added tests for generate/load/rebuild with and without zk↔mpt root translation, verifying persisted roots, generation completion, load acceptance/rejection, and internal layer keying.

Sequence Diagram(s)

sequenceDiagram
    participant Node as Node/Blockchain
    participant RawDB as rawdb (disk)
    participant Snaps as Snapshot subsystem
    participant Pruner as Pruner

    Node->>RawDB: ReadSnapshotRoot() / ReadDiskStateRoot(requestedRoot)
    RawDB-->>Node: return mptRoot or error
    Node->>Snaps: generate/load/journal using mptRoot
    Snaps-->>RawDB: WriteSnapshotRoot(mptRoot)
    Node->>Pruner: Prune/Recover with translated head root (mptRoot)
    Pruner->>Snaps: Snapshots(mptRoot)
    Snaps-->>Pruner: return diff layers / disk root
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • panos-xyz
  • twcctop

Poem

🐰 I hopped through roots both Poseidon and Keccak,

Translated each hash so snapshots line up quick,
I nudged the pruner, journal, and tree,
Now disk roots and layers match happily,
A tiny hop — all roots sing in sync!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main change: fixing ZK/MPT state root compatibility issues in snapshot, blockchain, and pruner code paths related to prune-state functionality.
Docstring Coverage ✅ Passed Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/snapshot-zk-mpt-compat

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
core/state/snapshot/journal.go (1)

168-171: Consider adding a defensive check against zero-hash matches.

The condition err == nil && head == translated could theoretically produce a false positive if both head and translated are zero hashes. While this is unlikely in practice (a valid snapshot's Root() shouldn't be zero, and ReadDiskStateRoot returns an error for missing keys), a defensive check would prevent accepting a corrupted or incomplete state.

🛡️ Optional defensive enhancement
-		if translated, err := rawdb.ReadDiskStateRoot(diskdb, root); err == nil && head == translated {
+		if translated, err := rawdb.ReadDiskStateRoot(diskdb, root); err == nil && translated != (common.Hash{}) && head == translated {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/state/snapshot/journal.go` around lines 168 - 171, The current
acceptance condition uses rawdb.ReadDiskStateRoot(diskdb, root) and compares
translated == head, which can incorrectly match if both are zero hashes; modify
the condition in the snapshot acceptance branch to also check that neither head
nor translated is the zero/root-zero value (e.g., compare against common zero
hash representation used in this codebase) before logging "Snapshot root is MPT
translation..." and accepting; update the if that references
rawdb.ReadDiskStateRoot, head, translated and root to include the non-zero check
and keep existing recovery handling unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@core/state/snapshot/journal.go`:
- Around line 168-171: The current acceptance condition uses
rawdb.ReadDiskStateRoot(diskdb, root) and compares translated == head, which can
incorrectly match if both are zero hashes; modify the condition in the snapshot
acceptance branch to also check that neither head nor translated is the
zero/root-zero value (e.g., compare against common zero hash representation used
in this codebase) before logging "Snapshot root is MPT translation..." and
accepting; update the if that references rawdb.ReadDiskStateRoot, head,
translated and root to include the non-zero check and keep existing recovery
handling unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a9bd838d-dbba-462a-a78a-0cff403e9688

📥 Commits

Reviewing files that changed from the base of the PR and between b3c5552 and 41186b6.

📒 Files selected for processing (6)
  • core/blockchain.go
  • core/state/pruner/pruner.go
  • core/state/snapshot/generate.go
  • core/state/snapshot/journal.go
  • core/state/snapshot/snapshot.go
  • core/state/snapshot/zk_mpt_compat_test.go

When data is copied from a running node or after an unclean shutdown,
the snapshot journal is not written and in-memory diff layers are lost.
snapshot.New(headBlock.Root()) fails because the on-disk snapshot root
lags behind the chain head by up to 128 blocks.

Retry with the persisted disk snapshot root so prune-state can proceed
without requiring the operator to run the node first. The pruning target
will be at most 128 blocks behind the chain head; the node will
re-execute those blocks on the next restart.

Co-authored-by: corey <corey.zhang@bitget.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/state/pruner/pruner.go (1)

419-434: ⚠️ Potential issue | 🟠 Major

Missing ZK→MPT root translation in RecoverPruning.

The Prune() method now translates headRoot via rawdb.ReadDiskStateRoot before calling Snapshots() (lines 279-283), but RecoverPruning at line 421 still uses headBlock.Root() directly. Since snapshot layers are keyed by mptRoot, this lookup will fail for ZK-era blocks, causing the recovery to incorrectly report "non-existent target state".

🐛 Proposed fix to add root translation
 	// still feasible to recover the pruning correctly.
 	snaptree, err := snapshot.New(db, trie.NewDatabase(db), 256, headBlock.Root(), false, false, true)
 	if err != nil {
 		return err // The relevant snapshot(s) might not exist
 	}
 	stateBloom, err := NewStateBloomFromDisk(stateBloomPath)
 	if err != nil {
 		return err
 	}
 	log.Info("Loaded state bloom filter", "path", stateBloomPath)

 	// Before start the pruning, delete the clean trie cache first.
 	// It's necessary otherwise in the next restart we will hit the
 	// deleted state root in the "clean cache" so that the incomplete
 	// state is picked for usage.
 	deleteCleanTrieCache(trieCachePath)

 	// All the state roots of the middle layers should be forcibly pruned,
 	// otherwise the dangling state will be left.
+	// The block header may carry a zkStateRoot while snapshot layers are
+	// keyed by mptStateRoot. Translate before lookup.
+	headRoot := headBlock.Root()
+	if mptRoot, err := rawdb.ReadDiskStateRoot(db, headRoot); err == nil {
+		headRoot = mptRoot
+	}
 	var (
 		found       bool
-		layers      = snaptree.Snapshots(headBlock.Root(), 128, true)
+		layers      = snaptree.Snapshots(headRoot, 128, true)
 		middleRoots = make(map[common.Hash]struct{})
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/state/pruner/pruner.go` around lines 419 - 434, RecoverPruning currently
uses headBlock.Root() to look up snapshot layers but Snapshots() expects an MPT
root (the same translation done in Prune via rawdb.ReadDiskStateRoot); update
RecoverPruning to translate the head root before calling snaptree.Snapshots by
calling rawdb.ReadDiskStateRoot (same logic used in Prune) and use that
translated mptRoot in the Snapshots() call and comparisons (replace usages of
headBlock.Root() in this block with the translated root), ensuring
stateBloomRoot is compared against the correct MPT-rooted layers and middleRoots
is populated with those MPT roots.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@core/state/pruner/pruner.go`:
- Around line 419-434: RecoverPruning currently uses headBlock.Root() to look up
snapshot layers but Snapshots() expects an MPT root (the same translation done
in Prune via rawdb.ReadDiskStateRoot); update RecoverPruning to translate the
head root before calling snaptree.Snapshots by calling rawdb.ReadDiskStateRoot
(same logic used in Prune) and use that translated mptRoot in the Snapshots()
call and comparisons (replace usages of headBlock.Root() in this block with the
translated root), ensuring stateBloomRoot is compared against the correct
MPT-rooted layers and middleRoots is populated with those MPT roots.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0681860a-02b8-47db-808e-04132b501231

📥 Commits

Reviewing files that changed from the base of the PR and between 41186b6 and 7511887.

📒 Files selected for processing (1)
  • core/state/pruner/pruner.go

@curryxbo
Copy link
Copy Markdown
Contributor Author

curryxbo commented Mar 24, 2026

Re: the observation about missing ZK→MPT root translation in RecoverPruning (line 407):

Valid observation — this has now been addressed in the latest commit.

Three issues were fixed in RecoverPruning:

  1. Missing snapshot.New fallback: Added the same fallback as NewPruner — if snapshot.New(headBlock.Root()) fails, retry with rawdb.ReadSnapshotRoot(). This handles the case where the journal is missing (e.g. node was stopped uncleanly before the journal fix).

  2. Missing ZK→MPT root translation: snaptree.Snapshots() now translates headBlock.Root() (zkStateRoot) to mptRoot before the call, consistent with the fix in Prune().

  3. DiskRoot excluded by nodisk=true: Even with correct translation, if the node has fewer than 128 diff layers (e.g. freshly copied archive data), Snapshots(..., nodisk=true) returns empty. Added an explicit snaptree.DiskRoot() == stateBloomRoot check as a fallback so recovery can proceed using the disk layer directly.

After #304 merged, NewPruner can succeed via the disk-root fallback and
create a bloom filter. If prune-state is then interrupted, RecoverPruning
is triggered on the next run. Three issues needed fixing:

1. snapshot.New fallback: same as NewPruner — when the journal is missing,
   retry with ReadSnapshotRoot() so recovery can initialise the snapshot tree.

2. Snapshots() translation: headBlock.Root() may be a zkStateRoot; translate
   to mptStateRoot before the lookup so diff layers (keyed by mptRoot) can
   be found.

3. DiskRoot check: when the original prune used DiskRoot() as target (our
   fallback path), stateBloomRoot equals the disk layer root. Snapshots()
   with nodisk=true excludes the disk layer, so check snaptree.DiskRoot()
   explicitly to avoid a spurious "non-existent target state" error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@curryxbo curryxbo self-assigned this Mar 24, 2026
@curryxbo curryxbo added this to the improvement milestone Mar 24, 2026
Copy link
Copy Markdown
Contributor

@panos-xyz panos-xyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the latest stack on this PR and found two issues that still look unresolved:

  1. core/state/snapshot/journal.go / core/state/snapshot/snapshot.go fixed the load/rebuild side by keying snapshot layers with the translated mptRoot, but runtime consumers still probe the snapshot tree with the header zkRoot. The main gap is state.New() in core/state/statedb.go: it still does snaps.Snapshot(root) with the header root, and StateDB.Commit() only calls snaps.Update() when that lookup succeeds. In other words, the snapshot can now load/rebuild successfully, but once new ZK-era blocks are imported the snapshot may stop advancing because the runtime path still misses the translated layer. skipBlock() in core/blockchain.go has the same lookup pattern, so it can also keep treating the snapshot as missing. I think the lookup side needs the same root normalization before calling Tree.Snapshot() / Tree.Snapshots(), plus an end-to-end regression test that loads the translated snapshot and then imports another ZK-era block.

  2. core/state/pruner/pruner.go only applies the ZK->MPT translation in the default target-selection path (root == 0). If the operator explicitly runs geth snapshot prune-state <root>, the supplied root is still treated as a trie/snapshot root directly. For ZK-era blocks that explicit argument is usually the header zkRoot, so ReadTrieNode / snapshot.GenerateTrie will still fail or behave incorrectly. I think the user-specified root should also be normalized through ReadDiskStateRoot, and this needs a focused test for Prune(explicitZkRoot).

When the operator runs `geth snapshot prune-state <root>` with a
zkStateRoot (Poseidon hash), the supplied root was previously used
directly for trie/snapshot lookups, causing ReadTrieNode and snapshot
layer lookups to fail silently for ZK-era blocks.

Apply the same zkRoot→mptRoot translation (via ReadDiskStateRoot) to
the user-supplied root before entering the pruning logic, consistent
with the translation already applied in the auto-select path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@curryxbo
Copy link
Copy Markdown
Contributor Author

curryxbo commented Mar 30, 2026

Thanks for the thorough review, @panos-xyz.

Issue 2 (explicit root in Prune()) — Fixed in commit 6c8cc2d. The user-supplied root is now normalized through ReadDiskStateRoot before entering the pruning logic, consistent with the auto-select path.

Issue 1 (runtime Snapshot(zkRoot) lookup) — Confirmed. Tree.Snapshot() / Tree.Snapshots() do a direct map lookup without zkRoot→mptRoot translation, so statedb.New gets nil, snaps.Update() is never called, and diff layers never accumulate during block import. skipBlock has the same gap.

This has been addressed in a separate PR: #306 (tracked by issue #305). It adds ReadDiskStateRoot normalization at the top of both Snapshot() and Snapshots(). This is kept separate from the prune-state fixes because it touches the hot block-processing path and requires broader testing.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 273-278: After calling rawdb.ReadDiskStateRoot in the
explicit-root branch, add a validation that the returned mptRoot is not the zero
value (common.Hash{}); if it is zero, do not overwrite root (preserve the
user-supplied explicit root), log a warning including the zkRoot and that the
translated disk mapping was empty/corrupted, and avoid falling through into
auto-selection; implement this check immediately after the successful
ReadDiskStateRoot call that sets mptRoot so you only replace root when mptRoot
!= (common.Hash{}).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98cd609f-799c-4cda-86c3-a8443aafaf5a

📥 Commits

Reviewing files that changed from the base of the PR and between 915bac9 and 6c8cc2d.

📒 Files selected for processing (1)
  • core/state/pruner/pruner.go

Comment thread core/state/pruner/pruner.go
@curryxbo curryxbo closed this Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants