snapshot, blockchain, pruner: fix ZK/MPT state root compatibility for prune-state#303
snapshot, blockchain, pruner: fix ZK/MPT state root compatibility for prune-state#303
Conversation
PR #300 added a fallback path in NewPruner that retried snapshot initialisation with the persisted disk snapshot root when the journal was missing, and bypassed the 128-layer requirement in Prune(). This was a workaround for symptoms caused by ZK/MPT root mismatches and missing diff layers, not a fix for the root cause. Revert pruner.go to upstream geth behavior. The actual root causes are addressed in the following commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… prune-state MPT nodes syncing ZK-era blocks store a zkStateRoot (Poseidon hash) in the block header while the on-disk trie and snapshot layers use the locally-computed mptStateRoot (Keccak256). This mismatch causes several failures in the snapshot and pruning paths. snapshot/generate.go: Resolve zkStateRoot → mptStateRoot before generateSnapshot() so the trie walk uses the correct on-disk root and WriteSnapshotRoot records the right key. snapshot/journal.go (loadSnapshot): Accept an existing snapshot whose disk root equals the MPT translation of the requested zkStateRoot, avoiding an unnecessary full rebuild. snapshot/snapshot.go (Rebuild): Use base.root (post-translation) as the map key so subsequent Snapshot()/Update() lookups find the layer correctly. blockchain.go (NewBlockChain): Translate a persisted zkSnapshotRoot to mptStateRoot before the backward-walk repair loop so apples-to-apples root comparison works. blockchain.go (setHeadBeyondRoot): Translate each block's zkRoot to mptRoot when searching for the beyondRoot boundary during head rewind. blockchain.go (Stop): Translate CurrentBlock().Root() to mptRoot before calling snaps.Journal() so the correct snapshot layer is found and journalled. pruner.go (Prune): Translate head zkRoot → mptRoot before Snapshots() lookup. When fewer than 128 diff layers are available (e.g. after a Rebuild or fresh snapshot), fall back to DiskRoot() as the pruning target. pruner.go (prune): Guard Cap(root, 0) so it is skipped when root is already the disk layer — Cap requires a diffLayer and returns an error otherwise. pruner.go (extractGenesis): Translate genesis zkRoot → mptRoot so trie.NewSecure can find nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughResolve ZK-era (Poseidon) state roots to on-disk MPT (Keccak) state roots via Changes
Sequence Diagram(s)sequenceDiagram
participant Node as Node/Blockchain
participant RawDB as rawdb (disk)
participant Snaps as Snapshot subsystem
participant Pruner as Pruner
Node->>RawDB: ReadSnapshotRoot() / ReadDiskStateRoot(requestedRoot)
RawDB-->>Node: return mptRoot or error
Node->>Snaps: generate/load/journal using mptRoot
Snaps-->>RawDB: WriteSnapshotRoot(mptRoot)
Node->>Pruner: Prune/Recover with translated head root (mptRoot)
Pruner->>Snaps: Snapshots(mptRoot)
Snaps-->>Pruner: return diff layers / disk root
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
core/state/snapshot/journal.go (1)
168-171: Consider adding a defensive check against zero-hash matches.The condition
err == nil && head == translatedcould theoretically produce a false positive if bothheadandtranslatedare zero hashes. While this is unlikely in practice (a valid snapshot'sRoot()shouldn't be zero, andReadDiskStateRootreturns an error for missing keys), a defensive check would prevent accepting a corrupted or incomplete state.🛡️ Optional defensive enhancement
- if translated, err := rawdb.ReadDiskStateRoot(diskdb, root); err == nil && head == translated { + if translated, err := rawdb.ReadDiskStateRoot(diskdb, root); err == nil && translated != (common.Hash{}) && head == translated {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/state/snapshot/journal.go` around lines 168 - 171, The current acceptance condition uses rawdb.ReadDiskStateRoot(diskdb, root) and compares translated == head, which can incorrectly match if both are zero hashes; modify the condition in the snapshot acceptance branch to also check that neither head nor translated is the zero/root-zero value (e.g., compare against common zero hash representation used in this codebase) before logging "Snapshot root is MPT translation..." and accepting; update the if that references rawdb.ReadDiskStateRoot, head, translated and root to include the non-zero check and keep existing recovery handling unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@core/state/snapshot/journal.go`:
- Around line 168-171: The current acceptance condition uses
rawdb.ReadDiskStateRoot(diskdb, root) and compares translated == head, which can
incorrectly match if both are zero hashes; modify the condition in the snapshot
acceptance branch to also check that neither head nor translated is the
zero/root-zero value (e.g., compare against common zero hash representation used
in this codebase) before logging "Snapshot root is MPT translation..." and
accepting; update the if that references rawdb.ReadDiskStateRoot, head,
translated and root to include the non-zero check and keep existing recovery
handling unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a9bd838d-dbba-462a-a78a-0cff403e9688
📒 Files selected for processing (6)
core/blockchain.gocore/state/pruner/pruner.gocore/state/snapshot/generate.gocore/state/snapshot/journal.gocore/state/snapshot/snapshot.gocore/state/snapshot/zk_mpt_compat_test.go
When data is copied from a running node or after an unclean shutdown, the snapshot journal is not written and in-memory diff layers are lost. snapshot.New(headBlock.Root()) fails because the on-disk snapshot root lags behind the chain head by up to 128 blocks. Retry with the persisted disk snapshot root so prune-state can proceed without requiring the operator to run the node first. The pruning target will be at most 128 blocks behind the chain head; the node will re-execute those blocks on the next restart. Co-authored-by: corey <corey.zhang@bitget.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/state/pruner/pruner.go (1)
419-434:⚠️ Potential issue | 🟠 MajorMissing ZK→MPT root translation in
RecoverPruning.The
Prune()method now translatesheadRootviarawdb.ReadDiskStateRootbefore callingSnapshots()(lines 279-283), butRecoverPruningat line 421 still usesheadBlock.Root()directly. Since snapshot layers are keyed bymptRoot, this lookup will fail for ZK-era blocks, causing the recovery to incorrectly report "non-existent target state".🐛 Proposed fix to add root translation
// still feasible to recover the pruning correctly. snaptree, err := snapshot.New(db, trie.NewDatabase(db), 256, headBlock.Root(), false, false, true) if err != nil { return err // The relevant snapshot(s) might not exist } stateBloom, err := NewStateBloomFromDisk(stateBloomPath) if err != nil { return err } log.Info("Loaded state bloom filter", "path", stateBloomPath) // Before start the pruning, delete the clean trie cache first. // It's necessary otherwise in the next restart we will hit the // deleted state root in the "clean cache" so that the incomplete // state is picked for usage. deleteCleanTrieCache(trieCachePath) // All the state roots of the middle layers should be forcibly pruned, // otherwise the dangling state will be left. + // The block header may carry a zkStateRoot while snapshot layers are + // keyed by mptStateRoot. Translate before lookup. + headRoot := headBlock.Root() + if mptRoot, err := rawdb.ReadDiskStateRoot(db, headRoot); err == nil { + headRoot = mptRoot + } var ( found bool - layers = snaptree.Snapshots(headBlock.Root(), 128, true) + layers = snaptree.Snapshots(headRoot, 128, true) middleRoots = make(map[common.Hash]struct{}) )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/state/pruner/pruner.go` around lines 419 - 434, RecoverPruning currently uses headBlock.Root() to look up snapshot layers but Snapshots() expects an MPT root (the same translation done in Prune via rawdb.ReadDiskStateRoot); update RecoverPruning to translate the head root before calling snaptree.Snapshots by calling rawdb.ReadDiskStateRoot (same logic used in Prune) and use that translated mptRoot in the Snapshots() call and comparisons (replace usages of headBlock.Root() in this block with the translated root), ensuring stateBloomRoot is compared against the correct MPT-rooted layers and middleRoots is populated with those MPT roots.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@core/state/pruner/pruner.go`:
- Around line 419-434: RecoverPruning currently uses headBlock.Root() to look up
snapshot layers but Snapshots() expects an MPT root (the same translation done
in Prune via rawdb.ReadDiskStateRoot); update RecoverPruning to translate the
head root before calling snaptree.Snapshots by calling rawdb.ReadDiskStateRoot
(same logic used in Prune) and use that translated mptRoot in the Snapshots()
call and comparisons (replace usages of headBlock.Root() in this block with the
translated root), ensuring stateBloomRoot is compared against the correct
MPT-rooted layers and middleRoots is populated with those MPT roots.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0681860a-02b8-47db-808e-04132b501231
📒 Files selected for processing (1)
core/state/pruner/pruner.go
|
Re: the observation about missing ZK→MPT root translation in Valid observation — this has now been addressed in the latest commit. Three issues were fixed in
|
After #304 merged, NewPruner can succeed via the disk-root fallback and create a bloom filter. If prune-state is then interrupted, RecoverPruning is triggered on the next run. Three issues needed fixing: 1. snapshot.New fallback: same as NewPruner — when the journal is missing, retry with ReadSnapshotRoot() so recovery can initialise the snapshot tree. 2. Snapshots() translation: headBlock.Root() may be a zkStateRoot; translate to mptStateRoot before the lookup so diff layers (keyed by mptRoot) can be found. 3. DiskRoot check: when the original prune used DiskRoot() as target (our fallback path), stateBloomRoot equals the disk layer root. Snapshots() with nodisk=true excludes the disk layer, so check snaptree.DiskRoot() explicitly to avoid a spurious "non-existent target state" error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
panos-xyz
left a comment
There was a problem hiding this comment.
I reviewed the latest stack on this PR and found two issues that still look unresolved:
-
core/state/snapshot/journal.go/core/state/snapshot/snapshot.gofixed the load/rebuild side by keying snapshot layers with the translatedmptRoot, but runtime consumers still probe the snapshot tree with the headerzkRoot. The main gap isstate.New()incore/state/statedb.go: it still doessnaps.Snapshot(root)with the header root, andStateDB.Commit()only callssnaps.Update()when that lookup succeeds. In other words, the snapshot can now load/rebuild successfully, but once new ZK-era blocks are imported the snapshot may stop advancing because the runtime path still misses the translated layer.skipBlock()incore/blockchain.gohas the same lookup pattern, so it can also keep treating the snapshot as missing. I think the lookup side needs the same root normalization before callingTree.Snapshot()/Tree.Snapshots(), plus an end-to-end regression test that loads the translated snapshot and then imports another ZK-era block. -
core/state/pruner/pruner.goonly applies the ZK->MPT translation in the default target-selection path (root == 0). If the operator explicitly runsgeth snapshot prune-state <root>, the supplied root is still treated as a trie/snapshot root directly. For ZK-era blocks that explicit argument is usually the headerzkRoot, soReadTrieNode/snapshot.GenerateTriewill still fail or behave incorrectly. I think the user-specified root should also be normalized throughReadDiskStateRoot, and this needs a focused test forPrune(explicitZkRoot).
When the operator runs `geth snapshot prune-state <root>` with a zkStateRoot (Poseidon hash), the supplied root was previously used directly for trie/snapshot lookups, causing ReadTrieNode and snapshot layer lookups to fail silently for ZK-era blocks. Apply the same zkRoot→mptRoot translation (via ReadDiskStateRoot) to the user-supplied root before entering the pruning logic, consistent with the translation already applied in the auto-select path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for the thorough review, @panos-xyz. Issue 2 (explicit root in Issue 1 (runtime This has been addressed in a separate PR: #306 (tracked by issue #305). It adds |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 273-278: After calling rawdb.ReadDiskStateRoot in the
explicit-root branch, add a validation that the returned mptRoot is not the zero
value (common.Hash{}); if it is zero, do not overwrite root (preserve the
user-supplied explicit root), log a warning including the zkRoot and that the
translated disk mapping was empty/corrupted, and avoid falling through into
auto-selection; implement this check immediately after the successful
ReadDiskStateRoot call that sets mptRoot so you only replace root when mptRoot
!= (common.Hash{}).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98cd609f-799c-4cda-86c3-a8443aafaf5a
📒 Files selected for processing (1)
core/state/pruner/pruner.go
Summary
Changes
snapshot/generate.go: Resolve zkRoot → mptRoot before snapshot generation so trie walk uses correct on-disk root
snapshot/journal.go: Accept snapshot whose disk root equals MPT translation of requested zkRoot, avoiding unnecessary rebuild
snapshot/snapshot.go: Use translated base.root as map key in Rebuild so subsequent lookups work
blockchain.go: Translate zkRoot → mptRoot in repair backward-walk, head rewind boundary search, and shutdown journal path
pruner.go:
Snapshots()lookupDiskRoot()when < 128 diff layers available (e.g. after Rebuild)Cap(root, 0)when root is already the disk layerextractGenesis()Test plan
prune-stateon node with ZK-era blocks — completes successfullygo test ./core/state/snapshot/...go test -run TestZkMpt ./core/state/snapshot/...🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests