
snapshot: resolve zkStateRoot to mptStateRoot for generation and loading#302

Closed
curryxbo wants to merge 9 commits into main from fix/snapshot-generate-disk-state-root

Conversation

Contributor

@curryxbo curryxbo commented Mar 18, 2026

Summary

  • generate.go: Resolve zkStateRoot → mptStateRoot via ReadDiskStateRoot at the entry of generateSnapshot, so WriteSnapshotRoot and trie.New both use the correct on-disk MPT root
  • journal.go: In loadSnapshot, accept ZK/MPT root mismatch when ReadDiskStateRoot(zkRoot) == snapshotRoot, avoiding unnecessary rebuilds
  • pruner.go: In the journal-missing fallback path, translate the persisted SnapshotRoot from zkStateRoot to mptStateRoot and persist the correction

Background

MPT nodes syncing ZK-era blocks store trie data under the locally-computed mptStateRoot (Keccak256), but the block header carries a zkStateRoot (Poseidon hash). The state system translates between these via ReadDiskStateRoot/WriteDiskStateRoot in OpenTrie and trie.Database.Commit.

However, the snapshot generator calls trie.New(root, triedb) directly — bypassing the OpenTrie translation layer. When root is a zkStateRoot, the trie lookup fails immediately with MissingNodeError. The generator goroutine then blocks forever on <-genAbort, and waitBuild() hangs the caller (pruner or blockchain startup).

This means:

  1. Snapshot flat data was never generated on MPT nodes running ZK-era blocks (accounts=0)
  2. The pruner hangs indefinitely when attempting to prune state
  3. After pruning (with fix), third-party nodes loading the snapshot would trigger an unnecessary rebuild

Test plan

  • Build passes: go build ./core/state/pruner/ ./core/state/snapshot/
  • Run geth snapshot prune-state on an MPT node with ZK-era blocks — verify snapshot generates from mptStateRoot (no "Trie missing" error)
  • Verify pruner completes successfully and produces a valid snapshot
  • Verify a fresh MPT node can load the pruned snapshot without triggering a rebuild (journal.go fix)
  • Verify no regression on post-Jade-fork blocks (where zkStateRoot == mptStateRoot, ReadDiskStateRoot returns error, root unchanged)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Fixed persisted vs on-disk snapshot root mismatches to prevent "missing trie node" errors, persist corrected roots, and accept validated snapshots without unnecessary rebuilds; recovery now logs discontinuities and proceeds.
    • Updated warning/help text for snapshot prune guidance.
  • Improvements

    • Startup recovery avoids unnecessary chain rewinds by matching on-disk state roots and applies smarter truncation rules when trimming ancients.
  • Tests

    • Added end-to-end tests covering root-translation, snapshot generate/load/rebuild, and lifecycle scenarios.

… and loading

MPT nodes syncing ZK-era blocks store trie data under the locally-computed
mptStateRoot (Keccak256), but the block header and snapshot system use the
zkStateRoot (Poseidon hash). The snapshot generator calls trie.New(root, triedb)
directly — bypassing the OpenTrie/ReadDiskStateRoot translation layer — so it
can never find MPT trie nodes when root is a zkStateRoot. This causes the
generator goroutine to block forever on genAbort, hanging the caller.

Fix three sites:

1. generate.go (generateSnapshot): resolve zkStateRoot → mptStateRoot via
   ReadDiskStateRoot before writing SnapshotRoot and starting the generator,
   so trie.New uses the correct on-disk root.

2. journal.go (loadSnapshot): when the on-disk snapshot root (mptStateRoot)
   differs from the requested root (zkStateRoot), check ReadDiskStateRoot
   to verify they correspond — accept the snapshot without a costly rebuild.

3. pruner.go (NewPruner): in the journal-missing fallback path, translate
   the persisted SnapshotRoot from zkStateRoot to mptStateRoot and update
   the DB so that subsequent snapshot.New calls load correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@curryxbo curryxbo requested a review from a team as a code owner March 18, 2026 14:13
@curryxbo curryxbo requested review from secmgt and removed request for a team March 18, 2026 14:13

coderabbitai Bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Resolve persisted zkStateRoots to on-disk MPT roots across pruner, snapshot generation/load/rebuild, and genesis extraction; persist corrected snapshot roots; accept snapshots when MPT translation matches in-memory head; adjust startup head-recovery/truncation to compare mapped disk roots.

Changes

  • Pruner & genesis extraction (core/state/pruner/pruner.go): Read the persisted snapshot root and translate zkStateRoot→mptStateRoot via rawdb.ReadDiskStateRoot; persist the corrected root; resolve the genesis root before creating the secure trie; update warning text.
  • Snapshot generation & disk layer (core/state/snapshot/generate.go, core/state/snapshot/snapshot.go): Translate zkStateRoot→mptStateRoot before initializing the disk layer; ensure generated/rebuilt snapshots and internal layers use the translated on-disk root as map keys.
  • Snapshot load & journal validation (core/state/snapshot/journal.go, core/state/snapshot/snapshot.go): When the on-disk root differs from the requested root, accept the snapshot if the MPT translation matches the in-memory head; otherwise error (unless in recovery mode, then warn). Avoid unnecessary rebuilds by recognizing translated-root equivalence.
  • Snapshot compatibility tests (core/state/snapshot/zk_mpt_compat_test.go): New tests exercising zk↔mpt translation for generate, load, rebuild, layer map-key behavior, and recovery scenarios.
  • Blockchain head recovery & truncation (core/blockchain.go): Startup repair walks blocks, mapping each block Root() via rawdb.ReadDiskStateRoot before comparing to the snapshot disk root; updates full-block head markers without rewinding the header/fast chain; refines ancient truncation logic considering fast/full positions.
  • Manifest (go.mod): Minor manifest edits.

Sequence Diagram(s)

sequenceDiagram
    participant Pruner
    participant Generator
    participant RawDB
    participant InMemHead as InMem
    participant Blockchain

    Pruner->>RawDB: Read persisted snapDiskRoot
    RawDB-->>Pruner: snapDiskRoot (zk or mpt)
    Pruner->>RawDB: ReadDiskStateRoot(snapDiskRoot)
    alt mapping exists
        RawDB-->>Pruner: mptStateRoot
        Pruner->>RawDB: WriteSnapshotRoot(mptStateRoot)
    end
    Pruner->>Generator: init snapshot with translated root
    Generator->>RawDB: open disk layer at mptStateRoot
    Generator->>InMem: compare disk root ↔ in-memory head
    alt translation matches head
        InMem-->>Generator: accept snapshot (no rebuild)
    else mismatch & recovery==true
        Generator-->>InMem: warn and proceed (rebuild)
    else mismatch & recovery==false
        Generator-->>InMem: reject snapshot (error)
    end
    Blockchain->>RawDB: ReadDiskStateRoot(block.Root()) during recovery walk
    RawDB-->>Blockchain: mapped mpt root
    Blockchain->>Blockchain: compare mapped root ↔ snapshot disk root to find match

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested labels

bug

Suggested reviewers

  • panos-xyz
  • twcctop

Poem

🐇 I chased a root beneath the loam,
Mapped zk to mpt and brought it home,
I nudged the layers, fixed the map,
Watched snapshots hum and closed my lap,
Now state sleeps safe beneath my dome 🥕

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title clearly and specifically describes the main change: resolving zkStateRoot to mptStateRoot in snapshot generation and loading, which directly addresses the core problem in the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 92.86%, above the required 80.00% threshold.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d784990b-a845-4682-83d1-62b8a0d49c20

📥 Commits

Reviewing files that changed from the base of the PR and between b3c5552 and d15a37c.

📒 Files selected for processing (3)
  • core/state/pruner/pruner.go
  • core/state/snapshot/generate.go
  • core/state/snapshot/journal.go

Comment on lines +114 to +121
if mptRoot, err2 := rawdb.ReadDiskStateRoot(db, snapDiskRoot); err2 == nil {
	log.Info("Pruner: resolved snapshot ZK root to MPT root",
		"zkRoot", snapDiskRoot, "mptRoot", mptRoot)
	snapDiskRoot = mptRoot
	// Persist the corrected root so that subsequent snapshot.New
	// calls (inside loadSnapshot) can locate the disk layer.
	rawdb.WriteSnapshotRoot(db, snapDiskRoot)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== ReadDiskStateRoot implementation =="
fd 'accessors_state.go$' --exec sed -n '95,125p' {}

echo
echo "== NewPruner fallback segment =="
fd 'pruner.go$' --exec sed -n '100,135p' {}

echo
echo "== All ReadDiskStateRoot callsites =="
rg -n -C2 '\bReadDiskStateRoot\s*\(' --type=go

Repository: morph-l2/go-ethereum

Length of output: 15914


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for ReadTrieNode implementation =="
rg -n 'func\s+ReadTrieNode' --type=go

echo
echo "== Check snapshot.New signature and early validation =="
fd 'snapshot.go$' -path '*/state/snapshot/*' --exec grep -n "func New" {}

echo
echo "== Check if snapshot.New validates root =="
fd 'snapshot.go$' -path '*/state/snapshot/*' --exec sed -n '1,100p' {} | head -60

Repository: morph-l2/go-ethereum

Length of output: 454


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== ReadTrieNode implementation =="
sed -n '79,95p' core/rawdb/accessors_state.go

echo
echo "== ReadDiskStateRoot full implementation =="
sed -n '107,115p' core/rawdb/accessors_state.go

echo
echo "== Snapshot.New signature and validation =="
fd 'snapshot.go' -path '*/state/snapshot/*' | head -1 | xargs sed -n '1,150p' | grep -A 20 "^func New"

echo
echo "== Check if snapshot.New validates root early =="
fd 'snapshot.go' -path '*/state/snapshot/*' | head -1 | xargs sed -n '1,300p' | tail -200

Repository: morph-l2/go-ethereum

Length of output: 1379


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find snapshot.New function =="
rg -n 'func New\(' core/state/snapshot/ --type=go -A 10

echo
echo "== Check pruner fallback and snapshot loading context =="
sed -n '100,145p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 3162


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check if pruner code validates mptRoot before WriteSnapshotRoot =="
sed -n '110,130p' core/state/pruner/pruner.go

echo
echo "== Check other ReadDiskStateRoot callsites for validation patterns =="
rg -B2 -A5 'WriteSnapshotRoot|snapshot.New' core/state/snapshot/generate.go

echo
echo "== Check if mptRoot validation exists elsewhere in codebase =="
rg -n 'mptRoot.*Hash\{\}' --type=go

echo
echo "== Check snapshot journal handling =="
sed -n '160,180p' core/state/snapshot/journal.go

Repository: morph-l2/go-ethereum

Length of output: 2194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check snapshot.New full implementation for root validation =="
sed -n '186,250p' core/state/snapshot/snapshot.go

echo
echo "== Check if snapshot.New or its callees validate root existence =="
rg -n 'func.*loadDiskLayer|func.*load' core/state/snapshot/snapshot.go | head -5

echo
echo "== Check snapshot/generate.go more context around WriteSnapshotRoot =="
sed -n '155,175p' core/state/snapshot/generate.go

echo
echo "== Look for any tests that trigger this pruner path =="
fd 'pruner.*test' --type=f

Repository: morph-l2/go-ethereum

Length of output: 2019


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find loadSnapshot implementation =="
rg -n 'func loadSnapshot' core/state/snapshot/ --type=go -A 20 | head -40

echo
echo "== Check if loadSnapshot validates root =="
rg -n 'func.*loadDiskLayer' core/state/snapshot/ --type=go -A 15 | head -40

echo
echo "== Summary: Check all places where mptRoot is used after ReadDiskStateRoot =="
rg -B3 -A8 'ReadDiskStateRoot.*err.*== nil' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1779


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check WriteSnapshotRoot implementation =="
rg -n 'func WriteSnapshotRoot' core/rawdb/ --type=go -A 5

echo
echo "== Check if snapshot.New/loadSnapshot handles bad roots gracefully =="
sed -n '130,170p' core/state/snapshot/journal.go

echo
echo "== Examine what happens after snapshot.New fails in pruner =="
sed -n '125,145p' core/state/pruner/pruner.go

echo
echo "== Check if there's recovery logic for corrupted roots =="
rg -n 'WriteSnapshotRoot' core/state/snapshot/ --type=go -B 3 -A 3

Repository: morph-l2/go-ethereum

Length of output: 7832


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Full context of snapshot/generate.go root handling =="
sed -n '155,180p' core/state/snapshot/generate.go

echo
echo "== Verify if generate.go also has the same unvalidated persist issue =="
sed -n '162,175p' core/state/snapshot/generate.go

echo
echo "== Check if both pruner and generate paths have the same problem =="
echo "Pruner WriteSnapshotRoot location:"
grep -n 'WriteSnapshotRoot' core/state/pruner/pruner.go

echo
echo "Generate WriteSnapshotRoot location:"
grep -n 'WriteSnapshotRoot' core/state/snapshot/generate.go

Repository: morph-l2/go-ethereum

Length of output: 2068


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Confirm the exact line numbers in pruner.go =="
sed -n '107,125p' core/state/pruner/pruner.go | cat -n

echo
echo "== Confirm the exact pattern in generate.go also needs fixing =="
sed -n '157,180p' core/state/snapshot/generate.go | cat -n

Repository: morph-l2/go-ethereum

Length of output: 2543


Validate translated root before persisting to prevent corrupted snapshot state.

At lines 114–120, ReadDiskStateRoot returns the translated root without validating it. If the mapping is malformed or stale, line 120 persists an invalid snapshot root that causes all subsequent snapshot.New calls to fail on restart. The same vulnerability exists in core/state/snapshot/generate.go:162–173.

Add validation to confirm the translated root points to an actual trie node before persisting:

Suggested fix
if mptRoot, err2 := rawdb.ReadDiskStateRoot(db, snapDiskRoot); err2 == nil {
+	if mptRoot == (common.Hash{}) || len(rawdb.ReadTrieNode(db, mptRoot)) == 0 {
+		log.Warn("Pruner: translated MPT root is not usable, keeping persisted snapshot root",
+			"zkRoot", snapDiskRoot, "mptRoot", mptRoot)
+	} else {
		log.Info("Pruner: resolved snapshot ZK root to MPT root",
			"zkRoot", snapDiskRoot, "mptRoot", mptRoot)
		snapDiskRoot = mptRoot
		rawdb.WriteSnapshotRoot(db, snapDiskRoot)
+	}
}

corey and others added 2 commits March 18, 2026 22:39
generateSnapshot may translate zkStateRoot to mptStateRoot internally
(for MPT nodes syncing ZK-era blocks). Rebuild was still using the
original root argument as the layers map key, creating a mismatch:
key=zkRoot but diskLayer.root=mptRoot.

This caused diffToDisk's stale-layer cleanup to miss the disk layer's
children (children map is keyed by parent.Root()=mptRoot, but the
stale disk layer was keyed by zkRoot), leaving orphaned diffLayers
with a stale origin — triggering ErrSnapshotStale on subsequent reads.

Fix: capture the returned base and use base.root as the map key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test generateSnapshot root translation, loadSnapshot ZK/MPT mismatch
tolerance, and Rebuild map-key consistency to ensure prune-state
succeeds on MPT nodes syncing ZK-era blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/state/snapshot/zk_mpt_compat_test.go`:
- Around line 80-102: The test currently manually shuts down the snapshot
generator with stop := make(chan *generatorStats); snap.genAbort <- stop; <-stop
at the end, which is skipped on t.Fatal and leaks goroutines; instead register a
t.Cleanup handler immediately after creating snap (or after generateSnapshot)
that sends a stop channel into snap.genAbort and waits for the reply (i.e.
t.Cleanup(func(){ stop := make(chan *generatorStats); snap.genAbort <- stop;
<-stop })), and remove the manual shutdown code at the end of the test; apply
the same change to the other occurrences referenced (lines around the other
tests) so the generator is always stopped even on failures.
- Around line 40-56: The test helper buildMPTTrie currently ignores errors from
functions that do return them; update it to check and fail the test on errors
from trie.NewSecure, rlp.EncodeToBytes, accTrie.Commit and triedb.Commit (use
t.Fatalf or t.Fatal with the wrapped error) while leaving accTrie.Update
unchecked since it does not return an error. Specifically, after calling
trie.NewSecure(common.Hash{}, triedb) verify the returned error before
proceeding, after rlp.EncodeToBytes(acc) check its error and abort on failure,
check the error returned by accTrie.Commit(nil) and handle it, and check/handle
any error returned by triedb.Commit(mptRoot, false, nil).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a447f6da-6291-4d5d-9f6f-586476870a4b

📥 Commits

Reviewing files that changed from the base of the PR and between d0ec38b and fed227c.

📒 Files selected for processing (1)
  • core/state/snapshot/zk_mpt_compat_test.go

Comment on lines +40 to +56
func buildMPTTrie(t *testing.T, diskdb *memorydb.Database) (common.Hash, *trie.Database) {
t.Helper()
triedb := trie.NewDatabase(diskdb)
accTrie, _ := trie.NewSecure(common.Hash{}, triedb)
for _, key := range []string{"acc-1", "acc-2", "acc-3"} {
acc := &Account{
Balance: big.NewInt(1),
Root: emptyRoot.Bytes(),
KeccakCodeHash: emptyKeccakCode.Bytes(),
PoseidonCodeHash: emptyPoseidonCode.Bytes(),
}
val, _ := rlp.EncodeToBytes(acc)
accTrie.Update([]byte(key), val)
}
mptRoot, _, _ := accTrie.Commit(nil)
triedb.Commit(mptRoot, false, nil)
return mptRoot, triedb

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify trie/RLP APIs used here actually return errors that should be handled.
rg -nP --type go 'func\s+NewSecure\(' -C2
rg -nP --type go 'func\s+\(.*\)\s+Update\([^)]*\)\s+error' -C2
rg -nP --type go 'func\s+\(.*\)\s+Commit\([^)]*\)\s*\([^)]*error' -C2
rg -nP --type go 'func\s+EncodeToBytes\(' -C2

Repository: morph-l2/go-ethereum

Length of output: 4538


🏁 Script executed:

# Find SecureTrie.Update method definition
rg -nP --type go 'func\s+\(\w+\s+\*SecureTrie\)\s+Update' -C3

Repository: morph-l2/go-ethereum

Length of output: 485


🏁 Script executed:

# Find trie.Database.Commit method definition
rg -nP --type go 'func\s+\(\w+\s+\*Database\)\s+Commit' -C3

Repository: morph-l2/go-ethereum

Length of output: 665


🏁 Script executed:

# Read the specific lines of the test file to confirm context
sed -n '40,70p' core/state/snapshot/zk_mpt_compat_test.go

Repository: morph-l2/go-ethereum

Length of output: 1265


Fix error handling in test fixture builders.

The helpers currently discard returned errors from setup operations, which can hide initialization failures and produce misleading test failures.

The accTrie.Update method does not return an error (it logs internally), so error-checking logic should not be added there. The following error checks should be added:

🔧 Proposed fix
 func buildMPTTrie(t *testing.T, diskdb *memorydb.Database) (common.Hash, *trie.Database) {
 	t.Helper()
 	triedb := trie.NewDatabase(diskdb)
-	accTrie, _ := trie.NewSecure(common.Hash{}, triedb)
+	accTrie, err := trie.NewSecure(common.Hash{}, triedb)
+	if err != nil {
+		t.Fatalf("failed to create secure trie: %v", err)
+	}
 	for _, key := range []string{"acc-1", "acc-2", "acc-3"} {
 		acc := &Account{
 			Balance:          big.NewInt(1),
 			Root:             emptyRoot.Bytes(),
 			KeccakCodeHash:   emptyKeccakCode.Bytes(),
 			PoseidonCodeHash: emptyPoseidonCode.Bytes(),
 		}
-		val, _ := rlp.EncodeToBytes(acc)
+		val, err := rlp.EncodeToBytes(acc)
+		if err != nil {
+			t.Fatalf("failed to encode account: %v", err)
+		}
 		accTrie.Update([]byte(key), val)
 	}
-	mptRoot, _, _ := accTrie.Commit(nil)
-	triedb.Commit(mptRoot, false, nil)
+	mptRoot, _, err := accTrie.Commit(nil)
+	if err != nil {
+		t.Fatalf("failed to commit trie: %v", err)
+	}
+	if err := triedb.Commit(mptRoot, false, nil); err != nil {
+		t.Fatalf("failed to commit trie db: %v", err)
+	}
 	return mptRoot, triedb
 }

-func writeDoneGenerator(db ethdb.KeyValueWriter) {
-	blob, _ := rlp.EncodeToBytes(&journalGenerator{Done: true})
+func writeDoneGenerator(t *testing.T, db ethdb.KeyValueWriter) {
+	t.Helper()
+	blob, err := rlp.EncodeToBytes(&journalGenerator{Done: true})
+	if err != nil {
+		t.Fatalf("failed to encode generator journal: %v", err)
+	}
 	rawdb.WriteSnapshotGenerator(db, blob)
 }

Comment on lines +80 to +102
snap := generateSnapshot(diskdb, triedb, 16, zkRoot)

// diskLayer.root must be mptRoot, not zkRoot.
if snap.root != mptRoot {
t.Fatalf("diskLayer.root: got %x, want mptRoot %x", snap.root, mptRoot)
}
// SnapshotRoot persisted to DB must be mptRoot.
if stored := rawdb.ReadSnapshotRoot(diskdb); stored != mptRoot {
t.Fatalf("SnapshotRoot in DB: got %x, want mptRoot %x", stored, mptRoot)
}
// Generation must complete successfully.
select {
case <-snap.genPending:
case <-time.After(3 * time.Second):
t.Fatal("snapshot generation timed out — trie walk likely used zkRoot instead of mptRoot")
}
// The generated snapshot data must reproduce mptRoot exactly.
checkSnapRoot(t, snap, mptRoot)

stop := make(chan *generatorStats)
snap.genAbort <- stop
<-stop
}

⚠️ Potential issue | 🟡 Minor

Use t.Cleanup for generator shutdown to prevent leaked goroutines on failing paths.

Manual shutdown at the end of each test is skipped if an earlier t.Fatal fires, leaving background generator goroutines alive and making the suite flaky.

🔧 Proposed fix
+func stopGeneratorOnCleanup(t *testing.T, dl *diskLayer) {
+	t.Helper()
+	t.Cleanup(func() {
+		stop := make(chan *generatorStats)
+		dl.genAbort <- stop
+		<-stop
+	})
+}

 func TestGenerateSnapshotTranslatesZkRoot(t *testing.T) {
 	...
 	snap := generateSnapshot(diskdb, triedb, 16, zkRoot)
+	stopGeneratorOnCleanup(t, snap)
 	...
-	stop := make(chan *generatorStats)
-	snap.genAbort <- stop
-	<-stop
 }

 func TestGenerateSnapshotNoTranslation(t *testing.T) {
 	...
 	snap := generateSnapshot(diskdb, triedb, 16, mptRoot)
+	stopGeneratorOnCleanup(t, snap)
 	...
-	stop := make(chan *generatorStats)
-	snap.genAbort <- stop
-	<-stop
 }

 func TestRebuildZkRootMapKeyConsistency(t *testing.T) {
 	...
 	snaps.Rebuild(zkRoot)
+	stopGeneratorOnCleanup(t, snaps.disklayer())
 	...
-	stop := make(chan *generatorStats)
-	snaps.disklayer().genAbort <- stop
-	<-stop
 }

 func TestRebuildWithoutTranslation(t *testing.T) {
 	...
 	snaps.Rebuild(mptRoot)
+	stopGeneratorOnCleanup(t, snaps.disklayer())
 	...
-	stop := make(chan *generatorStats)
-	snaps.disklayer().genAbort <- stop
-	<-stop
 }

Also applies to: 112-130, 201-244, 260-278


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
core/blockchain.go (1)

318-347: Consider adding a log or sanity check when backward walk falls back to genesis.

If the loop cannot find any block whose on-disk root matches diskRoot and whose state is accessible, it falls back to genesis (lines 336-338). This could indicate a data corruption scenario where the snapshot's disk root doesn't correspond to any available state. Consider adding a warning log or assertion when the final newHead is genesis but diskRoot doesn't match genesis state, to aid debugging.

🔧 Suggested improvement
 			newHead = parent
 		}
-		log.Warn("Rewound full block to snapshot state",
-			"from", head.NumberU64(), "to", newHead.NumberU64(), "snaproot", diskRoot)
+		if newHead.NumberU64() == 0 && diskRoot != newHead.Root() {
+			log.Error("Backward walk fell back to genesis without matching snapshot root",
+				"snaproot", diskRoot, "genesisRoot", newHead.Root())
+		}
+		log.Warn("Rewound full block to snapshot state",
+			"from", head.NumberU64(), "to", newHead.NumberU64(), "snaproot", diskRoot)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/blockchain.go` around lines 318 - 347, The backward-walk in the block
rewind (loop using newHead, head, rawdb.ReadDiskStateRoot and state.New) can
fall back to bc.genesisBlock without verifying the snapshot diskRoot matches
genesis state; add a sanity check and warning when newHead == bc.genesisBlock
and the diskRoot was not matched earlier: attempt to resolve the genesis state's
on-disk root (using rawdb.ReadDiskStateRoot and state.New on
bc.genesisBlock.Root()), and if it still doesn't equal diskRoot or state.New
fails, emit a clear log.Warn/Error including diskRoot, genesis hash and any
error details to aid debugging and consider returning an error or metric
increment to highlight possible data corruption.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dcc2aa06-814a-4bce-bf16-0e14e9670b6c

📥 Commits

Reviewing files that changed from the base of the PR and between fed227c and 507d82f.

📒 Files selected for processing (2)
  • core/blockchain.go
  • core/state/pruner/pruner.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/state/pruner/pruner.go

…rd walk

ReadSnapshotRoot may return a zkStateRoot written by an older code path.
Translate it via ReadDiskStateRoot before comparing with block roots in the
post-prune repair walk, so the walk can find the correct block instead of
falling through to genesis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/blockchain.go`:
- Around line 330-354: The loop rewinds newHead until it finds a block whose
translated root equals diskRoot but then persists newHead.NumberU64(), which can
be lower if that block lacked trie state; instead, capture and keep the matched
snapshot-layer height when you detect blockRoot == diskRoot (before the
state.New check) into a separate variable (e.g., matchedSnapNumber), continue
the existing rewind logic to set bc.currentBlock and head hash as now, and call
rawdb.WriteSnapshotRecoveryNumber(bc.db, matchedSnapNumber) so the snapshot
recovery number persisted reflects the actual snapshot layer rather than the
possibly lower rewound full-block head; reference symbols: diskRoot, newHead,
head, state.New, rawdb.WriteSnapshotRecoveryNumber, rawdb.WriteHeadBlockHash,
bc.currentBlock.Store.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ff1f568-07cf-4ce9-8498-ac1841955178

📥 Commits

Reviewing files that changed from the base of the PR and between 507d82f and 18b9b3b.

📒 Files selected for processing (1)
  • core/blockchain.go

Comment thread on core/blockchain.go (outdated)
corey and others added 4 commits March 19, 2026 18:39
…rsisted root

The old fallback used ReadSnapshotRoot which returns the root from the
last restart's generateSnapshot call. If the node synced N more blocks
since that restart, the snapshot and prune target would be N blocks
behind the current head, causing a post-prune rollback of N blocks on
restart.

Always resolve the current headBlock root (translating zkStateRoot →
mptStateRoot via ReadDiskStateRoot) and use that as both the snapshot
generation and pruning target, so the node restarts at the head height.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When snapshot.New(headRoot) succeeds via the normal path (e.g. journal.go
accepts the mptRoot translation), the fallback block was skipped and
snapDiskRoot stayed zero. Prune() then tried to find 128 diff layers
which never exist for the pruner CLI, causing "snapshot not old enough"
error.

Always capture the disk root from the successfully loaded snapshot tree
so that Prune() uses the disk-layer root as the pruning target.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert the custom backward walk in NewBlockChain back to the standard
setHeadBeyondRoot call, which already handles ZK→MPT root translation
since commit 507d82f. The custom walk only rewound currentBlock
without touching currentFastBlock/currentHeader, creating an
inconsistent chain state. Also restore the original freezer truncation
conditions.

In the pruner, only fall back to the snapshot disk-layer root when
fewer than 128 diff layers are available, preserving the standard
HEAD-127 pruning target when sufficient layers exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… shutdown

BlockChain.Stop() calls snaps.Journal(currentBlock.Root()) to persist the
snapshot tree, but currentBlock.Root() is a zkStateRoot while the snapshot
disk layer is keyed by mptStateRoot. When no diff layers exist (e.g. right
after prune-state or fresh snapshot generation), Snapshot(zkRoot) returns
nil and the journal write silently fails. On next startup the journal is
missing (diffs=missing), which cascades into pruner failures.

Resolve zkRoot to mptRoot via ReadDiskStateRoot before calling Journal()
so the disk layer can always be found in the layers map.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@curryxbo curryxbo closed this Mar 20, 2026