Skip to content

pruner: use teeWriter and HEAD as prune target to support unclean shutdown#307

Closed
curryxbo wants to merge 2 commits intomainfrom
revert/pr-300-pruner-journal-fallback
Closed

pruner: use teeWriter and HEAD as prune target to support unclean shutdown#307
curryxbo wants to merge 2 commits intomainfrom
revert/pr-300-pruner-journal-fallback

Conversation

@curryxbo
Copy link
Copy Markdown
Contributor

@curryxbo curryxbo commented Mar 31, 2026

Problems

1. Genesis root validation failure during pruning

During prune-state, extractGenesis calls trie.NewSecure with the genesis block's root. On Morph, the genesis root stored in the block header may be a zkTrie root (overridden via GenesisStateRoot for block hash compatibility) rather than the actual MPT disk root, causing trie.NewSecure to fail and aborting the entire pruning process.

2. Block height regression after pruning

The pruner previously targeted HEAD-127 as the prune target, meaning after pruning the node would restart at a state 127 blocks behind the actual chain head. On L2 chains like Morph where reorgs are essentially non-existent, this safety margin is unnecessary and causes undesirable height rollback.

Solution

  • Genesis root fix: In extractGenesis, resolve the genesis root via rawdb.ReadDiskStateRoot before opening the trie, so the actual MPT disk root is used even when the header contains a zkTrie root.
  • teeWriter: Writes trie nodes to both the real database and the bloom filter during GenerateTrie, ensuring trie data is fully persisted to disk regardless of how the node was previously shut down.
  • HEAD as prune target: Use HEAD directly instead of HEAD-127. Combined with teeWriter, this guarantees zero height drop after pruning.

@curryxbo curryxbo requested a review from a team as a code owner March 31, 2026 02:09
@curryxbo curryxbo requested review from secmgt and removed request for a team March 31, 2026 02:09
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Removed snapshot fallback and disk-root branching from the pruner. Snapshot creation failures are now returned immediately; state bloom updates are routed via a new internal teeWriter that writes to DB and bloom filter; pruning always calls snaptree.Cap and selects HEAD root when none provided.

Changes

Cohort / File(s) Summary
Pruner & snapshot logic
core/state/pruner/pruner.go
Added an internal teeWriter to write trie nodes to both the DB and state bloom filter. Removed snapDiskRoot field and all fallback logic that read a persisted snapshot root. NewPruner now returns snapshot creation errors immediately. Simplified Prune/prune to always use p.headHeader.Root (when unspecified) and to always call snaptree.Cap(root, 0). Removed middleRoots computation and forcible middle-layer pruning flow.
Trie/genesis opening
core/state/pruner/pruner.go (extractGenesis)
Resolve genesis.Root() via rawdb.ReadDiskStateRoot when available before calling trie.NewSecure, ensuring the trie opens against the actual MPT disk root for non-MPT/zkTrie-compatible genesis roots.

Sequence Diagram(s)

sequenceDiagram
  participant Pruner
  participant Snapshot as Snapshot Generator
  participant tee as teeWriter
  participant DB
  participant Bloom as StateBloom

  Pruner->>Snapshot: snapshot.New(db, headRoot, ...)
  Snapshot->>Pruner: snaptree (snapshot tree)
  Pruner->>snaptree: Cap(root, 0)
  Pruner->>snaptree: GenerateTrie(teeWriter)
  tee->>DB: write(nodeKey, nodeValue)
  tee->>Bloom: update(nodeKey) / add(nodeValue)
  DB-->>Pruner: write acks
  Bloom-->>Pruner: updates ack
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • twcctop
  • panos-xyz

Poem

🐰 I nibble at roots, now tidy and neat,

No hidden fallbacks, no twice-taken seat.
I tee up the bloom as I hop to the DB,
Cleaner paths forward — a pruner's agree! 🌱✂️

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title describes a specific implementation approach (using teeWriter and HEAD as prune target) to support unclean shutdown, but the PR objectives clarify this is a revert of PR #300 that removed the journal-missing fallback behavior—a different framing than the title suggests. Clarify whether the title should reflect the actual intent (revert PR #300) or if it accurately represents the new behavior being introduced. Consider using 'Revert: fall back to disk snapshot root when journal is missing' to match the PR's actual purpose.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch revert/pr-300-pruner-journal-fallback

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 191-193: The code unconditionally calls snaptree.Cap(root, 0)
assuming root is a diff layer, but a user-specified root (or stateBloomRoot from
findBloomFilter()/RecoverPruning()) can be the disk root which makes Cap return
a confusing disk-layer error; before calling snaptree.Cap(root, 0) check if root
== snaptree.DiskRoot() (or otherwise detect disk-layer) and return a clear
formatted error indicating the supplied root is a disk layer and invalid for
pruning; update the same defensive check where stateBloomRoot is used in
RecoverPruning() and ensure Snapshots() usage with nodisk remains unchanged for
auto-selected roots.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab4dfae7-e90b-40c5-9ff8-664f87ae0e3c

📥 Commits

Reviewing files that changed from the base of the PR and between b3c5552 and 7c3a0c9.

📒 Files selected for processing (1)
  • core/state/pruner/pruner.go

Comment on lines +191 to 193
if err := snaptree.Cap(root, 0); err != nil {
return err
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there were any guards against disk layer roots before PR `#300`
# Look for any DiskRoot() usage patterns in pruner history or related files

echo "=== Checking current DiskRoot() usage in pruner ==="
rg -n "DiskRoot" core/state/pruner/

echo ""
echo "=== Checking Cap() call patterns in codebase ==="
rg -n -B3 -A3 '\.Cap\(' --type go

echo ""
echo "=== Checking how 'root' parameter flows into prune() ==="
ast-grep --pattern 'prune($_, $ROOT, $_, $_, $_, $_, $_)'

Repository: morph-l2/go-ethereum

Length of output: 19376


🏁 Script executed:

# Get context around the Cap() call at line 191
head -n 200 core/state/pruner/pruner.go | tail -n 50

Repository: morph-l2/go-ethereum

Length of output: 1808


🏁 Script executed:

# Look at the prune() function signature and how root is used
sed -n '150,220p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 2607


🏁 Script executed:

# Check the Cap() implementation in snapshot code
grep -n "func.*Cap" core/state/snapshot/snapshot.go | head -5

Repository: morph-l2/go-ethereum

Length of output: 124


🏁 Script executed:

# Get the full Cap() implementation to see disk layer check
sed -n '370,390p' core/state/snapshot/snapshot.go

Repository: morph-l2/go-ethereum

Length of output: 646


🏁 Script executed:

# Look at the function signature and callers of prune()
sed -n '120,160p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1369


🏁 Script executed:

# See how prune() is called at lines 335 and 399
sed -n '330,345p' core/state/pruner/pruner.go
sed -n '390,410p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1626


🏁 Script executed:

# Find the full prune() function signature
grep -n "^func prune" core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 254


🏁 Script executed:

# Get the prune() function signature and early logic
sed -n '90,140p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1932


🏁 Script executed:

# Find the Prune() method that calls prune() at line 335
sed -n '280,340p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 2319


🏁 Script executed:

# Check the Prune() method signature
grep -B5 "return prune(p.snaptree, root" core/state/pruner/pruner.go | head -20

Repository: morph-l2/go-ethereum

Length of output: 380


🏁 Script executed:

# Look at RecoverPruning to understand stateBloomRoot source
sed -n '340,400p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 2538


🏁 Script executed:

# Check if there's a DiskRoot() method and understand its usage
grep -rn "DiskRoot" core/state/snapshot/ | head -20

Repository: morph-l2/go-ethereum

Length of output: 139


🏁 Script executed:

# Look at the git history to see if there was a guard removed
# First, check the current file for any comments about guards or PR `#300`
grep -n "guard\|PR `#300`\|DiskRoot" core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 46


🏁 Script executed:

# Check snapshot test to confirm Cap() fails on disk layer
sed -n '320,330p' core/state/snapshot/snapshot_test.go

Repository: morph-l2/go-ethereum

Length of output: 353


🏁 Script executed:

# Look for any validation before calling prune() to ensure root is not disk layer
sed -n '250,285p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1796


🏁 Script executed:

# Check the Snapshots() method to understand what it returns
grep -A20 "func (t \*Tree) Snapshots" core/state/snapshot/snapshot.go

Repository: morph-l2/go-ethereum

Length of output: 449


🏁 Script executed:

# Check DiskRoot() implementation and how it relates to layers
sed -n '843,860p' core/state/snapshot/snapshot.go

Repository: morph-l2/go-ethereum

Length of output: 165


🏁 Script executed:

# Look for how the snapshot tree distinguishes disk layer from diff layers
grep -n "diskRoot\|diskLayer" core/state/snapshot/snapshot.go | head -15

Repository: morph-l2/go-ethereum

Length of output: 696


🏁 Script executed:

# Confirm the diskRoot() private method implementation
sed -n '817,835p' core/state/snapshot/snapshot.go

Repository: morph-l2/go-ethereum

Length of output: 630


🏁 Script executed:

# Verify that root parameter in both Prune() and RecoverPruning uses nodisk=true
grep -n "Snapshots.*true" core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 197


🏁 Script executed:

# Double check the exact snapshot call in Prune()
sed -n '263,264p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 104


🏁 Script executed:

# Check if there's validation before Cap() call for user-specified roots
# Look for any validation that would prevent DiskRoot from being passed
sed -n '250,295p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1988


🏁 Script executed:

# Check findBloomFilter to understand stateBloomRoot
grep -A30 "func findBloomFilter" core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 859


🏁 Script executed:

# Verify what snapshot types are returned from Snapshots with nodisk=true
sed -n '310,340p' core/state/snapshot/snapshot.go

Repository: morph-l2/go-ethereum

Length of output: 947


🏁 Script executed:

# Check if there's any validation of root being a diff layer before Cap() call
sed -n '188,198p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 608


🏁 Script executed:

# Look at the full context around user-specified root handling
sed -n '255,298p' core/state/pruner/pruner.go

Repository: morph-l2/go-ethereum

Length of output: 1913


🏁 Script executed:

# Verify that GenerateTrie would fail if root is disk layer
grep -A10 "func GenerateTrie" core/state/snapshot/generate.go | head -15

Repository: morph-l2/go-ethereum

Length of output: 46


User-specified roots cannot be validated against disk layer before Cap() call.

The unconditional snaptree.Cap(root, 0) call assumes root is a diff layer. Per the Cap() implementation (core/state/snapshot/snapshot.go:370-378), it returns an error "snapshot [%#x] is disk layer" if passed a disk layer root.

For auto-selected roots (when root == empty), this is safe: Snapshots() with nodisk=true returns only diff layers. However, when root is user-specified (lines 290+), only existence is validated—not that it's a non-disk layer. Similarly in RecoverPruning(), the stateBloomRoot from findBloomFilter() is validated to exist in layers but could theoretically be the current disk root.

While this would fail with a clear error message rather than silently corrupt data, adding a defensive check (e.g., if root == snaptree.DiskRoot() { return fmt.Errorf(...) }) before the Cap() call would provide earlier validation and clearer error messaging for invalid user input.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/state/pruner/pruner.go` around lines 191 - 193, The code unconditionally
calls snaptree.Cap(root, 0) assuming root is a diff layer, but a user-specified
root (or stateBloomRoot from findBloomFilter()/RecoverPruning()) can be the disk
root which makes Cap return a confusing disk-layer error; before calling
snaptree.Cap(root, 0) check if root == snaptree.DiskRoot() (or otherwise detect
disk-layer) and return a clear formatted error indicating the supplied root is a
disk layer and invalid for pruning; update the same defensive check where
stateBloomRoot is used in RecoverPruning() and ensure Snapshots() usage with
nodisk remains unchanged for auto-selected roots.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Analyzed this and concluded the defensive check is unnecessary — all code paths are already safe:

  1. Auto-selected root (root == empty): defaults to p.headHeader.Root, which is the chain HEAD. Since NewPruner creates the snapshot tree with the same headBlock.Root(), HEAD is always the topmost diff layer (if it were the disk layer, there'd be no diff layers and pruning would be meaningless). Safe.

  2. User-specified root: before reaching Cap(root, 0), the code already calls GenerateTrie(snaptree, root, ...) at L291. If a user somehow passes the disk layer root, Cap() returns a clear error: "snapshot [%#x] is disk layer". No silent corruption or data loss.

  3. RecoverPruning: stateBloomRoot originates from a previous Prune() call (saved in the bloom filter filename). Additionally, L359-368 explicitly validates stateBloomRoot against Snapshots(nodisk=true), which only returns diff layers — a disk layer root would be filtered out and fail with "non-existent target state".

Adding a pre-check would only shift the error a few lines earlier with a slightly friendlier message, but provides no functional benefit. Leaving as-is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@curryxbo, your analysis is well-reasoned and correct on all three points. The existing error surfacing from Cap() is clear and non-corrupting, and the Snapshots(nodisk=true) guard in RecoverPruning already provides the necessary validation for the bloom-filter recovery path. The suggested pre-check would have been purely cosmetic — acknowledged, and happy to leave it as-is.

(˶• ᴗ •˶)ᵒᵏᵎᵎ

Previously, pruning used HEAD-127 as the target and GenerateTrie only
wrote trie node hashes to the bloom filter (not to the actual database).
This caused two problems:

1. After an unclean shutdown, trie nodes between the last periodic
   commit and HEAD-127 were never persisted to disk. The bloom filter
   protected non-existent nodes, causing unpredictable rewinds (e.g.
   969 blocks instead of the expected 127).

2. Even with a clean shutdown, the node always rewound 127 blocks
   after pruning — unnecessary on L2 where reorgs don't occur.

This commit introduces a teeWriter that writes trie nodes to both the
database and the bloom filter during GenerateTrie. Combined with using
HEAD (instead of HEAD-127) as the default pruning target, the node now
restarts at exactly the same height it stopped at, regardless of how
it was shut down.

Also fixes extractGenesis to resolve the genesis root via
ReadDiskStateRoot before opening the trie, since genesis.Root() may
return a zkTrie root (overridden via GenesisStateRoot for block hash
compatibility) that cannot be opened as an MPT trie directly.

Made-with: Cursor
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 384-391: ReadDiskStateRoot may return a zero/empty common.Hash
with nil error; update the logic around genesisRoot to validate the returned
mptRoot is non-empty before assigning it. Specifically, in the block using
rawdb.ReadDiskStateRoot(db, genesisRoot) and trie.NewSecure, only set
genesisRoot = mptRoot when err == nil AND mptRoot is not the zero hash (e.g.
!mptRoot.IsZero() / mptRoot != common.Hash{}); otherwise keep the original
genesis.Root() (or return/log an error if you prefer) so you never pass an empty
root into trie.NewSecure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef19d753-6d1d-452d-83e4-f2d9bd9f8be2

📥 Commits

Reviewing files that changed from the base of the PR and between 7c3a0c9 and f3988e3.

📒 Files selected for processing (1)
  • core/state/pruner/pruner.go

Comment on lines +384 to +391
// genesis.Root() may be a zkTrie root (overridden via GenesisStateRoot
// for block hash compatibility). Resolve it to the actual MPT disk root
// so trie.NewSecure can open the trie.
genesisRoot := genesis.Root()
if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil {
genesisRoot = mptRoot
}
t, err := trie.NewSecure(genesisRoot, trie.NewDatabase(db))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify ReadDiskStateRoot behavior for missing keys and check if there's additional validation

echo "=== Check ReadDiskStateRoot implementation ==="
rg -A 10 "func ReadDiskStateRoot" --type go

echo ""
echo "=== Check if diskStateRootKey is used elsewhere for validation ==="
rg "diskStateRootKey" --type go -C 3

echo ""
echo "=== Check how trie.NewSecure handles empty root ==="
rg -A 5 "func NewSecure" --type go | head -30

Repository: morph-l2/go-ethereum

Length of output: 3857


🏁 Script executed:

#!/bin/bash

echo "=== Check trie.New (called by NewSecure) with empty root ==="
rg -A 15 "^func New\(root common.Hash" trie/trie.go

echo ""
echo "=== Check if genesis root mapping is always written before read ==="
rg -B 5 -A 5 "WriteDiskStateRoot.*genesis" core/

echo ""
echo "=== Check genesis initialization in pruner or state package ==="
rg "WriteDiskStateRoot" core/state/ core/genesis.go -B 2 -A 2

echo ""
echo "=== Check if there's any nil root handling in trie operations ==="
rg "common.Hash\{\}" trie/ -C 2 | head -50

Repository: morph-l2/go-ethereum

Length of output: 6160


Missing validation for empty hash from ReadDiskStateRoot could silently create invalid genesis state.

The ReadDiskStateRoot function returns (common.Hash{}, nil) when the mapping key doesn't exist in the database (normal for fresh nodes or when genesis state root mapping was never written). The current condition if mptRoot, err := rawdb.ReadDiskStateRoot(...); err == nil evaluates to true for this case, causing genesisRoot to be set to an empty hash.

While trie.NewSecure succeeds with an empty root (returning an empty trie with no error), using an empty root for genesis state results in silent data corruption—the node will operate with an empty genesis state instead of the correct one. This is worse than an explicit error because it fails silently and inconsistently.

Proposed fix: validate non-empty hash
 	genesisRoot := genesis.Root()
-	if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil {
+	if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil && mptRoot != (common.Hash{}) {
 		genesisRoot = mptRoot
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/state/pruner/pruner.go` around lines 384 - 391, ReadDiskStateRoot may
return a zero/empty common.Hash with nil error; update the logic around
genesisRoot to validate the returned mptRoot is non-empty before assigning it.
Specifically, in the block using rawdb.ReadDiskStateRoot(db, genesisRoot) and
trie.NewSecure, only set genesisRoot = mptRoot when err == nil AND mptRoot is
not the zero hash (e.g. !mptRoot.IsZero() / mptRoot != common.Hash{}); otherwise
keep the original genesis.Root() (or return/log an error if you prefer) so you
never pass an empty root into trie.NewSecure.

@curryxbo curryxbo changed the title Revert "pruner: fall back to disk snapshot root when journal is missing (#300)" pruner: use teeWriter and HEAD as prune target to support unclean shutdown Apr 7, 2026
@curryxbo curryxbo closed this Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant