pruner: use teeWriter and HEAD as prune target to support unclean shutdown#307
Conversation
📝 Walkthrough

Removed snapshot fallback and disk-root branching from the pruner. Snapshot creation failures are now returned immediately; state bloom updates are routed through a new internal `teeWriter` that writes to both the DB and the bloom filter; pruning always calls `snaptree.Cap` and selects the HEAD root when none is provided.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Pruner
    participant Snapshot as Snapshot Generator
    participant tee as teeWriter
    participant DB
    participant Bloom as StateBloom
    Pruner->>Snapshot: snapshot.New(db, headRoot, ...)
    Snapshot->>Pruner: snaptree (snapshot tree)
    Pruner->>snaptree: Cap(root, 0)
    Pruner->>snaptree: GenerateTrie(teeWriter)
    tee->>DB: write(nodeKey, nodeValue)
    tee->>Bloom: update(nodeKey) / add(nodeValue)
    DB-->>Pruner: write acks
    Bloom-->>Pruner: updates ack
```
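The walkthrough and diagram above describe a `teeWriter` that fans each trie-node write out to both the database and the bloom filter. A minimal, self-contained sketch of the idea follows; the `KeyValueWriter` interface, `memDB`, and `stateBloom` types here are toy stand-ins, not the repository's actual `ethdb` or pruner types.

```go
package main

import "fmt"

// KeyValueWriter approximates the write side of an ethdb-style interface
// (an assumption for this sketch; the real interface has more methods).
type KeyValueWriter interface {
	Put(key, value []byte) error
}

// memDB is a stand-in for the real key-value database.
type memDB struct{ data map[string][]byte }

func (m *memDB) Put(key, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

// stateBloom is a stand-in for the pruner's bloom filter; a plain set here.
type stateBloom struct{ keys map[string]bool }

func (b *stateBloom) Put(key, _ []byte) error {
	b.keys[string(key)] = true
	return nil
}

// teeWriter fans every Put out to both the database and the bloom filter,
// so trie nodes emitted during trie generation are persisted to disk and
// protected from pruning in a single pass.
type teeWriter struct {
	db, bloom KeyValueWriter
}

func (t teeWriter) Put(key, value []byte) error {
	if err := t.db.Put(key, value); err != nil {
		return err
	}
	return t.bloom.Put(key, value)
}

func main() {
	db := &memDB{data: map[string][]byte{}}
	bloom := &stateBloom{keys: map[string]bool{}}
	w := teeWriter{db: db, bloom: bloom}
	if err := w.Put([]byte("nodeKey"), []byte("nodeValue")); err != nil {
		panic(err)
	}
	fmt.Printf("db=%q bloomed=%v\n", db.data["nodeKey"], bloom.keys["nodeKey"]) // db="nodeValue" bloomed=true
}
```

The design point is that the previous flow only recorded node hashes in the bloom filter; tee-ing the same `Put` into the database is what lets pruning target HEAD without relying on a later periodic commit.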
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 191-193: The code unconditionally calls snaptree.Cap(root, 0)
assuming root is a diff layer, but a user-specified root (or stateBloomRoot from
findBloomFilter()/RecoverPruning()) can be the disk root which makes Cap return
a confusing disk-layer error; before calling snaptree.Cap(root, 0) check if root
== snaptree.DiskRoot() (or otherwise detect disk-layer) and return a clear
formatted error indicating the supplied root is a disk layer and invalid for
pruning; update the same defensive check where stateBloomRoot is used in
RecoverPruning() and ensure Snapshots() usage with nodisk remains unchanged for
auto-selected roots.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ab4dfae7-e90b-40c5-9ff8-664f87ae0e3c
📒 Files selected for processing (1)
core/state/pruner/pruner.go
```go
if err := snaptree.Cap(root, 0); err != nil {
	return err
}
```
🧩 Analysis chain

🏁 Scripts executed (consolidated; run against morph-l2/go-ethereum):

```bash
#!/bin/bash
# Check for guards against disk-layer roots and DiskRoot() usage
rg -n "DiskRoot" core/state/pruner/
rg -n -B3 -A3 '\.Cap\(' --type go
ast-grep --pattern 'prune($_, $ROOT, $_, $_, $_, $_, $_)'

# Context around the Cap() call and the prune() function
head -n 200 core/state/pruner/pruner.go | tail -n 50
sed -n '150,220p' core/state/pruner/pruner.go
grep -n "^func prune" core/state/pruner/pruner.go
sed -n '90,140p' core/state/pruner/pruner.go
sed -n '188,198p' core/state/pruner/pruner.go

# Cap() implementation and disk-layer handling in the snapshot tree
grep -n "func.*Cap" core/state/snapshot/snapshot.go | head -5
sed -n '370,390p' core/state/snapshot/snapshot.go
grep -rn "DiskRoot" core/state/snapshot/ | head -20
grep -n "diskRoot\|diskLayer" core/state/snapshot/snapshot.go | head -15
sed -n '817,835p' core/state/snapshot/snapshot.go
sed -n '843,860p' core/state/snapshot/snapshot.go
sed -n '320,330p' core/state/snapshot/snapshot_test.go

# Callers of prune(), the Prune() method, and RecoverPruning()
sed -n '330,345p' core/state/pruner/pruner.go
sed -n '390,410p' core/state/pruner/pruner.go
sed -n '280,340p' core/state/pruner/pruner.go
grep -B5 "return prune(p.snaptree, root" core/state/pruner/pruner.go | head -20
sed -n '340,400p' core/state/pruner/pruner.go
sed -n '250,298p' core/state/pruner/pruner.go
sed -n '263,264p' core/state/pruner/pruner.go

# stateBloomRoot source and Snapshots(nodisk=true) behavior
grep -A30 "func findBloomFilter" core/state/pruner/pruner.go
grep -A20 "func (t \*Tree) Snapshots" core/state/snapshot/snapshot.go
sed -n '310,340p' core/state/snapshot/snapshot.go
grep -n "Snapshots.*true" core/state/pruner/pruner.go
grep -A10 "func GenerateTrie" core/state/snapshot/generate.go | head -15
```
User-specified roots are not validated against the disk layer before the `Cap()` call.

The unconditional `snaptree.Cap(root, 0)` call assumes `root` is a diff layer. Per the `Cap()` implementation (core/state/snapshot/snapshot.go:370-378), it returns the error "snapshot [%#x] is disk layer" if passed a disk-layer root.

For auto-selected roots (when `root` is empty), this is safe: `Snapshots()` with `nodisk=true` returns only diff layers. However, when `root` is user-specified (lines 290+), only its existence is validated, not that it is a non-disk layer. Similarly, in `RecoverPruning()` the `stateBloomRoot` from `findBloomFilter()` is validated to exist in the layers but could theoretically be the current disk root.

While this would fail with a clear error message rather than silently corrupt data, adding a defensive check (e.g. `if root == snaptree.DiskRoot() { return fmt.Errorf(...) }`) before the `Cap()` call would provide earlier validation and clearer error messaging for invalid user input.
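The suggested guard can be sketched as follows. The `Hash` and `snapTree` types below are toy stand-ins for `common.Hash` and `*snapshot.Tree`, and `capWithGuard` is a hypothetical helper name; only the shape of the check comes from the review comment.

```go
package main

import "fmt"

// Hash is a toy stand-in for common.Hash.
type Hash [32]byte

// snapTree is a toy stand-in for *snapshot.Tree.
type snapTree struct{ disk Hash }

func (t *snapTree) DiskRoot() Hash { return t.disk }

// Cap mimics the real behavior: a disk-layer root is an error.
func (t *snapTree) Cap(root Hash, layers int) error {
	if root == t.disk {
		return fmt.Errorf("snapshot [%#x] is disk layer", root)
	}
	return nil // pretend the diff layers were flattened
}

// capWithGuard performs the suggested defensive check before Cap: reject a
// disk-layer root early, with a message that names the actual problem.
func capWithGuard(t *snapTree, root Hash) error {
	if root == t.DiskRoot() {
		return fmt.Errorf("target root %#x is the disk layer; pruning needs a diff-layer root", root)
	}
	return t.Cap(root, 0)
}

func main() {
	tree := &snapTree{disk: Hash{1}}
	fmt.Println(capWithGuard(tree, Hash{2}) == nil) // true: diff-layer root accepted
	fmt.Println(capWithGuard(tree, Hash{1}) != nil) // true: disk-layer root rejected early
}
```

As the later discussion concludes, the guard only moves the failure a few lines earlier with a friendlier message; it does not change which inputs succeed.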
Analyzed this and concluded the defensive check is unnecessary; all code paths are already safe:

- Auto-selected root (`root == empty`): defaults to `p.headHeader.Root`, which is the chain HEAD. Since `NewPruner` creates the snapshot tree with the same `headBlock.Root()`, HEAD is always the topmost diff layer (if it were the disk layer, there would be no diff layers and pruning would be meaningless). Safe.
- User-specified root: before reaching `Cap(root, 0)`, the code already calls `GenerateTrie(snaptree, root, ...)` at L291. If a user somehow passes the disk-layer root, `Cap()` returns a clear error: `"snapshot [%#x] is disk layer"`. No silent corruption or data loss.
- RecoverPruning: `stateBloomRoot` originates from a previous `Prune()` call (saved in the bloom filter filename). Additionally, L359-368 explicitly validates `stateBloomRoot` against `Snapshots(nodisk=true)`, which only returns diff layers; a disk-layer root would be filtered out and fail with `"non-existent target state"`.

Adding a pre-check would only shift the error a few lines earlier with a slightly friendlier message, but provides no functional benefit. Leaving as-is.
@curryxbo, your analysis is well-reasoned and correct on all three points. The existing error surfacing from Cap() is clear and non-corrupting, and the Snapshots(nodisk=true) guard in RecoverPruning already provides the necessary validation for the bloom-filter recovery path. The suggested pre-check would have been purely cosmetic — acknowledged, and happy to leave it as-is.
(˶• ᴗ •˶)ᵒᵏᵎᵎ
Previously, pruning used HEAD-127 as the target and GenerateTrie only wrote trie node hashes to the bloom filter (not to the actual database). This caused two problems:

1. After an unclean shutdown, trie nodes between the last periodic commit and HEAD-127 were never persisted to disk. The bloom filter protected non-existent nodes, causing unpredictable rewinds (e.g. 969 blocks instead of the expected 127).
2. Even with a clean shutdown, the node always rewound 127 blocks after pruning, which is unnecessary on L2 where reorgs don't occur.

This commit introduces a teeWriter that writes trie nodes to both the database and the bloom filter during GenerateTrie. Combined with using HEAD (instead of HEAD-127) as the default pruning target, the node now restarts at exactly the same height it stopped at, regardless of how it was shut down.

Also fixes extractGenesis to resolve the genesis root via ReadDiskStateRoot before opening the trie, since genesis.Root() may return a zkTrie root (overridden via GenesisStateRoot for block hash compatibility) that cannot be opened as an MPT trie directly.

Made-with: Cursor
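The target-selection change in this commit message can be sketched in isolation. `pickPruneTarget` and its parameter names are illustrative, not the actual function in the pruner; the point is only the default: an unset root now resolves to the chain HEAD root rather than the root of block HEAD-127.

```go
package main

import "fmt"

// Hash is a toy stand-in for common.Hash.
type Hash [32]byte

// pickPruneTarget illustrates the new default described above: when the
// caller supplies no root, prune against the chain HEAD root instead of
// the root of block HEAD-127.
func pickPruneTarget(userRoot, headRoot Hash) Hash {
	if userRoot == (Hash{}) {
		return headRoot // previously: root of block HEAD-127
	}
	return userRoot
}

func main() {
	head := Hash{0xaa}
	custom := Hash{0xbb}
	fmt.Println(pickPruneTarget(Hash{}, head) == head)   // true: default is HEAD
	fmt.Println(pickPruneTarget(custom, head) == custom) // true: explicit root wins
}
```

Targeting HEAD is only safe because the teeWriter persists the regenerated trie to disk during the same pass; without that, the 127-block margin was covering for unpersisted nodes.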
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/state/pruner/pruner.go`:
- Around line 384-391: ReadDiskStateRoot may return a zero/empty common.Hash
with nil error; update the logic around genesisRoot to validate the returned
mptRoot is non-empty before assigning it. Specifically, in the block using
rawdb.ReadDiskStateRoot(db, genesisRoot) and trie.NewSecure, only set
genesisRoot = mptRoot when err == nil AND mptRoot is not the zero hash (e.g.
!mptRoot.IsZero() / mptRoot != common.Hash{}); otherwise keep the original
genesis.Root() (or return/log an error if you prefer) so you never pass an empty
root into trie.NewSecure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ef19d753-6d1d-452d-83e4-f2d9bd9f8be2
📒 Files selected for processing (1)
core/state/pruner/pruner.go
```go
// genesis.Root() may be a zkTrie root (overridden via GenesisStateRoot
// for block hash compatibility). Resolve it to the actual MPT disk root
// so trie.NewSecure can open the trie.
genesisRoot := genesis.Root()
if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil {
	genesisRoot = mptRoot
}
t, err := trie.NewSecure(genesisRoot, trie.NewDatabase(db))
```
🧩 Analysis chain

🏁 Scripts executed (consolidated; run against morph-l2/go-ethereum):

```bash
#!/bin/bash
# Verify ReadDiskStateRoot behavior for missing keys and check for additional validation
rg -A 10 "func ReadDiskStateRoot" --type go
rg "diskStateRootKey" --type go -C 3
rg -A 5 "func NewSecure" --type go | head -30

# Check trie.New (called by NewSecure) with an empty root
rg -A 15 "^func New\(root common.Hash" trie/trie.go

# Check whether the genesis root mapping is always written before it is read
rg -B 5 -A 5 "WriteDiskStateRoot.*genesis" core/
rg "WriteDiskStateRoot" core/state/ core/genesis.go -B 2 -A 2

# Check for nil/empty root handling in trie operations
rg "common.Hash\{\}" trie/ -C 2 | head -50
```
Missing validation for an empty hash from ReadDiskStateRoot could silently create invalid genesis state.

The `ReadDiskStateRoot` function returns `(common.Hash{}, nil)` when the mapping key doesn't exist in the database (normal for fresh nodes, or when the genesis state root mapping was never written). The current condition `if mptRoot, err := rawdb.ReadDiskStateRoot(...); err == nil` evaluates to true in this case, causing `genesisRoot` to be set to an empty hash.

While `trie.NewSecure` succeeds with an empty root (returning an empty trie with no error), using an empty root for the genesis state results in silent data corruption: the node will operate with an empty genesis state instead of the correct one. This is worse than an explicit error because it fails silently and inconsistently.

Proposed fix: validate the hash is non-empty.

```diff
  genesisRoot := genesis.Root()
- if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil {
+ if mptRoot, err := rawdb.ReadDiskStateRoot(db, genesisRoot); err == nil && mptRoot != (common.Hash{}) {
  	genesisRoot = mptRoot
  }
```
Problems

1. Genesis root validation failure during pruning

During `prune-state`, `extractGenesis` calls `trie.NewSecure` with the genesis block's root. On Morph, the genesis root stored in the block header may be a zkTrie root (overridden via `GenesisStateRoot` for block hash compatibility) rather than the actual MPT disk root, causing `trie.NewSecure` to fail and aborting the entire pruning process.

2. Block height regression after pruning

The pruner previously targeted HEAD-127 as the prune target, meaning after pruning the node would restart at a state 127 blocks behind the actual chain head. On L2 chains like Morph, where reorgs are essentially non-existent, this safety margin is unnecessary and causes undesirable height rollback.

Solution

- In `extractGenesis`, resolve the genesis root via `rawdb.ReadDiskStateRoot` before opening the trie, so the actual MPT disk root is used even when the header contains a zkTrie root.
- Write trie nodes to both the database and the bloom filter during `GenerateTrie`, ensuring trie data is fully persisted to disk regardless of how the node was previously shut down.
- Use HEAD (instead of HEAD-127) as the default prune target; combined with the `teeWriter`, this guarantees zero height drop after pruning.