perf(l1): remove redundant ancestor deletes in healing trie writes #6384
ElFantasma wants to merge 1 commit into
🤖 Kimi Code Review
Automated review by Kimi (Moonshot AI)
🤖 Codex Code Review

No findings. The diff looks sound. The removed parent-prefix writes in state.rs and storage.rs appear to be redundant write amplification rather than required correctness machinery, because trie child presence checks are hash-validated via

Residual risk is coverage, not the implementation itself: I couldn’t find targeted tests for interrupted/stale healing resume paths. A regression test around partial flushes and next-cycle resume for the logic around state.rs and storage.rs would make this safer.

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review
Greptile Summary

This PR removes an O(depth) redundant delete pattern from the snap-sync healing write paths in both

The correctness argument is sound and verified against the implementation:

Changes are clean and well-motivated:

One gap to be aware of: Both mainnet test checklist items (

Confidence Score: 4/5

| Filename | Overview |
|---|---|
| crates/networking/p2p/sync/healing/state.rs | Removes O(depth) empty-vec ancestor insertions in the state trie write path, replacing the BTreeMap batch with a simple Vec. The change is safe: commit_node already adds all ancestors bottom-up (so they overwrite any stale data in the same atomic batch), get_node_checked validates hashes to detect stale nodes, and unreachable orphan nodes are harmless. |
| crates/networking/p2p/sync/healing/storage.rs | Removes the same O(depth) empty-vec ancestor-delete pattern for storage trie healing writes. Semantically identical to the state.rs change. One pre-existing concern (not introduced by this PR): the result of db_joinset.join_next().await at line 217 is discarded, so a panic in the previous write task is silently swallowed rather than propagated. |
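
The pre-existing concern in the storage.rs row (a discarded join result hiding a panic in the write task) can be pictured with a standard-library analogy. This is a sketch only: it swaps the async join set for `std::thread` so it runs without dependencies, and every function name in it is hypothetical rather than taken from the PR.

```rust
use std::thread;

/// Hypothetical write task: panics when `fail` is set, otherwise reports a node count.
fn spawn_write_task(fail: bool) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        if fail {
            panic!("simulated write failure");
        }
        3 // pretend three nodes were written
    })
}

/// Mirrors the flagged pattern: the join result is dropped, so a panic is lost.
fn join_and_discard(fail: bool) {
    let _ = spawn_write_task(fail).join();
}

/// Alternative: surface the panic to the caller as an error value.
fn join_and_propagate(fail: bool) -> Result<usize, String> {
    spawn_write_task(fail)
        .join()
        .map_err(|_| "write task panicked".to_string())
}

fn main() {
    // Returns normally even though the task panicked.
    join_and_discard(true);

    match join_and_propagate(true) {
        Ok(n) => println!("wrote {n} nodes"),
        Err(e) => println!("error: {e}"),
    }
}
```

Whether to propagate, log, or retry on such an error is a design decision for the healing loop; the sketch only shows that the failure becomes visible once the result is inspected.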
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Download trie node] --> B{Has missing children?}
B -- Yes --> C[Add to healing_queue\npending_children_count = N]
B -- No --> D[commit_node - leaf]
D --> E[Push leaf to nodes_to_write]
E --> F{Parent pending_children_count == 0?}
F -- No --> G[Decrement counter, keep in healing_queue]
F -- Yes --> H[commit_node - parent\nPush parent to nodes_to_write]
H --> I{More ancestors?}
I -- Yes --> F
I -- No / root reached --> J[nodes_to_write has full bottom-up chain]
J --> K{Flush threshold\nor is_done or is_stale?}
K -- Yes --> L[OLD: BTreeMap inserts empty vecs for\nall ancestor paths O-depth deletes\nNEW: Vec of path/node pairs only]
L --> M[Atomic put_batch / write_storage_trie_nodes_batch]
M --> N[DB updated]
N --> O{stale ancestor in DB?}
O -- Yes but different hash --> P[get_node_checked returns None\n= re-download on next traversal]
O -- Same hash = correct node --> Q[Accept as valid, skip download]
P --> R[Fresh download overwrites stale node]
G --> C
Last reviewed commit: "perf(l1): remove red..."
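
The counter-driven, bottom-up commit shown in the flowchart can be sketched roughly as follows. This is a simplified model, not the code from state.rs or storage.rs: `NibblePath`, `PendingNode`, and the rule that a parent's path is the child's path minus its last element are assumptions made for illustration, and only `commit_node`, `healing_queue`, `nodes_to_write`, and `pending_children_count` are names that appear in the review.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a trie path.
type NibblePath = Vec<u8>;

/// Queue entry for a node that is still waiting on children.
struct PendingNode {
    pending_children_count: usize,
}

/// Commit a completed node, then walk upward while parents become complete.
fn commit_node(
    path: NibblePath,
    healing_queue: &mut HashMap<NibblePath, PendingNode>,
    nodes_to_write: &mut Vec<NibblePath>,
) {
    nodes_to_write.push(path.clone());

    // Root reached: nothing above it to notify.
    let Some((_, parent)) = path.split_last() else {
        return;
    };
    let parent = parent.to_vec();

    // Decrement the parent's counter and check whether it is now complete.
    let parent_ready = match healing_queue.get_mut(&parent) {
        Some(entry) => {
            entry.pending_children_count = entry.pending_children_count.saturating_sub(1);
            entry.pending_children_count == 0
        }
        None => false,
    };

    if parent_ready {
        healing_queue.remove(&parent);
        commit_node(parent, healing_queue, nodes_to_write);
    }
}

fn main() {
    let mut healing_queue: HashMap<NibblePath, PendingNode> = HashMap::new();
    healing_queue.insert(vec![], PendingNode { pending_children_count: 2 });
    let mut nodes_to_write: Vec<NibblePath> = Vec::new();

    commit_node(vec![0], &mut healing_queue, &mut nodes_to_write);
    commit_node(vec![1], &mut healing_queue, &mut nodes_to_write);
    println!("{nodes_to_write:?}");
}
```

In this model a parent reaches `nodes_to_write` only after its last child commits, so a flush that happens earlier contains children without their ancestors.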
Closing: the optimization is incorrect

Removing the ancestor deletes breaks correctness under pivot staleness. The deletes are not redundant; they're a critical invariant.

The bug

During healing, nodes are committed bottom-up: a child is added to

The ancestor delete loop (

The scenario

At T+3, healing downloads the root, checks children:

With ancestor deletes: writing B' at

Why the Merkle argument was wrong

The Merkle property ("matching hash = correct subtree") holds for the node itself, but NOT for the DB state underneath it.

Note for future optimization

A valid optimization would need to ensure that either:

Option 2 is the most promising: if

A proper test for this scenario requires mocking the peer network to drive
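
The point about the Merkle property can be pictured with a toy path-keyed store. This is an illustrative reconstruction under stated assumptions, not the scenario from the comment (whose details are truncated here): `Node`, `hash_of`, `get_checked`, `write`, and `scenario` are hypothetical, each node has at most one child at `path + [0]`, and `DefaultHasher` stands in for the real node hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type NibblePath = Vec<u8>;

/// Toy node: a payload plus the expected hash of a single child at `path + [0]`.
#[derive(Clone, Hash)]
struct Node {
    payload: u8,
    child_hash: Option<u64>,
}

/// Stand-in for the real node hash (assumption: any collision-resistant hash).
fn hash_of(node: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    hasher.finish()
}

/// Hash-checked lookup: a stored node whose hash differs from `expected` counts as missing.
fn get_checked<'a>(db: &'a HashMap<NibblePath, Node>, path: &[u8], expected: u64) -> Option<&'a Node> {
    db.get(path).filter(|node| hash_of(node) == expected)
}

/// True only if the node at `path` and everything beneath it pass the hash check.
fn subtree_consistent(db: &HashMap<NibblePath, Node>, path: &[u8], expected: u64) -> bool {
    match get_checked(db, path, expected) {
        None => false,
        Some(node) => match node.child_hash {
            None => true,
            Some(child) => {
                let mut child_path = path.to_vec();
                child_path.push(0);
                subtree_consistent(db, &child_path, child)
            }
        },
    }
}

/// Write a node, optionally deleting every ancestor path first.
fn write(db: &mut HashMap<NibblePath, Node>, path: &[u8], node: Node, delete_ancestors: bool) {
    if delete_ancestors {
        for i in 0..path.len() {
            db.remove(&path[..i]);
        }
    }
    db.insert(path.to_vec(), node);
}

/// Returns (root passes the hash check, whole subtree is consistent).
fn scenario(delete_ancestors: bool) -> (bool, bool) {
    let b = Node { payload: 1, child_hash: None };
    let a = Node { payload: 0, child_hash: Some(hash_of(&b)) };
    let root_hash = hash_of(&a);

    let mut db: HashMap<NibblePath, Node> = HashMap::new();
    db.insert(vec![], a);
    db.insert(vec![0], b);

    // A node from another pivot lands at the child's path; its parent never commits.
    let b_prime = Node { payload: 2, child_hash: None };
    write(&mut db, &[0], b_prime, delete_ancestors);

    (
        get_checked(&db, &[], root_hash).is_some(),
        subtree_consistent(&db, &[], root_hash),
    )
}

fn main() {
    println!("without ancestor deletes: {:?}", scenario(false));
    println!("with ancestor deletes:    {:?}", scenario(true));
}
```

In this toy model, skipping the ancestor deletes leaves a root that still passes its own hash check while the subtree under it is inconsistent; with the deletes, the root is gone and would be treated as missing and fetched again.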
Motivation
Storage healing is the #1 bottleneck in snap sync, taking ~1h 10m (44% of total sync time) on mainnet. Profiling the write path revealed that for every node written, the code generated O(depth) delete operations, one for every ancestor path: deleting entries at the root, then `[0]`, then `[0,1]`, etc. For a node at trie depth 10, this meant 11 DB operations (10 deletes + 1 write) instead of just 1 write.

These ancestor deletes were intended to clear stale parent encodings from previous pivots, but they are redundant because:

- Healing commits bottom-up: `commit_node` walks from leaf to root via `healing_queue`, adding every ancestor to the write batch. So stale ancestors get overwritten by the new value in the same batch.
- `get_node_checked` validates hashes: When checking if a child exists, it verifies `node.compute_hash() == expected_hash`. A stale node from a previous pivot has a different hash, so it's detected as missing and re-downloaded. The subsequent `put` overwrites it atomically.
- Orphan nodes are harmless: Nodes from old pivots at paths not visited by healing become unreachable from the current root. Trie traversal always follows child references from the root, so orphans are never read.
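
The operation count quoted above (10 deletes + 1 write at depth 10) follows from enumerating every proper prefix of the node's path. A minimal sketch of that arithmetic, with hypothetical helper names that are not taken from the codebase:

```rust
/// Every proper prefix of `path`, from the root (empty path) down to the parent.
fn ancestor_prefixes(path: &[u8]) -> Vec<Vec<u8>> {
    (0..path.len()).map(|i| path[..i].to_vec()).collect()
}

/// Operations per node with the ancestor-delete pattern: one delete per prefix, plus the write.
fn ops_with_ancestor_deletes(path: &[u8]) -> usize {
    ancestor_prefixes(path).len() + 1
}

/// Operations per node once the deletes are dropped: just the write.
fn ops_without_ancestor_deletes(_path: &[u8]) -> usize {
    1
}

fn main() {
    // A path of length 10 models a node at trie depth 10.
    let path: Vec<u8> = (0..10).collect();
    println!(
        "with deletes: {}, without: {}",
        ops_with_ancestor_deletes(&path),
        ops_without_ancestor_deletes(&path)
    );
}
```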
Description
- Removes the `for i in 0..path.len()` loop in `storage.rs` that generated O(depth) empty-vec entries (which `write_storage_trie_nodes_batch` interprets as deletes)
- Removes the same pattern in `state.rs`, plus the now-unused `BTreeMap` import
- Cleans up the `PERF` comments that referenced the delete requirement

This reduces DB operations per healing node from ~O(depth) to O(1), which should meaningfully reduce the 1h 10m storage healing phase.
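
The "empty vec means delete" convention mentioned above can be modeled with a toy batch writer. This is a sketch under assumptions: `apply_batch` and the in-memory `BTreeMap` store are hypothetical stand-ins, not the signature or behavior of `write_storage_trie_nodes_batch` beyond what the description states.

```rust
use std::collections::BTreeMap;

type NibblePath = Vec<u8>;

/// Toy batch writer: an empty value removes the key, anything else is stored.
fn apply_batch(
    db: &mut BTreeMap<NibblePath, Vec<u8>>,
    batch: impl IntoIterator<Item = (NibblePath, Vec<u8>)>,
) {
    for (path, encoded) in batch {
        if encoded.is_empty() {
            db.remove(&path);
        } else {
            db.insert(path, encoded);
        }
    }
}

fn main() {
    let mut db: BTreeMap<NibblePath, Vec<u8>> = BTreeMap::new();
    db.insert(vec![], vec![0xAA]); // previously stored root encoding
    db.insert(vec![0], vec![0xBB]); // previously stored branch encoding

    // Old-style batch for a node at path [0, 1]: empty vecs for both ancestors.
    let old_style = vec![
        (vec![], vec![]),
        (vec![0], vec![]),
        (vec![0, 1], vec![0xCC]),
    ];
    apply_batch(&mut db, old_style);

    // Only the freshly written node remains.
    println!("{db:?}");
}
```

A batch built the new way would carry only the `(path, node)` pair, leaving the two ancestor entries in place.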
Checklist