perf(l1): refactor storage download to use StorageTrieTracker#6171

Open
fedacking wants to merge 21 commits into main from perf/refactor-storage-download-snap

Conversation

@fedacking
Contributor

@fedacking fedacking commented Feb 10, 2026

Summary

  • Replace AccountStorageRoots with StorageTrieTracker throughout snap sync, eliminating index-based referencing and the accounts_by_root_hash intermediate structure
  • Introduce StorageTask / StorageTaskResult enums that move trie data into tasks and back in results, removing clones and simplifying the download loop
  • Use structured concurrency (JoinSet + try_join_next) instead of channels for worker communication in request_storage_ranges
  • Extract BigTrie::compute_intervals helper from the inline chunking logic
  • Fix peers with the lowest score (-50) being permanently unable to receive any requests. The previous strict inequality (<) in can_try_more_requests meant a score ratio of 0.0 resulted in requests < 0.0, which is always false — effectively blacklisting the peer. Changing to <= ensures every connected peer can always handle at least 1 concurrent request.
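The last bullet can be illustrated with a minimal sketch. Only the lowest score (-50) and the `<` to `<=` change come from the description above; the upper score bound, the request ceiling, and the helper names are assumptions made for illustration, not the actual ethrex peer-table code.

```rust
const MIN_SCORE: i64 = -50; // lowest score, as stated in the PR description
const MAX_SCORE: i64 = 50; // assumed upper bound, for illustration only
const MAX_CONCURRENT_REQUESTS: f64 = 100.0; // assumed ceiling, for illustration only

/// Maps a score onto [0.0, 1.0]; the lowest score maps to exactly 0.0.
fn score_ratio(score: i64) -> f64 {
    let clamped = score.clamp(MIN_SCORE, MAX_SCORE);
    (clamped - MIN_SCORE) as f64 / (MAX_SCORE - MIN_SCORE) as f64
}

/// Old behaviour: with a ratio of 0.0 this is `requests < 0.0`, never true.
fn can_try_more_requests_strict(score: i64, in_flight: u32) -> bool {
    f64::from(in_flight) < MAX_CONCURRENT_REQUESTS * score_ratio(score)
}

/// New behaviour: a peer with zero in-flight requests always passes.
fn can_try_more_requests_inclusive(score: i64, in_flight: u32) -> bool {
    f64::from(in_flight) <= MAX_CONCURRENT_REQUESTS * score_ratio(score)
}
```

Under these assumptions, a peer at score -50 with no requests in flight is rejected by the strict check and accepted by the inclusive one, while a second concurrent request to that peer is still refused.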

The full plan for this PR is documented in #6170

Test plan

  • cargo check -p ethrex-p2p compiles cleanly (default + rocksdb features)
  • cargo clippy -p ethrex-p2p passes with no warnings (default + rocksdb features)
  • cargo test -p ethrex-p2p — all 38 tests pass
  • Full snap sync test against a testnet peer (manual verification)

@fedacking fedacking requested a review from a team as a code owner February 10, 2026 19:47
@github-actions github-actions Bot added the L1 (Ethereum client) and performance (Block execution throughput and performance in general) labels Feb 10, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR refactors the snap sync storage download logic by introducing a new StorageTrieTracker structure to replace the previous AccountStorageRoots. The changes improve code organization and separate concerns between small and big storage tries.

Issues Found

1. Potential Panic in handle_small_batch (Line 764-765)

let (big_root, big_trie) = tries
    .pop()
    .expect("tries should not be empty after split_off");
  • Issue: The expect could panic if tries is empty after split_off. While the comment suggests this shouldn't happen, there's no guarantee.
  • Fix: Use proper error handling instead of expect.

2. Race Condition in StorageTrieTracker::handle_healed_account (Lines 245-290)

  • Issue: When handling healed accounts, the function performs multiple operations on big_tries and small_tries without proper synchronization. While the current usage appears single-threaded, this could become problematic if the sync process becomes multi-threaded in the future.
  • Fix: Consider adding synchronization primitives or documenting thread-safety assumptions.

3. Inefficient Interval Calculation (Lines 350-387)

let storage_density = start_hash_u256 / slot_count;
let chunk_size = storage_density
    .checked_mul(U256::from(slots_per_chunk))
    .unwrap_or(U256::MAX);
  • Issue: The interval calculation uses truncating integer division on U256, so storage_density shrinks as slot_count grows. The resulting chunk_size can become very small (or zero), potentially creating an excessive number of intervals.
  • Fix: Add bounds checking and consider a minimum chunk size.

4. Missing Validation in StorageTrieTracker::return_small_tries (Lines 215-229)

  • Issue: When returning failed tries, the function blindly extends accounts without checking for duplicates or validating the trie structure.
  • Fix: Consider deduplication or validation of the returned tries.

5. Incomplete Error Handling in request_storage_ranges_worker (Lines 1098-1357)

  • Issue: The worker functions return StorageTaskResult variants but don't properly propagate all error conditions, particularly network timeouts or invalid responses.
  • Fix: Ensure all error paths are properly handled and logged.

Positive Aspects

  1. Better Separation of Concerns: The new StorageTrieTracker structure clearly separates small and big trie handling.
  2. Improved Type Safety: Using enums (StorageTask, StorageTaskResult) instead of tuples improves code clarity.
  3. Reduced Complexity: The main request_storage_ranges function is significantly simplified.
  4. Better Metrics: The new structure allows for more accurate progress tracking.

Minor Suggestions

  1. Documentation: Add more documentation to the new StorageTrieTracker methods, especially around the healing process.
  2. Constants: Consider making the 10_000 slots per chunk in BigTrie::compute_intervals a named constant.
  3. Testing: The new structure would benefit from unit tests for the interval calculation and account management logic.

Overall, this is a solid refactoring that improves the codebase structure, though a few edge cases need attention.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Claude Code Review

Based on my review of PR #6171, here's my code review:

Code Review - PR #6171: Refactor storage download to use StorageTrieTracker

Overall Assessment

This is a substantial refactoring that improves code organization and eliminates clones. The changes move from index-based referencing to a more structured approach using enums and dedicated tracker types. The refactoring is well-conceived, but there are several issues that need attention.

Critical Issues

1. Potential Panic in handle_healed_account (sync.rs:298, 304)

let big = self
    .big_tries
    .get_mut(&old_root)
    .expect("big_tries should contain old_root");

The code uses .expect() after checking contains_key, but there's a TOCTOU (Time-of-check to Time-of-use) issue if this code becomes concurrent. More importantly, the logic assumes in_big guarantees the key exists, but if the check at line 291 returns true and then the entry is somehow removed before line 295, this will panic.

Recommendation: Replace the two .expect() calls with proper error handling or use if let Some(big) = ... pattern.
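One way the recommendation could look is sketched below with `let ... else`. The `Tracker`, `BigTrie`, `TrackerError`, and `move_account` names are hypothetical stand-ins; the real `handle_healed_account` does more than this and its signature is not shown in this thread.

```rust
use std::collections::BTreeMap;

type H256 = [u8; 32]; // stand-in for the real hash type

#[derive(Debug, Default, PartialEq)]
struct BigTrie {
    accounts: Vec<H256>,
}

#[derive(Debug, PartialEq)]
enum TrackerError {
    MissingBigTrie(H256),
}

#[derive(Default)]
struct Tracker {
    big_tries: BTreeMap<H256, BigTrie>,
}

impl Tracker {
    /// Moves `account` from the big trie at `old_root` to the one at `new_root`,
    /// returning an error instead of panicking when `old_root` is not tracked.
    fn move_account(&mut self, account: H256, old_root: H256, new_root: H256) -> Result<(), TrackerError> {
        let Some(big) = self.big_tries.get_mut(&old_root) else {
            return Err(TrackerError::MissingBigTrie(old_root));
        };
        big.accounts.retain(|a| *a != account);
        if big.accounts.is_empty() {
            // No account references the old root any more.
            self.big_tries.remove(&old_root);
        }
        self.big_tries.entry(new_root).or_default().accounts.push(account);
        Ok(())
    }
}
```

The lookup and the use happen in one step, so there is no separate `contains_key` check that could drift out of sync with the access.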

2. Division by Zero Risk in compute_intervals (sync.rs:372)

let storage_density = start_hash_u256 / slot_count;

While slot_count is set to .max(1) on line 371, if start_hash_u256 is zero and slot_count is 1, you get storage_density = 0, which could lead to issues. More critically, the division start_hash_u256 / slot_count when start_hash_u256 < slot_count will result in zero, making chunk_size = 0 and potentially causing infinite loops or incorrect interval calculations.

Recommendation: Add validation that start_hash_u256 > 0 or handle the zero case explicitly. Consider using checked_div and proper error handling.
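A guarded interval computation might look like the sketch below. It uses `u128` as a stand-in for `U256`, and `MAX_INTERVALS` is a hypothetical cap that is not part of the PR; the density and chunk-size formulas follow the snippets quoted in the reviews, but the rest is an assumption about how the pieces fit together.

```rust
/// Hypothetical cap on how many intervals a single trie may be split into.
const MAX_INTERVALS: u128 = 1024;

/// Splits the hash range after `last_hash` into contiguous inclusive intervals.
/// `u128` stands in for `U256` so the sketch needs no external crates.
fn compute_intervals(last_hash: u128, slot_count: u64, slots_per_chunk: u64) -> Vec<(u128, u128)> {
    // Nothing is left to download once the last hash is the maximum value.
    let Some(start) = last_hash.checked_add(1) else {
        return Vec::new();
    };
    let remaining = u128::MAX - start;

    // Average hash distance between downloaded slots; zero when last_hash < slot_count.
    let density = last_hash / u128::from(slot_count.max(1));

    // A floor on the chunk size: never zero, and never more than MAX_INTERVALS chunks.
    let min_chunk = remaining / MAX_INTERVALS + 1;
    let chunk_size = density
        .saturating_mul(u128::from(slots_per_chunk))
        .max(min_chunk);

    let mut intervals = Vec::new();
    let mut lo = start;
    loop {
        // Inclusive upper bound, saturating at the end of the hash space.
        let hi = lo.saturating_add(chunk_size - 1);
        intervals.push((lo, hi));
        if hi == u128::MAX {
            break;
        }
        lo = hi + 1;
    }
    intervals
}
```

With the floor in place, a zero density no longer yields a zero chunk size, and all additions saturate instead of overflowing.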

3. Integer Overflow in Interval Calculation (sync.rs:380)

let interval_start_u256 = start_hash_u256 + chunk_size * i;

The multiplication chunk_size * i could overflow before the addition. While U256 has a large range, this should use checked_mul and checked_add for safety in blockchain code.

Recommendation: Use checked arithmetic:

let offset = chunk_size.checked_mul(U256::from(i)).unwrap_or(U256::MAX);
let interval_start_u256 = start_hash_u256.checked_add(offset).unwrap_or(U256::MAX);

4. Unused Return Value from return_small_tries Method (sync.rs:263)

The method return_small_tries is defined but never called in the diff. This suggests incomplete implementation.

Recommendation: Either remove if unused or document why it's provided for future use.

Major Issues

5. Cloning in Promoted Big Trie Logic (client.rs:529, 561)

let intervals = BigTrie::compute_intervals(last_hash, slot_count, 10_000);
// ...
let accounts = tracker
    .big_tries
    .get(&big_root)
    .map(|b| b.accounts.clone())  // Clone here
    .unwrap_or_default();
for interval in intervals {
    tasks_queue_not_started.push_back(StorageTask::BigInterval {
        root: big_root,
        accounts: accounts.clone(),  // Clone for every interval

The PR claims to eliminate clones, but there are still significant clones happening when promoting to big tries and creating BigInterval tasks. The accounts vector is cloned for every interval.

Recommendation: Consider using Arc<Vec<H256>> for accounts if they're shared across multiple tasks.
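As a rough sketch of that recommendation, the account list can be converted into an `Arc<[H256]>` once and shared by every interval task. The `BigIntervalTask` struct and `build_tasks` helper are hypothetical and only mirror the fields visible in the snippet above.

```rust
use std::sync::Arc;

type H256 = [u8; 32]; // stand-in for the real hash type

/// Hypothetical task shape: every interval of one big trie shares the same account list.
struct BigIntervalTask {
    root: H256,
    accounts: Arc<[H256]>,
    start: u128,
    end: u128,
}

/// Builds one task per interval; the account list is allocated once and shared.
fn build_tasks(root: H256, accounts: Vec<H256>, intervals: &[(u128, u128)]) -> Vec<BigIntervalTask> {
    let shared: Arc<[H256]> = accounts.into();
    intervals
        .iter()
        .map(|&(start, end)| BigIntervalTask {
            root,
            // Cloning an Arc bumps a reference count; the hashes are not copied.
            accounts: Arc::clone(&shared),
            start,
            end,
        })
        .collect()
}
```

The trade-off is that tasks can no longer mutate the account list in place, which may or may not matter for the healing path.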

6. Inconsistent Error Handling in Workers (client.rs:747, 773)

The worker functions return StorageTaskResult enum variants instead of using Result types. Failures return "failed" variants that are processed as normal results. This makes it harder to distinguish between network errors, validation errors, and actual failures.

Recommendation: Consider using Result<StorageTaskResult, WorkerError> for clearer error semantics.

7. Missing Validation in handle_small_batch (client.rs:832)

if (slots.is_empty() && proof.is_empty()) || slots.is_empty() || slots.len() > tries.len() {
    return StorageTaskResult::SmallFailed { tries, peer_id };
}

The condition slots.is_empty() is checked twice (redundant). More importantly, there's no upper bound check on individual slot counts per trie.

Recommendation: Simplify the condition and add per-trie validation.

Minor Issues

8. Magic Number (client.rs:529)

let intervals = BigTrie::compute_intervals(last_hash, slot_count, 10_000);

The value 10_000 (slots_per_chunk) is hardcoded here but should be a constant like SLOTS_PER_CHUNK for consistency with other constants in the codebase.

9. Potential Empty Accounts Vector (client.rs:555-557)

let accounts = tracker
    .big_tries
    .get(&big_root)
    .map(|b| b.accounts.clone())
    .unwrap_or_default();

If this returns an empty vector and we proceed to create BigInterval tasks, those tasks will have empty account lists, which could cause issues downstream.

Recommendation: Add a check and handle the empty case explicitly.

10. Inconsistent Peer Failure Recording (client.rs:608-612)

if slots.is_empty() {
    peers.peer_table.record_failure(&peer_id).await?;
} else {
    peers.peer_table.record_success(&peer_id).await?;
}

This is inside BigIntervalResult, but empty slots could be legitimate (the interval was already downloaded). The logic might incorrectly penalize peers.

Recommendation: Review whether empty slots is truly a failure condition for big intervals.

11. Missing Bounds Check (client.rs:920)

tries[i].1.slots = slot_data;

While the loop bounds should prevent out-of-bounds access, there's no explicit validation that i < tries.len() before this assignment.

Recommendation: Add an assertion or bounds check for safety.

Positive Aspects

  1. Structured Concurrency: The move from channels to JoinSet is excellent for structured concurrency and proper task lifecycle management.

  2. Type Safety: The use of enums (StorageTask, StorageTaskResult) provides better type safety than the previous index-based approach.

  3. Separation of Concerns: Extracting handle_small_batch and handle_big_interval improves readability and testability.

  4. Clearer Intent: The StorageTrieTracker makes the relationship between accounts and storage roots more explicit.

Recommendations

  1. Fix the critical arithmetic and panic issues before merging
  2. Add comprehensive tests for edge cases (zero slots, overflow conditions, empty tries)
  3. Consider using Arc for shared account vectors to truly eliminate clones
  4. Document the state machine transitions (small → big trie promotion)
  5. Add assertions or validation at critical points to catch invariant violations early

Test Coverage

The PR mentions tests pass, but given the complexity of the state machine logic, I'd recommend adding specific tests for:

  • Empty tries and edge cases in promotion logic
  • Overflow conditions in compute_intervals
  • Concurrent modifications in handle_healed_account
  • Failed workers and retry logic

Automated review by Claude (Anthropic) · custom prompt

@github-actions

github-actions Bot commented Feb 10, 2026

Lines of code report

Total lines added: 205
Total lines removed: 9
Total lines changed: 214

Detailed view
+------------------------------------------------+-------+------+
| File                                           | Lines | Diff |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/client.rs    | 1173  | -9   |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/constants.rs | 24    | +1   |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs           | 448   | +204 |
+------------------------------------------------+-------+------+

@greptile-apps

greptile-apps Bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR refactors the storage download subsystem to use a cleaner StorageTrieTracker architecture, replacing the complex index-based AccountStorageRoots approach.

Key improvements:

  • Introduced StorageTask/StorageTaskResult enums that move trie data into tasks and results, eliminating clones and the intermediate accounts_by_root_hash structure
  • Replaced channel-based worker communication with structured concurrency using JoinSet and try_join_next
  • Extracted BigTrie::compute_intervals as a standalone helper for better separation of concerns
  • Simplified healing integration with new handle_healed_account method that encapsulates root migration logic

Minor issue found:

  • Redundant slots.is_empty() check in client.rs:1166 (see inline comment)

The refactor maintains equivalent functionality while improving code clarity and reducing unnecessary data copies. Tests pass and the design aligns well with the documented plan in issue #6170.

Confidence Score: 4/5

  • This PR is safe to merge with only minor cleanup needed
  • The refactoring is well-structured and maintains functional equivalence with the previous implementation. All tests pass and the code follows good architectural patterns. Only one minor redundant condition was found, which doesn't affect correctness.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|----------|----------|
| crates/networking/p2p/snap/client.rs | Major refactor of storage download logic using new StorageTask/StorageTaskResult enums and JoinSet-based concurrency; replaced clone-heavy channel communication with structured data flow |
| crates/networking/p2p/sync.rs | Introduced new StorageTrieTracker with SmallTrie/BigTrie structures to replace index-based AccountStorageRoots; added interval computation helper for big tries |

Sequence Diagram

sequenceDiagram
    participant Main as request_storage_ranges
    participant Tracker as StorageTrieTracker
    participant Queue as Task Queue
    participant Worker as Worker Tasks
    participant Peer as Peer Network

    Main->>Tracker: take_small_batch(STORAGE_BATCH_SIZE)
    Tracker-->>Main: SmallBatch tasks
    Main->>Queue: Queue SmallBatch tasks

    loop For each BigTrie
        Main->>Tracker: Get intervals from big_tries
        Main->>Queue: Queue BigInterval tasks
    end

    loop Until all tasks complete
        Main->>Worker: Spawn worker with task
        Worker->>Peer: GetStorageRanges request
        Peer-->>Worker: StorageRanges response
        
        alt SmallBatch completed
            Worker-->>Main: SmallComplete
            Main->>Main: Write to disk buffer
            Main->>Queue: Re-queue remaining tries
        else SmallBatch failed
            Worker-->>Main: SmallFailed
            Main->>Queue: Re-queue all tries
        else Small promoted to big
            Worker-->>Main: SmallPromotedToBig
            Main->>Tracker: promote_to_big()
            Main->>Main: Compute intervals
            Main->>Queue: Queue BigInterval tasks
        else BigInterval result
            Worker-->>Main: BigIntervalResult
            Main->>Main: Append slots to buffer
            Main->>Queue: Re-queue remaining interval
        end
    end

    Main->>Main: Flush disk buffers
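
The result-handling branches in the diagram can be sketched as a single `match` over an enum. The variant names come from the diagram; the field layout, the `u128` interval bounds, and the `handle_result` helper are assumptions for illustration, and promotion is simplified to carry precomputed intervals.

```rust
use std::collections::VecDeque;

type H256 = [u8; 32]; // stand-in for the real hash type
type Slots = Vec<(H256, u64)>; // simplified slot payload

#[derive(Debug, PartialEq)]
struct SmallTrie {
    accounts: Vec<H256>,
    slots: Slots,
}

#[derive(Debug, PartialEq)]
enum StorageTask {
    SmallBatch { tries: Vec<(H256, SmallTrie)> },
    BigInterval { root: H256, start: u128, end: u128 },
}

enum StorageTaskResult {
    SmallComplete { completed: Vec<(H256, SmallTrie)>, remaining: Vec<(H256, SmallTrie)> },
    SmallFailed { tries: Vec<(H256, SmallTrie)> },
    SmallPromotedToBig { root: H256, intervals: Vec<(u128, u128)> },
    BigIntervalResult { root: H256, slots: Slots, remaining: Option<(u128, u128)> },
}

/// Consumes a worker result: downloaded slots go to `buffer`, leftover work is re-queued.
/// Trie data moves in and out by value, so nothing is cloned.
fn handle_result(result: StorageTaskResult, queue: &mut VecDeque<StorageTask>, buffer: &mut Vec<(H256, Slots)>) {
    match result {
        StorageTaskResult::SmallComplete { completed, remaining } => {
            buffer.extend(completed.into_iter().map(|(root, trie)| (root, trie.slots)));
            if !remaining.is_empty() {
                queue.push_back(StorageTask::SmallBatch { tries: remaining });
            }
        }
        StorageTaskResult::SmallFailed { tries } => {
            queue.push_back(StorageTask::SmallBatch { tries });
        }
        StorageTaskResult::SmallPromotedToBig { root, intervals } => {
            for (start, end) in intervals {
                queue.push_back(StorageTask::BigInterval { root, start, end });
            }
        }
        StorageTaskResult::BigIntervalResult { root, slots, remaining } => {
            buffer.push((root, slots));
            if let Some((start, end)) = remaining {
                queue.push_back(StorageTask::BigInterval { root, start, end });
            }
        }
    }
}
```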

@greptile-apps greptile-apps Bot left a comment

2 files reviewed, 1 comment

Comment thread crates/networking/p2p/snap/client.rs Outdated
tx.send(empty_task_result).await.ok();
return Ok(());

if (slots.is_empty() && proof.is_empty()) || slots.is_empty() || slots.len() > tries.len() {

redundant condition: slots.is_empty() is checked twice

Suggested change
- if (slots.is_empty() && proof.is_empty()) || slots.is_empty() || slots.len() > tries.len() {
+ if slots.is_empty() || proof.is_empty() || slots.len() > tries.len() {
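
A small truth-table check helps compare the two conditions. By absorption, `(A && B) || A` reduces to `A`, so the original condition is equivalent to `slots.is_empty() || slots.len() > tries.len()`. The suggested form additionally rejects responses with non-empty slots and an empty proof, which is a behaviour change rather than a pure cleanup; whether that change is wanted is not settled in this thread. The function names below are hypothetical and model the three sub-conditions as plain booleans.

```rust
/// The condition as written in the PR.
fn original(slots_empty: bool, proof_empty: bool, too_many: bool) -> bool {
    (slots_empty && proof_empty) || slots_empty || too_many
}

/// Equivalent by absorption: (A && B) || A == A.
fn simplified(slots_empty: bool, too_many: bool) -> bool {
    slots_empty || too_many
}

/// The form proposed in the suggested change.
fn suggested(slots_empty: bool, proof_empty: bool, too_many: bool) -> bool {
    slots_empty || proof_empty || too_many
}

/// Lists every (slots_empty, proof_empty, too_many) input where `suggested` differs from `original`.
fn disagreements() -> Vec<(bool, bool, bool)> {
    let mut out = Vec::new();
    for a in [false, true] {
        for b in [false, true] {
            for c in [false, true] {
                if original(a, b, c) != suggested(a, b, c) {
                    out.push((a, b, c));
                }
            }
        }
    }
    out
}
```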

@github-actions

🤖 Codex Code Review

Findings

  • Critical – division by zero/panic in interval computation: BigTrie::compute_intervals can compute storage_density = 0 when last_downloaded_hash is small (e.g., 0 or < slot_count), which yields chunk_size = 0 and then divides by zero in missing_storage_range / chunk_size. This will panic and abort snap sync. Guard with chunk_size.max(1) or return a single [start..MAX] interval early. crates/networking/p2p/sync.rs:369-376
  • High – stale storage data reused after root change: handle_healed_account re-keys/clones BigTrie slots/intervals when an account’s storage root changes. If old_root != new_root, those slots/intervals are no longer valid and can cause incorrect storage downloads/verification. New root should start empty (or be fully re-requested) rather than inheriting old data. crates/networking/p2p/sync.rs:293-316
  • High – loss of pending work on stale pivot: request_storage_ranges drains small_tries into local tasks via take_small_batch and never returns unfinished tasks to the tracker when the loop exits due to staleness. This loses remaining tries for subsequent attempts. Reinsert tasks_queue_not_started (and any unfinished/in-flight) into the tracker before returning on early break. crates/networking/p2p/snap/client.rs:545-566, crates/networking/p2p/snap/client.rs:829-832
  • Medium – busy spin when no tasks available: When tasks_queue_not_started is empty but completed_tasks < task_count, the loop continues without waiting, causing a tight spin. Add a small sleep or await worker_joinset.join_next() in that branch to avoid CPU burn and improve fairness. crates/networking/p2p/snap/client.rs:852-857
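
The third finding (pending work lost on a stale pivot) could be addressed along the lines of the sketch below. `take_small_batch` and `return_small_tries` are named in this thread, but their bodies here are guesses, and `drain_on_stale` is a hypothetical helper.

```rust
use std::collections::{BTreeMap, VecDeque};

type H256 = [u8; 32]; // stand-in for the real hash type

#[derive(Debug, Default, PartialEq)]
struct SmallTrie {
    accounts: Vec<H256>,
}

#[derive(Default)]
struct Tracker {
    small_tries: BTreeMap<H256, SmallTrie>,
}

impl Tracker {
    /// Removes up to `n` tries from the tracker and hands ownership to the caller.
    fn take_small_batch(&mut self, n: usize) -> Vec<(H256, SmallTrie)> {
        let mut batch = Vec::new();
        while batch.len() < n {
            match self.small_tries.pop_first() {
                Some(entry) => batch.push(entry),
                None => break,
            }
        }
        batch
    }

    /// Puts unfinished tries back, merging accounts if the root is already tracked.
    fn return_small_tries(&mut self, tries: Vec<(H256, SmallTrie)>) {
        for (root, trie) in tries {
            self.small_tries.entry(root).or_default().accounts.extend(trie.accounts);
        }
    }
}

/// On an early exit (e.g. stale pivot), returns every queued batch to the tracker.
fn drain_on_stale(tracker: &mut Tracker, queue: &mut VecDeque<Vec<(H256, SmallTrie)>>) {
    while let Some(batch) = queue.pop_front() {
        tracker.return_small_tries(batch);
    }
}
```

In-flight tasks would need the same treatment once their workers finish; that part is left out of the sketch.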

If you want, I can suggest concrete patches for the fixes above or add a small unit test for compute_intervals edge cases.


Automated review by OpenAI Codex · custom prompt

devin-ai-integration[bot]

This comment was marked as resolved.

@github-actions

github-actions Bot commented Feb 10, 2026

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s]       | Min [s] | Max [s] | Relative    |
|---------|----------------|---------|---------|-------------|
| base    | 66.275 ± 0.248 | 65.872  | 66.754  | 1.01 ± 0.01 |
| head    | 65.335 ± 0.425 | 64.840  | 65.988  | 1.00        |

@fedacking
Contributor Author

fedacking commented Feb 10, 2026

One thing I think would be very useful is adding monitoring to request_storage_ranges so we can observe what's happening during the storage download phase. Right now it's hard to tell the state of progress and the mix of work being done.

Concretely, I'd like to see periodic debug! logs (or at the very least before/after each request_storage_ranges call) that include:

  1. Small vs big tries count — tracker.small_tries.len() and tracker.big_tries.len() separately. We already log tracker.remaining_count() in snap_sync.rs but that collapses both into one number, which hides the distribution.
  2. Total intervals across big tries — the sum of intervals.len() for each entry in tracker.big_tries. This tells us how much sub-range work is still pending for the large tries.
  3. Number of small batch vs big interval requests sent — how many SmallBatch and BigInterval tasks were actually dispatched to workers during the call.

This would give us good visibility into whether storage download is making progress, whether tries are getting promoted from small to big, and how the interval-based download is evolving over successive attempts. The existing metrics infrastructure (METRICS) seems like the right place to wire these into.
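
The three numbers requested above could be gathered into a single snapshot, roughly as sketched here. The `small_tries`, `big_tries`, and `intervals` field names follow the comment; `StorageProgress`, its methods, and the simplified tracker types are hypothetical, and wiring the result into `debug!` or METRICS is left open.

```rust
use std::collections::BTreeMap;

type H256 = [u8; 32]; // stand-in for the real hash type

#[derive(Default)]
struct SmallTrie;

#[derive(Default)]
struct BigTrie {
    intervals: Vec<(u128, u128)>,
}

#[derive(Default)]
struct StorageTrieTracker {
    small_tries: BTreeMap<H256, SmallTrie>,
    big_tries: BTreeMap<H256, BigTrie>,
}

/// Hypothetical progress snapshot for one request_storage_ranges call.
#[derive(Debug, PartialEq)]
struct StorageProgress {
    small_tries: usize,
    big_tries: usize,
    pending_intervals: usize,
    small_batches_sent: usize,
    big_intervals_sent: usize,
}

impl StorageProgress {
    /// Reads the tracker sizes and combines them with the dispatch counters.
    fn snapshot(tracker: &StorageTrieTracker, small_batches_sent: usize, big_intervals_sent: usize) -> Self {
        Self {
            small_tries: tracker.small_tries.len(),
            big_tries: tracker.big_tries.len(),
            pending_intervals: tracker.big_tries.values().map(|b| b.intervals.len()).sum(),
            small_batches_sent,
            big_intervals_sent,
        }
    }

    /// One-line summary suitable for passing to a logging macro.
    fn summary(&self) -> String {
        format!(
            "storage download: small_tries={} big_tries={} pending_intervals={} small_batches_sent={} big_intervals_sent={}",
            self.small_tries, self.big_tries, self.pending_intervals, self.small_batches_sent, self.big_intervals_sent
        )
    }
}
```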

Comment thread crates/networking/p2p/snap/client.rs Outdated
Comment thread crates/networking/p2p/snap/client.rs
Comment thread crates/networking/p2p/sync.rs
Comment thread crates/networking/p2p/sync.rs Outdated
Comment thread crates/networking/p2p/snap/client.rs Outdated
@github-project-automation github-project-automation Bot moved this to In Progress in ethrex_l1 Feb 13, 2026
promote_to_big was trying to get accounts from small_tries, but they
had already been taken out by take_small_batch. The big trie in the
tracker ended up with zero accounts, causing BigInterval tasks to
have empty account lists.
devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +559 to +565
current_account_storages.insert(
    root,
    AccountsWithStorage {
        accounts: trie.accounts,
        storages,
    },
);

🟡 flush_completed_tries uses insert, overwriting previously accumulated big-trie storage data

flush_completed_tries at crates/networking/p2p/snap/client.rs:559 uses BTreeMap::insert to write completed small-trie data into current_account_storages. This overwrites any existing entry for the same storage root. Meanwhile, the BigIntervalResult handler at crates/networking/p2p/snap/client.rs:670-677 uses .entry().or_insert_with().storages.extend() to append slots incrementally.

Scenario where data is lost

Although the StorageTrieTracker keeps small and big tries in separate maps keyed by root, current_account_storages is a shared buffer that accumulates data from both code paths. If a big trie interval result writes slots for root X via extend, and then a later SmallComplete or SmallPromotedToBig result calls flush_completed_tries with a completed trie that happens to share root X (e.g. due to a race between healing adding a new small trie with the same root and an in-flight big interval completing), the insert call will silently discard all previously accumulated big-trie slots for that root.

Even if this race is unlikely today, using insert instead of entry().or_insert_with().extend() is inconsistent with the BigIntervalResult path and fragile against future changes. The fix is to use entry + extend (or at minimum or_insert) in flush_completed_tries to preserve any previously accumulated data.

Impact: Potential silent loss of downloaded storage slots for accounts sharing a storage root, requiring re-download or healing.

Suggested change
- current_account_storages.insert(
-     root,
-     AccountsWithStorage {
-         accounts: trie.accounts,
-         storages,
-     },
- );
+ current_account_storages
+     .entry(root)
+     .and_modify(|existing| {
+         existing.storages.extend(storages.iter().cloned());
+     })
+     .or_insert_with(|| AccountsWithStorage {
+         accounts: trie.accounts,
+         storages,
+     });
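
The difference between the two write paths can be reproduced in isolation. The sketch below uses a simplified `AccountsWithStorage` and two hypothetical helpers, `flush_overwrite` and `flush_merge`; the merge variant matches on the map entry so the new slots are moved rather than cloned.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

type H256 = [u8; 32]; // stand-in for the real hash type

#[derive(Debug, PartialEq)]
struct AccountsWithStorage {
    accounts: Vec<H256>,
    storages: Vec<(H256, u64)>,
}

/// `insert` replaces whatever was already buffered for `root`.
fn flush_overwrite(buffer: &mut BTreeMap<H256, AccountsWithStorage>, root: H256, accounts: Vec<H256>, storages: Vec<(H256, u64)>) {
    buffer.insert(root, AccountsWithStorage { accounts, storages });
}

/// Appends to an existing entry and only creates a new one when `root` is absent.
fn flush_merge(buffer: &mut BTreeMap<H256, AccountsWithStorage>, root: H256, accounts: Vec<H256>, storages: Vec<(H256, u64)>) {
    match buffer.entry(root) {
        Entry::Occupied(mut existing) => existing.get_mut().storages.extend(storages),
        Entry::Vacant(slot) => {
            slot.insert(AccountsWithStorage { accounts, storages });
        }
    }
}
```

Starting from a buffer that already holds one slot for a root, `flush_overwrite` leaves only the new slot, while `flush_merge` leaves both.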

@ElFantasma ElFantasma dismissed their stale review February 18, 2026 18:17

Re-reviewed: all comments addressed in updated commits.

Without a cap, the number of in-flight storage range workers was
bounded only by available peers × allowed requests per peer, which
could reach thousands and consume tens of GB of memory. Add a
MAX_STORAGE_RANGE_WORKERS constant (1000, ~2 MB each ≈ 2 GB) and
block-wait for a worker to finish before spawning new ones when at
capacity.
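
The capping behaviour described in this commit message can be sketched with standard-library threads standing in for Tokio tasks. The real code uses a JoinSet and the MAX_STORAGE_RANGE_WORKERS constant; `run_capped` and its peak-concurrency bookkeeping are hypothetical, and joining the oldest worker is a simplification of waiting for any worker to finish.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs `task_count` workers with at most `max_workers` alive at once.
/// Returns the peak number of workers observed running concurrently.
fn run_capped(task_count: usize, max_workers: usize) -> usize {
    let max_workers = max_workers.max(1); // a cap of zero would never make progress
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut in_flight: VecDeque<thread::JoinHandle<()>> = VecDeque::new();

    for _ in 0..task_count {
        // At capacity: block until the oldest worker finishes before spawning another.
        if in_flight.len() >= max_workers {
            if let Some(oldest) = in_flight.pop_front() {
                oldest.join().expect("worker panicked");
            }
        }
        let (running, peak) = (Arc::clone(&running), Arc::clone(&peak));
        in_flight.push_back(thread::spawn(move || {
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            // A real worker would issue its storage range request here.
            running.fetch_sub(1, Ordering::SeqCst);
        }));
    }
    for handle in in_flight {
        handle.join().expect("worker panicked");
    }
    peak.load(Ordering::SeqCst)
}
```

Because a new worker is only spawned after the queue drops below the cap, the observed peak cannot exceed `max_workers`, which is what bounds memory in the scenario the commit describes.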
ElFantasma added a commit that referenced this pull request Apr 15, 2026
- Add §1.18 observability tooling (PR #6470)
- Add §1.19 pivot update reliability (PR #6475, issue #6474)
- Add §1.20 big-account within-trie parallelization (issue #6477)
- Add §1.21 small-account batching (issue #6476)
- Add §1.22 decoded TrieLayerCache (PR #6348)
- Add §1.23 bloom filter for non-existent storage (PR #6288)
- Add §1.24 adaptive request sizing + bisection (PR #6181)
- Add §1.25 concurrent bytecode + storage (PR #6205)
- Add §1.26 phase completion markers (PR #6189)
- Add §2.18 StorageTrieTracker refactor (PR #6171)
- Update current-state bottleneck table with small-account and pivot-update findings
- Reprioritize timeline: pivot-update crash fix is now priority 0
- Add two risks (pivot crash masks perf work, DB corruption on every crash)
- Bump doc version to 1.3
@ElFantasma ElFantasma mentioned this pull request Apr 15, 2026
1 task

Labels

L1 (Ethereum client), performance (Block execution throughput and performance in general)

Projects

Status: In Progress
Status: Todo

3 participants