Skip to content

refactor(l1): snap sync code reorganization and error handling consolidation#5975

Merged
pablodeymo merged 32 commits into
mainfrom
refactor/snapsync-healing-unification
Feb 6, 2026
Merged

refactor(l1): snap sync code reorganization and error handling consolidation#5975
pablodeymo merged 32 commits into
mainfrom
refactor/snapsync-healing-unification

Conversation

@pablodeymo
Copy link
Copy Markdown
Contributor

@pablodeymo pablodeymo commented Jan 21, 2026

Summary

Major refactoring of the snap sync implementation (~6,500 lines across 7 files) to improve code organization, maintainability, and error handling. This PR completes all 5 phases of the snap sync refactoring plan.

Motivation

The snap sync codebase had grown organically with:

  • Large monolithic files (sync.rs at 1,648 lines, peer_handler.rs at 2,074 lines)
  • Scattered error types across multiple modules
  • Mixed concerns (server/client code, protocol messages, sync orchestration)
  • Inconsistent naming conventions in healing modules

This refactoring addresses these issues through a structured, phased approach.

Changes by Phase

Phase 1: Foundation ✅

  • Created snap/ module directory structure
  • Moved server code from snap.rs to snap/server.rs
  • Created snap/constants.rs with documented protocol constants
  • Moved snap server tests to root test crate (test/tests/p2p/snap_server_tests.rs) to avoid clippy::unwrap_used lint denial inherited from the ethrex-p2p crate

Phase 2: Protocol Layer ✅

  • Split rlpx/snap.rs into rlpx/snap/ directory:
    • messages.rs - Snap protocol message type definitions
    • codec.rs - RLPxMessage encoding/decoding implementations
    • mod.rs - Module re-exports

Phase 3: Healing Unification ✅

  • Created sync/healing/ directory with unified structure:
    • types.rs - Shared healing types (HealingQueueEntry, StateHealingQueue)
    • state.rs - State trie healing (~420 lines)
    • storage.rs - Storage trie healing (~530 lines)
  • Improved naming conventions:
    • MembatchEntryValueHealingQueueEntry
    • MembatchEntryStorageHealingQueueEntry
    • MembatchStorageHealingQueue
    • children_not_in_storage_countmissing_children_count

Phase 4: Sync Orchestration ✅

  • Split sync.rs (1,648 lines) into focused modules:
    • sync/full.rs (~260 lines) - Full sync implementation
    • sync/snap_sync.rs (~1,100 lines) - Snap sync implementation
    • sync/mod.rs (~285 lines) - Syncer struct and orchestration
  • Extracted snap client methods from peer_handler.rs to snap/client.rs:
    • request_account_range, request_bytecodes, request_storage_ranges
    • request_state_trienodes, request_storage_trienodes
    • Reduced peer_handler.rs from 2,060 to 670 lines (~68% reduction)

Phase 5: Error Handling ✅

  • Created unified snap/error.rs with SnapError enum:
    • Store - Storage layer errors
    • Protocol - Connection/protocol errors
    • Trie - Trie operation errors
    • RlpDecode - RLP decoding errors
    • PeerTable - Peer table errors
    • BadRequest - Invalid request validation
    • ValidationError - Response validation failures
    • NoTasks, NoAccountHashes, NoAccountStorages, NoStorageRoots
    • InternalError, FileSystem, SnapshotDir, TaskPanic
    • InvalidData, InvalidHash
  • Removed redundant error types: SnapClientError, RequestStateTrieNodesError
  • Added From<SnapError> for PeerConnectionError for error propagation
  • Kept RequestStorageTrieNodesError struct for request ID tracking

Bug Fixes

Fixed several potential panics and edge cases in snap/client.rs:

  • Empty vector indexing: Replaced direct [0] indexing with safe .get(0).ok_or() or .first().unwrap_or() to prevent panics on empty vectors
  • Zero chunk_size in request_bytecodes: Added early return for empty input and adjusted chunk_count to not exceed input length
  • Zero chunk_size in request_account_range: Ensured chunk_count doesn't exceed the hash range to avoid zero-sized chunks
  • Typo fix: "Trie to get" → "Tried to get" in error message

New Module Structure

crates/networking/p2p/
├── snap/
│   ├── mod.rs          # Re-exports
│   ├── server.rs       # Server-side request processing
│   ├── client.rs       # Client-side request methods (~1,439 lines)
│   ├── constants.rs    # Protocol constants
│   └── error.rs        # Unified SnapError type
├── rlpx/snap/
│   ├── mod.rs          # Re-exports
│   ├── messages.rs     # Message struct definitions
│   └── codec.rs        # RLPxMessage implementations
├── sync/
│   ├── mod.rs          # Syncer and orchestration (~285 lines)
│   ├── full.rs         # Full sync (~260 lines)
│   ├── snap_sync.rs    # Snap sync (~1,100 lines)
│   └── healing/
│       ├── mod.rs      # Re-exports
│       ├── types.rs    # Shared healing types
│       ├── state.rs    # State healing (~420 lines)
│       └── storage.rs  # Storage healing (~530 lines)
└── peer_handler.rs     # ETH protocol methods (~670 lines)

test/tests/p2p/
└── snap_server_tests.rs  # Snap server tests (moved from p2p crate)

Testing

  • cargo check -p ethrex-p2p - Compilation passes
  • cargo clippy -p ethrex-p2p - No warnings
  • cargo test -p ethrex-p2p - All 36 unit tests pass
  • cargo test -p ethrex-test snap_server - All 12 snap server tests pass (moved to root test crate)

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

Refactoring Plan Details

Click to expand the full refactoring plan

Overview

The Snap Sync implementation spans ~6,500 lines across 7 files. This plan provides a structured approach to simplify and improve the code.

Files Involved

Original Structure

File Lines Purpose
crates/networking/p2p/snap.rs 1,008 Server-side request processing (90% tests)
crates/networking/p2p/rlpx/snap.rs 389 Protocol message definitions
crates/networking/p2p/sync.rs 1,648 Main sync orchestration
crates/networking/p2p/sync/state_healing.rs 471 State trie healing
crates/networking/p2p/sync/storage_healing.rs 718 Storage healing
crates/networking/p2p/sync/code_collector.rs 102 Bytecode collection
crates/networking/p2p/peer_handler.rs 2,074 Client-side snap requests (~800 lines snap-related)

New Structure (After Phases 1-5)

File Purpose
crates/networking/p2p/snap/mod.rs Snap module re-exports
crates/networking/p2p/snap/server.rs Server-side request processing
crates/networking/p2p/snap/client.rs Client-side snap request methods (~1,439 lines)
crates/networking/p2p/snap/constants.rs Centralized protocol constants
crates/networking/p2p/rlpx/snap/mod.rs Protocol message re-exports
crates/networking/p2p/rlpx/snap/messages.rs Message struct definitions
crates/networking/p2p/rlpx/snap/codec.rs RLPxMessage implementations
crates/networking/p2p/sync/mod.rs Sync orchestration (~285 lines)
crates/networking/p2p/sync/full.rs Full sync implementation (~260 lines)
crates/networking/p2p/sync/snap_sync.rs Snap sync implementation (~1,100 lines)
crates/networking/p2p/sync/healing/mod.rs Healing module re-exports
crates/networking/p2p/sync/healing/types.rs Shared healing types
crates/networking/p2p/sync/healing/state.rs State healing (~420 lines)
crates/networking/p2p/sync/healing/storage.rs Storage healing (~530 lines)
crates/networking/p2p/peer_handler.rs ETH protocol requests (~670 lines)
test/tests/p2p/snap_server_tests.rs Snap server tests (in root test crate)

Implementation Order (Dependencies)

Phase 1.1-1.2 (snap module)
    ↓
Phase 1.3 (constants) ──→ Phase 2.1-2.3 (rlpx reorganization)
    ↓
Phase 1.4-1.5 (tests, imports)
    ↓
Phase 3.1-3.2 (pending_nodes)
    ↓
Phase 3.3-3.5 (healing unification)
    ↓
Phase 4.1-4.3 (sync split)
    ↓
Phase 4.4-4.5 (snap client extraction)
    ↓
Phase 5.1-5.3 (error consolidation)

Risk Mitigation

Phase Risk Mitigation
1. Foundation Low Simple reorganization, APIs unchanged
2. Protocol Medium Extensive hive testing
3. Healing Medium-High Incremental migration, keep old files until verified
4. Sync orchestration High Feature flags, integration tests, gradual extraction
5. Error handling Medium Keep old errors, wrap in new type initially

Decisions Made

  • No RLPxMessage macro: Implementations vary too much (e.g., GetStorageRanges special hash handling)
  • Backward-compatible re-exports: Constants re-exported from peer_handler.rs to avoid breaking changes
  • Incremental approach: Each phase builds on previous, allowing verification at each step

Key Considerations

  • The accounts_by_root_hash structure in sync is unbounded - consider adding limits
  • Tests should cover edge cases for hash boundary handling in GetStorageRanges
  • Healing processes share similar patterns but have distinct algorithms - trait unification should preserve this

@pablodeymo pablodeymo requested a review from a team as a code owner January 21, 2026 21:11
Copilot AI review requested due to automatic review settings January 21, 2026 21:11
@pablodeymo pablodeymo marked this pull request as draft January 21, 2026 21:11
@github-actions github-actions Bot added the L1 Ethereum client label Jan 21, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 21, 2026

Lines of code report

Total lines added: 4045
Total lines removed: 2330
Total lines changed: 6375

Detailed view
+--------------------------------------------------------+-------+-------+
| File                                                   | Lines | Diff  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/peer_handler.rs           | 546   | -1187 |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/rlpx/connection/server.rs | 1053  | +2    |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/rlpx/error.rs             | 131   | +11   |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/rlpx/snap/codec.rs        | 284   | +284  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/rlpx/snap/messages.rs     | 64    | +64   |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/rlpx/snap/mod.rs          | 7     | +7    |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/snap/client.rs            | 1182  | +1182 |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/snap/constants.rs         | 23    | +23   |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/snap/error.rs             | 94    | +94   |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/snap/mod.rs               | 22    | +22   |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/snap/server.rs            | 150   | +150  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync.rs                   | 241   | -1140 |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/full.rs              | 243   | +243  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/healing/mod.rs       | 7     | +7    |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/healing/state.rs     | 387   | +387  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/healing/storage.rs   | 615   | +615  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/healing/types.rs     | 9     | +9    |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs         | 945   | +945  |
+--------------------------------------------------------+-------+-------+
| ethrex/crates/networking/p2p/utils.rs                  | 191   | -3    |
+--------------------------------------------------------+-------+-------+

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reorganizes the state and storage healing modules into a unified sync/healing/ directory structure. The refactoring aims to improve code organization by consolidating related functionality, extracting shared types, and applying consistent naming conventions throughout the healing logic.

Changes:

  • Created new sync/healing/ module with separate files for state, storage, types, and module definitions
  • Renamed types for clarity: MembatchEntryValueHealingQueueEntry, MembatchEntryStorageHealingQueueEntry, and children_not_in_storage_countmissing_children_count
  • Extracted shared healing queue types to a common types.rs file

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
sync/healing/types.rs New file defining shared HealingQueueEntry struct and StateHealingQueue type alias
sync/healing/storage.rs Renamed from storage_healing.rs with updated type/variable names and reorganized imports
sync/healing/state.rs Renamed from state_healing.rs with updated type/variable names and reorganized imports
sync/healing/mod.rs New module definition file with public exports
sync.rs Updated imports to use new healing module structure and reference relocated constants
Comments suppressed due to low confidence (3)

crates/networking/p2p/sync/healing/storage.rs:118

  • Spelling error: 'childre' should be 'children'. Additionally, 'wchich' should be 'which'.
    crates/networking/p2p/sync/healing/storage.rs:11
  • This import references snap::constants module which doesn't appear to exist in the codebase or in this PR. The constants being imported (MAX_IN_FLIGHT_REQUESTS, MAX_RESPONSE_BYTES, SHOW_PROGRESS_INTERVAL_DURATION, STORAGE_BATCH_SIZE) are either currently defined elsewhere (e.g., MAX_RESPONSE_BYTES is in peer_handler.rs) or were removed from state_healing.rs. This PR appears incomplete - it needs to either create the snap::constants module with these constants, or the constants need to be moved/defined in an appropriate location before this import can work.
    crates/networking/p2p/sync/healing/state.rs:28
  • This import references snap::constants::{NODE_BATCH_SIZE, SHOW_PROGRESS_INTERVAL_DURATION} which doesn't appear to exist in the codebase or in this PR. The constants NODE_BATCH_SIZE and SHOW_PROGRESS_INTERVAL_DURATION were defined in the original state_healing.rs file but are now being imported from a non-existent module. This will cause a compilation error.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/networking/p2p/sync/healing/types.rs Outdated
Comment thread crates/networking/p2p/sync/healing/mod.rs
Comment thread crates/networking/p2p/sync.rs Outdated
@pablodeymo pablodeymo changed the title chore(l1): unify snapsync healing modules into sync/healing/ directory refactor(l1): snap sync code reorganization and error handling consolidation Jan 22, 2026
…tory

Reorganize state_healing.rs and storage_healing.rs into a shared
sync/healing/ module structure with clearer naming conventions:

- Create sync/healing/ directory with mod.rs, types.rs, state.rs, storage.rs
- Rename MembatchEntryValue to HealingQueueEntry
- Rename MembatchEntry to StorageHealingQueueEntry
- Rename Membatch type to StorageHealingQueue
- Rename children_not_in_storage_count to missing_children_count
- Rename membatch variables to healing_queue throughout
- Extract shared HealingQueueEntry and StateHealingQueue types to types.rs
- Update sync.rs imports to use new healing module
Reorganize snap protocol code for better maintainability:

- Split rlpx/snap.rs into rlpx/snap/ directory:
  - codec.rs: RLP encoding/decoding for snap messages
  - messages.rs: Snap protocol message types
  - mod.rs: Module re-exports

- Split snap.rs into snap/ directory:
  - constants.rs: Snap sync constants and configuration
  - server.rs: Snap protocol server implementation
  - mod.rs: Module re-exports

- Move snap server tests to dedicated tests/ directory
- Update imports in p2p.rs, peer_handler.rs, and code_collector.rs
Document the phased approach for reorganizing snap sync code:
- Phase 1: rlpx/snap module split
- Phase 2: snap module split with server extraction
- Phase 3: healing module unification
Split the large sync.rs (1631 lines) into focused modules:

- sync/full.rs (~260 lines): Full sync implementation
  - sync_cycle_full(), add_blocks_in_batch(), add_blocks()

- sync/snap_sync.rs (~1100 lines): Snap sync implementation
  - sync_cycle_snap(), snap_sync(), SnapBlockSyncState
  - store_block_bodies(), update_pivot(), block_is_stale()
  - validate_state_root(), validate_storage_root(), validate_bytecodes()
  - insert_accounts(), insert_storages() (both rocksdb and non-rocksdb)

- sync.rs (~285 lines): Orchestration layer
  - Syncer struct with start_sync() and sync_cycle()
  - SyncMode, SyncError, AccountStorageRoots types
  - Re-exports for public API
…p/client.rs

Move all snap protocol client-side request methods from peer_handler.rs
to a dedicated snap/client.rs module:
- request_account_range and request_account_range_worker
- request_bytecodes
- request_storage_ranges and request_storage_ranges_worker
- request_state_trienodes
- request_storage_trienodes

Also moves related types: DumpError, RequestMetadata, SnapClientError,
RequestStateTrieNodesError, RequestStorageTrieNodes.

This reduces peer_handler.rs from 2,060 to 670 lines (~68% reduction),
leaving it focused on ETH protocol methods (block headers/bodies).

Added SnapClientError variant to SyncError for proper error handling.
Updated plan_snap_sync.md to mark Phase 4 as complete.
…napError type

Implement Phase 5 of snap sync refactoring plan - Error Handling.

- Create snap/error.rs with unified SnapError enum covering all snap protocol errors
- Update server functions (process_account_range_request, process_storage_ranges_request,
  process_byte_codes_request, process_trie_nodes_request) to return Result<T, SnapError>
- Remove SnapClientError and RequestStateTrieNodesError, consolidate into SnapError
- Keep RequestStorageTrieNodesError struct for request ID tracking in storage healing
- Add From<SnapError> for PeerConnectionError to support error propagation in message handlers
- Update sync module to use SyncError::Snap variant
- Update healing modules (state.rs, storage.rs) to use new error types
- Move DumpError struct to error.rs module
- Update test return types to use SnapError
- Mark Phase 5 as completed in plan document

All phases of the snap sync refactoring are now complete.
@pablodeymo pablodeymo force-pushed the refactor/snapsync-healing-unification branch from e9abcbc to d91bc8b Compare January 22, 2026 16:20
@github-actions
Copy link
Copy Markdown

Kimi AI Code Review

This PR refactors the snap sync code, reorganizing it and consolidating error handling. Here are my specific comments and suggestions:

Code Correctness and Potential Bugs

  1. Removal of snap module: The original snap.rs file has been removed and its contents redistributed. Ensure that all functionalities are correctly moved to their respective new locations (codec.rs, messages.rs, and mod.rs). Double-check that no logic is accidentally removed or broken during the refactor.

  2. Error Handling Consolidation: The consolidation of error handling into a single SnapError enum is a good approach for reducing complexity. However, ensure that all possible error cases are covered and that the new error enum is fully utilized across the codebase.

Security Vulnerabilities

  1. Peer Connection Handling: With the changes in peer connection handling, ensure that there are no unintended information disclosures or opportunities for peers to cause denial-of-service attacks through malformed messages.

Performance Implications

  1. Message Decoding/Encoding: The separation of concerns for message encoding and decoding might introduce slight performance overhead. Profile the changes to ensure they do not negatively impact the overall performance.

Rust Best Practices and Idiomatic Patterns

  1. Use of pub(crate): The use of pub(crate) for internal modules is good practice to limit visibility, reducing the risk of accidental public API changes.

  2. Error Handling: The use of the thiserror crate for error definitions is idiomatic and improves readability and maintainability.

Memory Safety and Proper Error Handling

  1. Asynchronous Code: Ensure that all asynchronous code paths handle potential panics and errors correctly, especially when spawning new tasks.

  2. Resource Management: Verify that all resources (e.g., file handles, network connections) are properly managed and released when no longer needed.

Code Readability and Maintainability

  1. Documentation: The new module structure is well-documented, which improves maintainability. Ensure that all public APIs have clear documentation.

  2. Code Comments: While the code is relatively well-commented, consider adding more inline comments to explain complex logic or reasoning behind certain design decisions.

Ethereum-Specific Considerations

  1. EVM Opcode Correctness: No direct changes to EVM opcode handling are visible in this PR. However, ensure that any changes in the surrounding code do not inadvertently affect EVM execution.

  2. Consensus Rules and EIP Compliance: The refactor should not affect consensus rules or EIP compliance. Verify that all relevant logic remains compliant with the Ethereum protocol.

  3. State Trie and Storage Operations: The changes involve state trie and storage operations. Ensure that these operations are performed correctly and efficiently, without introducing any consensus flaws.

  4. RLP Encoding/Decoding: The refactor involves RLP encoding and decoding. Ensure that all encoding and decoding operations are correct and do not lead to any data corruption or loss.

  5. Transaction and Block Validation Logic: No direct changes to transaction or block validation logic are visible. However,间接影响 should be considered, especially in how state and storage data are handled.

Overall, the refactor appears to be well-structured and follows good Rust practices. The key will be ensuring that all functionality is preserved and that the new error handling mechanisms are comprehensive and correctly implemented.


Automated review by Kimi (Moonshot AI)

Comment thread crates/networking/p2p/snap/client.rs Outdated
//! Snap sync client - methods for requesting snap protocol data from peers
//!
//! This module contains all the client-side snap protocol request methods
//! implemented as extension methods on PeerHandler.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should be exenstion methods on peer handler. I think it would be better if implemented as just some functions we use a snap sync downloader.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Converted all 5 methods out of impl PeerHandler into standalone pub async fn functions. The three that used self.peer_table (request_account_range, request_bytecodes, request_storage_ranges) now take peers: &mut PeerHandler as a parameter. The two that were already static (request_state_trienodes, request_storage_trienodes) just moved out of the impl block unchanged.

Comment thread crates/networking/p2p/snap/client.rs Outdated
Comment thread crates/networking/p2p/snap/client.rs Outdated
pub async fn request_account_range(
&mut self,
start: H256,
limit: H256,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only ever gets called with

H256::zero(), H256::repeat_byte(0xff) as the two parameters. May not work with an arbitrary limit.

Comment thread crates/networking/p2p/snap/client.rs Outdated
Comment thread crates/networking/p2p/sync/healing/state.rs
account_storages: Vec<Vec<(H256, U256)>>,
peer_id: H256,
remaining_start: usize,
remaining_end: usize,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remaining_start remaining_end and remaining_hash_range are obtuse without an explanation.

Comment thread crates/networking/p2p/snap/client.rs Outdated
}

// channel to send the tasks to the peers
let (task_sender, mut task_receiver) =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue, we should handle all task with a joinset.

Comment thread crates/networking/p2p/snap/client.rs Outdated
.set(CurrentStepValue::RequestingStorageRanges);
debug!("Starting request_storage_ranges function");
// 1) split the range in chunks of same length
let mut accounts_by_root_hash: BTreeMap<_, Vec<_>> = BTreeMap::new();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key insight: the accounts_by_root_hash is the accounts organized by the hash of the storage root. This is key because each storage root is downloaded only once. We still need to know the path to that account.

pub struct AccountStorageRoots {
/// The accounts that have not been healed are guaranteed to have the original storage root
/// we can read this storage root
pub accounts_with_storage_root: BTreeMap<H256, (Option<H256>, Vec<(H256, H256)>)>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (Option<H256>, Vec<(H256, H256)>) is a complex type. Here's what it means, and it should be made more explicit. Option<H256> is the storage root of the account. If the root is None it means that we healed the account and we aren't sure if that is the storage root anymore. This behaviour was needed in the hash based trie, it may not be necessary in path based trie. Vec<(H256, H256)> is the intervals that we still need to download from the storage trie. When we reach a big account, we chunk it for download the storage. During the download we may need to stop request_strorage_request and continue later. As such, we need to know which intervals are still missing.

Comment thread crates/networking/p2p/snap/client.rs Outdated
}
}

if remaining_start < remaining_end {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic with remaining_start and remaining_end logic is a complex state machine. We should make it more explicit so it's easier to understand, ideally by changing the StorageResult struct. The state machine can be viewed in the documentation at the document docs/l1/fundamentals/snap_sync/Flow-BigAccountLogic.png

proof conversion helpers (encodable_to_proof, proof_to_encodable) from
server.rs to mod.rs since they are shared utilities used by both client
and server.
peer selection, extract magic numbers into named constants
(ACCOUNT_RANGE_CHUNK_COUNT, STORAGE_BATCH_SIZE, HASH_MAX), remove unused
contents field from DumpError and use derive Debug, rename
missing_children to pending_children in healing code, and wrap
process_byte_codes_request in spawn_blocking for consistency with other
server handlers.
…functions

so they no longer depend on the PeerHandler type as a method receiver. The three
methods that used self (request_account_range, request_bytecodes,
request_storage_ranges) now take peers: &mut PeerHandler as a parameter, and the
two already-static methods (request_state_trienodes, request_storage_trienodes)
simply move out of the impl block. Callers in snap_sync.rs, healing/state.rs,
and healing/storage.rs are updated accordingly.
@ilitteri ilitteri requested a review from iovoid February 6, 2026 18:15
@github-project-automation github-project-automation Bot moved this to In Review in ethrex_l1 Feb 6, 2026
@pablodeymo pablodeymo added this pull request to the merge queue Feb 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 6, 2026
@pablodeymo pablodeymo added this pull request to the merge queue Feb 6, 2026
github-merge-queue Bot pushed a commit that referenced this pull request Feb 6, 2026
…idation (#5975)

## Summary

Major refactoring of the snap sync implementation (~6,500 lines across 7
files) to improve code organization, maintainability, and error
handling. This PR completes all 5 phases of the snap sync refactoring
plan.

## Motivation

The snap sync codebase had grown organically with:
- Large monolithic files (sync.rs at 1,648 lines, peer_handler.rs at
2,074 lines)
- Scattered error types across multiple modules
- Mixed concerns (server/client code, protocol messages, sync
orchestration)
- Inconsistent naming conventions in healing modules

This refactoring addresses these issues through a structured, phased
approach.

## Changes by Phase

### Phase 1: Foundation ✅
- Created `snap/` module directory structure
- Moved server code from `snap.rs` to `snap/server.rs`
- Created `snap/constants.rs` with documented protocol constants
- Extracted snap server tests to `tests/snap_server_tests.rs`

### Phase 2: Protocol Layer ✅
- Split `rlpx/snap.rs` into `rlpx/snap/` directory:
  - `messages.rs` - Snap protocol message type definitions
  - `codec.rs` - RLPxMessage encoding/decoding implementations
  - `mod.rs` - Module re-exports

### Phase 3: Healing Unification ✅
- Created `sync/healing/` directory with unified structure:
- `types.rs` - Shared healing types (HealingQueueEntry,
StateHealingQueue)
  - `state.rs` - State trie healing (~420 lines)
  - `storage.rs` - Storage trie healing (~530 lines)
- Improved naming conventions:
  - `MembatchEntryValue` → `HealingQueueEntry`
  - `MembatchEntry` → `StorageHealingQueueEntry`
  - `Membatch` → `StorageHealingQueue`
  - `children_not_in_storage_count` → `missing_children_count`

### Phase 4: Sync Orchestration ✅
- Split `sync.rs` (1,648 lines) into focused modules:
  - `sync/full.rs` (~260 lines) - Full sync implementation
  - `sync/snap_sync.rs` (~1,100 lines) - Snap sync implementation
  - `sync/mod.rs` (~285 lines) - Syncer struct and orchestration
- Extracted snap client methods from `peer_handler.rs` to
`snap/client.rs`:
- `request_account_range`, `request_bytecodes`, `request_storage_ranges`
  - `request_state_trienodes`, `request_storage_trienodes`
  - Reduced `peer_handler.rs` from 2,060 to 670 lines (~68% reduction)

### Phase 5: Error Handling ✅
- Created unified `snap/error.rs` with `SnapError` enum:
  - `Store` - Storage layer errors
  - `Protocol` - Connection/protocol errors  
  - `Trie` - Trie operation errors
  - `RlpDecode` - RLP decoding errors
  - `PeerTable` - Peer table errors
  - `BadRequest` - Invalid request validation
  - `ValidationError` - Response validation failures
  - `NoTasks`, `NoAccountHashes`, `NoAccountStorages`, `NoStorageRoots`
  - `InternalError`, `FileSystem`, `SnapshotDir`, `TaskPanic`
  - `InvalidData`, `InvalidHash`
- Removed redundant error types: `SnapClientError`,
`RequestStateTrieNodesError`
- Added `From<SnapError>` for `PeerConnectionError` for error
propagation
- Kept `RequestStorageTrieNodesError` struct for request ID tracking

## Bug Fixes

Fixed several potential panics and edge cases in `snap/client.rs`:

- **Empty vector indexing**: Replaced direct `[0]` indexing with safe
`.get(0).ok_or()` or `.first().unwrap_or()` to prevent panics on empty
vectors
- **Zero chunk_size in `request_bytecodes`**: Added early return for
empty input and adjusted `chunk_count` to not exceed input length
- **Zero chunk_size in `request_account_range`**: Ensured `chunk_count`
doesn't exceed the hash range to avoid zero-sized chunks
- **Typo fix**: "Trie to get" → "Tried to get" in error message

## New Module Structure

```
crates/networking/p2p/
├── snap/
│   ├── mod.rs          # Re-exports
│   ├── server.rs       # Server-side request processing
│   ├── client.rs       # Client-side request methods (~1,439 lines)
│   ├── constants.rs    # Protocol constants
│   └── error.rs        # Unified SnapError type
├── rlpx/snap/
│   ├── mod.rs          # Re-exports
│   ├── messages.rs     # Message struct definitions
│   └── codec.rs        # RLPxMessage implementations
├── sync/
│   ├── mod.rs          # Syncer and orchestration (~285 lines)
│   ├── full.rs         # Full sync (~260 lines)
│   ├── snap_sync.rs    # Snap sync (~1,100 lines)
│   └── healing/
│       ├── mod.rs      # Re-exports
│       ├── types.rs    # Shared healing types
│       ├── state.rs    # State healing (~420 lines)
│       └── storage.rs  # Storage healing (~530 lines)
├── peer_handler.rs     # ETH protocol methods (~670 lines)
└── tests/
    └── snap_server_tests.rs
```

## Testing

- ✅ `cargo check -p ethrex-p2p` - Compilation passes
- ✅ `cargo clippy -p ethrex-p2p` - No warnings
- ✅ `cargo test -p ethrex-p2p` - All 48 tests pass (36 unit + 12 snap
server)

## Checklist

- [ ] Updated `STORE_SCHEMA_VERSION` (crates/storage/lib.rs) if the PR
includes breaking changes to the `Store` requiring a re-sync.

---

## Refactoring Plan Details

<details>
<summary>Click to expand the full refactoring plan</summary>

### Overview

The Snap Sync implementation spans ~6,500 lines across 7 files. This
plan provides a structured approach to simplify and improve the code.

### Files Involved

#### Original Structure
| File | Lines | Purpose |
|------|-------|---------|
| `crates/networking/p2p/snap.rs` | 1,008 | Server-side request
processing (90% tests) |
| `crates/networking/p2p/rlpx/snap.rs` | 389 | Protocol message
definitions |
| `crates/networking/p2p/sync.rs` | 1,648 | Main sync orchestration |
| `crates/networking/p2p/sync/state_healing.rs` | 471 | State trie
healing |
| `crates/networking/p2p/sync/storage_healing.rs` | 718 | Storage
healing |
| `crates/networking/p2p/sync/code_collector.rs` | 102 | Bytecode
collection |
| `crates/networking/p2p/peer_handler.rs` | 2,074 | Client-side snap
requests (~800 lines snap-related) |

#### New Structure (After Phases 1-5)
| File | Purpose |
|------|---------|
| `crates/networking/p2p/snap/mod.rs` | Snap module re-exports |
| `crates/networking/p2p/snap/server.rs` | Server-side request
processing |
| `crates/networking/p2p/snap/client.rs` | Client-side snap request
methods (~1,439 lines) |
| `crates/networking/p2p/snap/constants.rs` | Centralized protocol
constants |
| `crates/networking/p2p/rlpx/snap/mod.rs` | Protocol message re-exports
|
| `crates/networking/p2p/rlpx/snap/messages.rs` | Message struct
definitions |
| `crates/networking/p2p/rlpx/snap/codec.rs` | RLPxMessage
implementations |
| `crates/networking/p2p/sync/mod.rs` | Sync orchestration (~285 lines)
|
| `crates/networking/p2p/sync/full.rs` | Full sync implementation (~260
lines) |
| `crates/networking/p2p/sync/snap_sync.rs` | Snap sync implementation
(~1,100 lines) |
| `crates/networking/p2p/sync/healing/mod.rs` | Healing module
re-exports |
| `crates/networking/p2p/sync/healing/types.rs` | Shared healing types |
| `crates/networking/p2p/sync/healing/state.rs` | State healing (~420
lines) |
| `crates/networking/p2p/sync/healing/storage.rs` | Storage healing
(~530 lines) |
| `crates/networking/p2p/peer_handler.rs` | ETH protocol requests (~670
lines) |
| `crates/networking/p2p/tests/snap_server_tests.rs` | Snap server tests
|

### Implementation Order (Dependencies)

```
Phase 1.1-1.2 (snap module)
    ↓
Phase 1.3 (constants) ──→ Phase 2.1-2.3 (rlpx reorganization)
    ↓
Phase 1.4-1.5 (tests, imports)
    ↓
Phase 3.1-3.2 (pending_nodes)
    ↓
Phase 3.3-3.5 (healing unification)
    ↓
Phase 4.1-4.3 (sync split)
    ↓
Phase 4.4-4.5 (snap client extraction)
    ↓
Phase 5.1-5.3 (error consolidation)
```

### Risk Mitigation

| Phase | Risk | Mitigation |
|-------|------|------------|
| 1. Foundation | Low | Simple reorganization, APIs unchanged |
| 2. Protocol | Medium | Extensive hive testing |
| 3. Healing | Medium-High | Incremental migration, keep old files until
verified |
| 4. Sync orchestration | High | Feature flags, integration tests,
gradual extraction |
| 5. Error handling | Medium | Keep old errors, wrap in new type
initially |

### Decisions Made
- **No RLPxMessage macro**: Implementations vary too much (e.g.,
`GetStorageRanges` special hash handling)
- **Backward-compatible re-exports**: Constants re-exported from
`peer_handler.rs` to avoid breaking changes
- **Incremental approach**: Each phase builds on previous, allowing
verification at each step

### Key Considerations
- The `accounts_by_root_hash` structure in sync is unbounded - consider
adding limits
- Tests should cover edge cases for hash boundary handling in
`GetStorageRanges`
- Healing processes share similar patterns but have distinct algorithms
- trait unification should preserve this

</details>
pablodeymo added a commit that referenced this pull request Feb 6, 2026
Mark sections 1.2, 1.5, and 1.8 as discarded. Mark 2.3 as merged (PR #5975).
Link sections 1.4, 2.1, 2.4 to Issue #6140 (request_storage_ranges refactor).
Add new section 2.8 for correctness bug fixes. Update timeline accordingly.
…s/p2p/

to avoid clippy::unwrap_used denial inherited from the ethrex-p2p crate lints.
The root test crate uses workspace lints which don't deny unwrap_used.
@pablodeymo pablodeymo removed this pull request from the merge queue due to a manual request Feb 6, 2026
@pablodeymo pablodeymo enabled auto-merge February 6, 2026 20:26
pablodeymo added a commit that referenced this pull request Feb 6, 2026
Add 9 new sections (2.9-2.17) from fedacking's PR #5975 review:
- 2.9: Fix snap protocol capability bug (ETH vs SNAP)
- 2.10: Add spawn_blocking to bytecodes handler
- 2.11: Remove dead DumpError.contents field
- 2.12: Use JoinSet instead of channels for workers
- 2.13: Self-contained StorageTask with hashes instead of indices
- 2.14: Move snap client methods off PeerHandler
- 2.15: Guard write_set in account path
- 2.16: Healing code unification (generic trie healing)
- 2.17: Use existing constants for magic numbers

Also extend 2.1 description to include AccountStorageRoots SoA
simplification and named structs for tuple types.
Update timeline with new items.
@pablodeymo pablodeymo added this pull request to the merge queue Feb 6, 2026
Merged via the queue into main with commit 59ccf80 Feb 6, 2026
52 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Feb 6, 2026
@pablodeymo pablodeymo deleted the refactor/snapsync-healing-unification branch February 6, 2026 21:52
pablodeymo added a commit that referenced this pull request Feb 6, 2026
…5975

instead of the now-merged refactor/snapsync-healing-unification branch.
ilitteri added a commit that referenced this pull request Feb 11, 2026
… fix diagnostic logging

The refactor in #5975 changed get_best_peer from SUPPORTED_ETH_CAPABILITIES to
SUPPORTED_SNAP_CAPABILITIES in all three download functions. This is the only
functional change from main where snap sync works correctly.

Also fixes the diagnostic logging timer which was sharing last_update with the
1s metrics refresh, making the 10s diagnostic check dead code. Now uses a
separate last_diagnostic timer and adds in_flight task tracking.
ElFantasma pushed a commit that referenced this pull request Mar 13, 2026
Mark sections 1.2, 1.5, and 1.8 as discarded. Mark 2.3 as merged (PR #5975).
Link sections 1.4, 2.1, 2.4 to Issue #6140 (request_storage_ranges refactor).
Add new section 2.8 for correctness bug fixes. Update timeline accordingly.
ElFantasma pushed a commit that referenced this pull request Mar 13, 2026
Add 9 new sections (2.9-2.17) from fedacking's PR #5975 review:
- 2.9: Fix snap protocol capability bug (ETH vs SNAP)
- 2.10: Add spawn_blocking to bytecodes handler
- 2.11: Remove dead DumpError.contents field
- 2.12: Use JoinSet instead of channels for workers
- 2.13: Self-contained StorageTask with hashes instead of indices
- 2.14: Move snap client methods off PeerHandler
- 2.15: Guard write_set in account path
- 2.16: Healing code unification (generic trie healing)
- 2.17: Use existing constants for magic numbers

Also extend 2.1 description to include AccountStorageRoots SoA
simplification and named structs for tuple types.
Update timeline with new items.
ElFantasma pushed a commit that referenced this pull request Mar 13, 2026
…5975

instead of the now-merged refactor/snapsync-healing-unification branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client snapsync

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants