perf(p2p): Implement adaptive peer timeouts for snap sync#6117
Draft
pablodeymo wants to merge 26 commits into
Draft
perf(p2p): Implement adaptive peer timeouts for snap sync#6117pablodeymo wants to merge 26 commits into
pablodeymo wants to merge 26 commits into
Conversation
…tory Reorganize state_healing.rs and storage_healing.rs into a shared sync/healing/ module structure with clearer naming conventions: - Create sync/healing/ directory with mod.rs, types.rs, state.rs, storage.rs - Rename MembatchEntryValue to HealingQueueEntry - Rename MembatchEntry to StorageHealingQueueEntry - Rename Membatch type to StorageHealingQueue - Rename children_not_in_storage_count to missing_children_count - Rename membatch variables to healing_queue throughout - Extract shared HealingQueueEntry and StateHealingQueue types to types.rs - Update sync.rs imports to use new healing module
Reorganize snap protocol code for better maintainability: - Split rlpx/snap.rs into rlpx/snap/ directory: - codec.rs: RLP encoding/decoding for snap messages - messages.rs: Snap protocol message types - mod.rs: Module re-exports - Split snap.rs into snap/ directory: - constants.rs: Snap sync constants and configuration - server.rs: Snap protocol server implementation - mod.rs: Module re-exports - Move snap server tests to dedicated tests/ directory - Update imports in p2p.rs, peer_handler.rs, and code_collector.rs
Document the phased approach for reorganizing snap sync code: - Phase 1: rlpx/snap module split - Phase 2: snap module split with server extraction - Phase 3: healing module unification
Split the large sync.rs (1631 lines) into focused modules: - sync/full.rs (~260 lines): Full sync implementation - sync_cycle_full(), add_blocks_in_batch(), add_blocks() - sync/snap_sync.rs (~1100 lines): Snap sync implementation - sync_cycle_snap(), snap_sync(), SnapBlockSyncState - store_block_bodies(), update_pivot(), block_is_stale() - validate_state_root(), validate_storage_root(), validate_bytecodes() - insert_accounts(), insert_storages() (both rocksdb and non-rocksdb) - sync.rs (~285 lines): Orchestration layer - Syncer struct with start_sync() and sync_cycle() - SyncMode, SyncError, AccountStorageRoots types - Re-exports for public API
…p/client.rs Move all snap protocol client-side request methods from peer_handler.rs to a dedicated snap/client.rs module: - request_account_range and request_account_range_worker - request_bytecodes - request_storage_ranges and request_storage_ranges_worker - request_state_trienodes - request_storage_trienodes Also moves related types: DumpError, RequestMetadata, SnapClientError, RequestStateTrieNodesError, RequestStorageTrieNodes. This reduces peer_handler.rs from 2,060 to 670 lines (~68% reduction), leaving it focused on ETH protocol methods (block headers/bodies). Added SnapClientError variant to SyncError for proper error handling. Updated plan_snap_sync.md to mark Phase 4 as complete.
…napError type Implement Phase 5 of snap sync refactoring plan - Error Handling. - Create snap/error.rs with unified SnapError enum covering all snap protocol errors - Update server functions (process_account_range_request, process_storage_ranges_request, process_byte_codes_request, process_trie_nodes_request) to return Result<T, SnapError> - Remove SnapClientError and RequestStateTrieNodesError, consolidate into SnapError - Keep RequestStorageTrieNodesError struct for request ID tracking in storage healing - Add From<SnapError> for PeerConnectionError to support error propagation in message handlers - Update sync module to use SyncError::Snap variant - Update healing modules (state.rs, storage.rs) to use new error types - Move DumpError struct to error.rs module - Update test return types to use SnapError - Mark Phase 5 as completed in plan document All phases of the snap sync refactoring are now complete.
Change missing_children_count from u64 to usize in HealingQueueEntry and node_missing_children function to match StorageHealingQueueEntry and be consistent with memory structure counting conventions.
Resolve conflicts from #5977 and #6018 merge to main: - Keep modular sync structure (sync.rs delegates to full.rs and snap_sync.rs) - Keep snap client code in snap/client.rs (removed from peer_handler.rs) - Add InsertingAccountRanges metric from #6018 to snap_sync.rs - Remove unused info import from peer_handler.rs
Lines of code reportTotal lines added: Detailed view |
Base automatically changed from
refactor/snapsync-healing-unification
to
main
February 6, 2026 21:52
Benchmark Block Execution Results Comparison Against Main
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements adaptive timeouts for peer requests in snap sync. Instead of using a fixed 15-second timeout for all peers, we track each peer's response latency and set timeouts dynamically based on their historical performance.
Related Roadmap Item: 1.7 Peer Connection Optimization
Motivation
Currently, all peer requests use a fixed
PEER_REPLY_TIMEOUT = 15 seconds. This causes issues:Proposed Solution
Track each peer's response latency using an Exponential Moving Average (EMA) and set timeouts to 3x their average response time.
Key Components
LatencyTracker struct - EMA calculation with α=0.2
Timeout Bounds:
Changes Required:
latency: LatencyTrackertoPeerDatarecord_latency()andget_peer_timeout()toPeerTablemake_request()to measure and record latencyExample Impact
Expected Impact: 20-30% improvement in peer utilization, faster detection of slow/dead peers.
Implementation Plan
See
docs/plans/adaptive_peer_timeouts.mdfor the detailed implementation plan including:Files to Modify
discv4/peer_table.rsLatencyTracker, updatePeerData, add methodssnap/constants.rspeer_handler.rsmake_request(), update call sitessnap/client.rsTesting Plan
LatencyTrackerEMA calculationChecklist
LatencyTrackerstruct implementedPeerDataupdated with latency trackingPeerTablemethods addedmake_request()updated to record latency