-
Notifications
You must be signed in to change notification settings - Fork 207
perf(l1): optimize snap sync insertion and healing write paths #6159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
98fb092
aeea7f0
08ba6f9
d998441
2fb356a
4c30718
d714d82
668bd9f
0c10e15
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -956,16 +956,12 @@ async fn insert_accounts( | |
| .collect(); | ||
| db.ingest_external_file(file_paths) | ||
| .map_err(|err| SyncError::RocksDBError(err.into_string()))?; | ||
| let iter = db.full_iterator(rocksdb::IteratorMode::Start); | ||
| for account in iter { | ||
| let account = account.map_err(|err| SyncError::RocksDBError(err.into_string()))?; | ||
| let account_state = AccountState::decode(&account.1).map_err(SyncError::Rlp)?; | ||
| if account_state.code_hash != *EMPTY_KECCACK_HASH { | ||
| code_hash_collector.add(account_state.code_hash); | ||
| code_hash_collector.flush_if_needed().await?; | ||
| } | ||
| } | ||
|
|
||
| let start = std::time::Instant::now(); | ||
| // We collect code hashes directly into the collector's HashSet during the trie | ||
| // build pass. The collector deduplicates, so memory is bounded by unique contract | ||
| // accounts (~5M on mainnet = ~160MB). We can't call flush_if_needed() here because | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment says ~5M entries = ~160MB, but This might be fine for the target hardware running snap sync, but the comment understates actual memory usage. Worth either correcting the estimate or noting that it's the raw key size only. |
||
| // the trie build is synchronous, so we flush after the build completes. | ||
| let iter = db.full_iterator(rocksdb::IteratorMode::Start); | ||
|
Comment on lines
+960
to
965
|
||
| let compute_state_root = trie_from_sorted_accounts_wrap( | ||
| trie.db(), | ||
|
|
@@ -976,6 +972,9 @@ async fn insert_accounts( | |
| .account_tries_inserted | ||
| .fetch_add(1, Ordering::Relaxed); | ||
| let account_state = AccountState::decode(v).expect("We should have accounts here"); | ||
|
Comment on lines
972
to
974
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Panics on bad input
Consider propagating these as Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 972:974
Comment:
**Panics on bad input**
`insert_accounts` now hard-panics on any temp RocksDB iterator error (`.expect(...)`) and on any malformed account value (`AccountState::decode(v).expect(...)`). These are external inputs (snapshot files / DB iterator) and can be corrupted; panicking will crash the sync process instead of returning `SyncError` like the rest of this function does.
Consider propagating these as `SyncError::RocksDBError` / `SyncError::Rlp` instead of using `expect` inside the iterator (`map_err(...)?`).
How can I resolve this? If you propose a fix, please make it concise.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This absolutely shouldn't happen. The only scenario is where the user has a faulty file system.
Comment on lines
966
to
974
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Panics on iterator/rlp decode
Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 966:974
Comment:
**Panics on iterator/rlp decode**
`insert_accounts` still has `expect(...)` calls inside the RocksDB iterator (`.map(|k| k.expect(...))` and `AccountState::decode(v).expect(...)`). These are external inputs (temp RocksDB iterator + snapshot content), so a single iterator error or malformed RLP will crash the whole sync instead of returning a `SyncError` like the rest of the function does. This should be converted to `map_err(...)?` / `Result` propagation so snap sync can fail gracefully with a useful error.
How can I resolve this? If you propose a fix, please make it concise. |
||
| if account_state.code_hash != *EMPTY_KECCACK_HASH { | ||
| code_hash_collector.add(account_state.code_hash); | ||
| } | ||
|
Comment on lines
960
to
+977
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unbounded code hash buffer Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 960:974
Comment:
**Unbounded code hash buffer**
`insert_accounts` now collects all non-empty `code_hash` values into `code_hashes: Vec<H256>` and only flushes after `trie_from_sorted_accounts_wrap` finishes. On large snapshots this can grow to millions of hashes and OOM before the trie build completes. This used to be bounded by `flush_if_needed()` during the scan. Consider flushing incrementally during the iterator pass (e.g., push+flush when the collector is full) or using a bounded buffer.
How can I resolve this? If you propose a fix, please make it concise.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed in
|
||
| if account_state.storage_root != *EMPTY_TRIE_HASH { | ||
| storage_accounts.accounts_with_storage_root.insert( | ||
| H256::from_slice(k), | ||
|
|
@@ -986,6 +985,12 @@ async fn insert_accounts( | |
| .map(|(k, v)| (H256::from_slice(&k), v.to_vec())), | ||
| ) | ||
| .map_err(SyncError::TrieGenerationError)?; | ||
| debug!( | ||
| elapsed_ms = start.elapsed().as_millis() as u64, | ||
| "insert_accounts trie build" | ||
| ); | ||
| // Flush any remaining code hashes that accumulated during the trie build | ||
| code_hash_collector.flush_if_needed().await?; | ||
|
|
||
| drop(db); // close db before removing directory | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be at least more than 2. If there is only a single thread it will wait on the write buffers getting freed.