-
Notifications
You must be signed in to change notification settings - Fork 207
perf(l1): optimize snap sync insertion and healing write paths #6159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
98fb092
aeea7f0
08ba6f9
d998441
2fb356a
4c30718
d714d82
668bd9f
0c10e15
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -283,28 +283,37 @@ async fn heal_state_trie( | |
| // PERF: reuse buffers? | ||
| let to_write = std::mem::take(&mut nodes_to_write); | ||
| let store = store.clone(); | ||
| // NOTE: we keep only a single task in the background to avoid out of order deletes | ||
| if !db_joinset.is_empty() { | ||
| // Allow up to 2 tasks in flight to overlap encoding with DB commit | ||
| while db_joinset.len() >= 2 { | ||
| db_joinset | ||
| .join_next() | ||
| .await | ||
| .expect("we just checked joinset is not empty")?; | ||
| .expect("joinset is not empty")?; | ||
| } | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. JoinSet errors dropped Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/healing/state.rs
Line: 285:292
Comment:
**JoinSet errors dropped**
This loop uses `join_next().await.expect(...)?;` but the spawned closure returns `()`, so task panics/errors are effectively ignored (and `?` here can only be applying to the `Option`, not the task result). That means DB write panics can be silently swallowed and healing can proceed as if writes succeeded. You likely want to `match join_next().await` and handle `Err(JoinError)` explicitly (and propagate/abort on panic).
How can I resolve this? If you propose a fix, please make it concise.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the suggestion from the bot is wrong, but I would check the types of the task to make it return errors. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're correct — my comment was wrong. The spawned closure does return
The PR properly handles all three cases via the The error handling is correct as-is. My apologies for the noise! |
||
| db_joinset.spawn_blocking(move || { | ||
| let encode_start = std::time::Instant::now(); | ||
| let node_count = to_write.len(); | ||
| let mut encoded_to_write = BTreeMap::new(); | ||
| for (path, node) in to_write { | ||
| for i in 0..path.len() { | ||
| encoded_to_write.insert(path.slice(0, i), vec![]); | ||
| encoded_to_write.entry(path.slice(0, i)).or_insert(vec![]); | ||
| } | ||
| encoded_to_write.insert(path, node.encode_to_vec()); | ||
| } | ||
| let encode_ms = encode_start.elapsed().as_millis() as u64; | ||
| let db_start = std::time::Instant::now(); | ||
| let trie_db = store | ||
| .open_direct_state_trie(*EMPTY_TRIE_HASH) | ||
| .expect("Store should open"); | ||
| let db = trie_db.db(); | ||
| // PERF: use put_batch_no_alloc (note that it needs to remove nodes too) | ||
| db.put_batch(encoded_to_write.into_iter().collect()) | ||
| .expect("The put batch on the store failed"); | ||
| debug!( | ||
| node_count, | ||
| encode_ms, | ||
| db_write_ms = db_start.elapsed().as_millis() as u64, | ||
| "state healing batch write" | ||
| ); | ||
| }); | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -212,25 +212,34 @@ pub async fn heal_storage_trie( | |||||||||||||||||||||||||||
| if nodes_to_write.values().map(Vec::len).sum::<usize>() > 100_000 || is_done || is_stale { | ||||||||||||||||||||||||||||
| let to_write: Vec<_> = nodes_to_write.drain().collect(); | ||||||||||||||||||||||||||||
| let store = state.store.clone(); | ||||||||||||||||||||||||||||
| // NOTE: we keep only a single task in the background to avoid out of order deletes | ||||||||||||||||||||||||||||
| if !db_joinset.is_empty() { | ||||||||||||||||||||||||||||
| // Allow up to 2 tasks in flight to overlap encoding with DB commit | ||||||||||||||||||||||||||||
| while db_joinset.len() >= 2 { | ||||||||||||||||||||||||||||
| db_joinset.join_next().await; | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
| db_joinset.join_next().await; | |
| match db_joinset.join_next().await { | |
| Some(Ok(_)) => { | |
| // successfully completed background DB write task | |
| } | |
| Some(Err(e)) => { | |
| error!(?e, "background DB write task failed during storage healing"); | |
| } | |
| None => { | |
| // no more tasks to join | |
| break; | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 08ba6f92c. The join_next() result is now checked and JoinError is propagated as SyncError::JoinHandle(e), which will surface task panics instead of silently dropping them.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -956,16 +956,9 @@ async fn insert_accounts( | |
| .collect(); | ||
| db.ingest_external_file(file_paths) | ||
| .map_err(|err| SyncError::RocksDBError(err.into_string()))?; | ||
| let iter = db.full_iterator(rocksdb::IteratorMode::Start); | ||
| for account in iter { | ||
| let account = account.map_err(|err| SyncError::RocksDBError(err.into_string()))?; | ||
| let account_state = AccountState::decode(&account.1).map_err(SyncError::Rlp)?; | ||
| if account_state.code_hash != *EMPTY_KECCACK_HASH { | ||
| code_hash_collector.add(account_state.code_hash); | ||
| code_hash_collector.flush_if_needed().await?; | ||
| } | ||
| } | ||
|
|
||
| let start = std::time::Instant::now(); | ||
| let mut code_hashes: Vec<H256> = Vec::new(); | ||
| let iter = db.full_iterator(rocksdb::IteratorMode::Start); | ||
|
Comment on lines
+960
to
965
|
||
| let compute_state_root = trie_from_sorted_accounts_wrap( | ||
| trie.db(), | ||
|
|
@@ -976,6 +969,9 @@ async fn insert_accounts( | |
| .account_tries_inserted | ||
| .fetch_add(1, Ordering::Relaxed); | ||
| let account_state = AccountState::decode(v).expect("We should have accounts here"); | ||
|
Comment on lines
972
to
974
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Panics on bad input
Consider propagating these as Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 972:974
Comment:
**Panics on bad input**
`insert_accounts` now hard-panics on any temp RocksDB iterator error (`.expect(...)`) and on any malformed account value (`AccountState::decode(v).expect(...)`). These are external inputs (snapshot files / DB iterator) and can be corrupted; panicking will crash the sync process instead of returning `SyncError` like the rest of this function does.
Consider propagating these as `SyncError::RocksDBError` / `SyncError::Rlp` instead of using `expect` inside the iterator (`map_err(...)?`).
How can I resolve this? If you propose a fix, please make it concise.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This absolutely shouldn't happen. The only scenario is where the user has a faulty file system.
Comment on lines
966
to
974
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Panics on iterator/rlp decode
Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 966:974
Comment:
**Panics on iterator/rlp decode**
`insert_accounts` still has `expect(...)` calls inside the RocksDB iterator (`.map(|k| k.expect(...))` and `AccountState::decode(v).expect(...)`). These are external inputs (temp RocksDB iterator + snapshot content), so a single iterator error or malformed RLP will crash the whole sync instead of returning a `SyncError` like the rest of the function does. This should be converted to `map_err(...)?` / `Result` propagation so snap sync can fail gracefully with a useful error.
How can I resolve this? If you propose a fix, please make it concise. |
||
| if account_state.code_hash != *EMPTY_KECCACK_HASH { | ||
| code_hashes.push(account_state.code_hash); | ||
| } | ||
|
Comment on lines
960
to
+977
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unbounded code hash buffer Prompt To Fix With AIThis is a comment left during a code review.
Path: crates/networking/p2p/sync/snap_sync.rs
Line: 960:974
Comment:
**Unbounded code hash buffer**
`insert_accounts` now collects all non-empty `code_hash` values into `code_hashes: Vec<H256>` and only flushes after `trie_from_sorted_accounts_wrap` finishes. On large snapshots this can grow to millions of hashes and OOM before the trie build completes. This used to be bounded by `flush_if_needed()` during the scan. Consider flushing incrementally during the iterator pass (e.g., push+flush when the collector is full) or using a bounded buffer.
How can I resolve this? If you propose a fix, please make it concise.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed in
|
||
| if account_state.storage_root != *EMPTY_TRIE_HASH { | ||
| storage_accounts.accounts_with_storage_root.insert( | ||
| H256::from_slice(k), | ||
|
|
@@ -986,6 +982,15 @@ async fn insert_accounts( | |
| .map(|(k, v)| (H256::from_slice(&k), v.to_vec())), | ||
| ) | ||
| .map_err(SyncError::TrieGenerationError)?; | ||
| debug!( | ||
| elapsed_ms = start.elapsed().as_millis() as u64, | ||
| code_hashes_count = code_hashes.len(), | ||
| "insert_accounts trie build" | ||
| ); | ||
| for hash in code_hashes { | ||
| code_hash_collector.add(hash); | ||
| code_hash_collector.flush_if_needed().await?; | ||
| } | ||
|
|
||
| drop(db); // close db before removing directory | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be at least more than 2. If there is only a single thread it will wait on the write buffers getting freed.