Skip to content

fix(stages): add commit threshold to merkle stage v2#1656

Merged
gakonst merged 16 commits intomainfrom
fix/merkle-oom-v2
Mar 14, 2023
Merged

fix(stages): add commit threshold to merkle stage v2#1656
gakonst merged 16 commits intomainfrom
fix/merkle-oom-v2

Conversation

@joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Mar 7, 2023

Replaces #1594.

DBTrieLoader now has an inner checkpoint (no commit) which writes it to a generic SyncStageProgress table, if 500_000 elements have been written to trie(s).

Follow-up: Make sure HashingAccount and HashingStorage also do not commit and use a similar checkpoint logic.

@joshieDo joshieDo added C-bug An unexpected or incorrect behavior A-staged-sync Related to staged sync (pipelines and stages) labels Mar 7, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 7, 2023

Codecov Report

Merging #1656 (07225e2) into main (54aab53) will decrease coverage by 0.08%.
The diff coverage is 68.27%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #1656      +/-   ##
==========================================
- Coverage   72.83%   72.75%   -0.08%     
==========================================
  Files         392      392              
  Lines       46923    47108     +185     
==========================================
+ Hits        34175    34273      +98     
- Misses      12748    12835      +87     
Flag Coverage Δ
integration-tests 20.16% <0.00%> (-0.12%) ⬇️
unit-tests 67.54% <68.27%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
bin/reth/src/dump_stage/merkle.rs 0.00% <0.00%> (ø)
crates/primitives/src/lib.rs 100.00% <ø> (ø)
crates/storage/db/src/tables/codecs/compact.rs 86.95% <ø> (ø)
crates/storage/db/src/tables/mod.rs 0.00% <ø> (ø)
...tes/storage/provider/src/providers/state/latest.rs 31.03% <0.00%> (ø)
crates/storage/provider/src/transaction.rs 84.81% <0.00%> (-0.60%) ⬇️
crates/storage/provider/src/trie/mod.rs 83.93% <71.48%> (-7.44%) ⬇️
crates/stages/src/stages/merkle.rs 80.93% <81.08%> (-0.80%) ⬇️
crates/primitives/src/proofs.rs 97.16% <100.00%> (+0.03%) ⬆️

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshieDo I Resolved merge conflicts, and have tested this locally and confirmed that it addresses the OOM issue, as discussed. Elegant solution overall - nice work.

My recommendation would be that we merge this, followed by:

  1. Further investigation of why we're getting mismatched state roots
  2. Apply the same logic in the account/storage hashing stages
  3. Improve how the checkpointing logic is communicated back to the user, right now we don't show any notion of progress for these stages

Comment on lines +114 to +116
// if there are more blocks than threshold it is faster to rebuild the trie
let mut loader = DBTrieLoader::new(tx.deref_mut());
loader.calculate_root().map_err(|e| StageError::Fatal(Box::new(e)))?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this new method of instantiation of the DBTrieLoader since #1515

Comment on lines +181 to +191
TrieProgress::InProgress(_) => {
// Save the loader's progress & drop it
// to allow committing to the database (otherwise we're hitting the borrow
// checker)
let progress = loader.current;
drop(loader);
tx.commit()?;
// Reinstantiate the loader from where it was left off.
loader = DBTrieLoader::new(tx.deref_mut());
loader.current = progress;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately necessary, we've already mutably borrowed the transaction to update_root and we cannot mutably borrow it again to commit(). But calling loader.tx.commit() is mut self because we use the vanilla DbTxMut method, so it will consume the tx w/o re-opening.

So instead, we drop the loader, freeing up the Transaction object, commit/reopen tx, and then re-instantiate the loader from where it was left off. Should be fine.


let (account_proof, storage_root) = loader
.generate_acount_proof(self.db, root, hashed_address)
.generate_acount_proof(root, hashed_address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.db is no longer needed because we pass it inside the loader

@gakonst gakonst merged commit 5b90cbc into main Mar 14, 2023
@gakonst gakonst deleted the fix/merkle-oom-v2 branch March 14, 2023 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-staged-sync Related to staged sync (pipelines and stages) C-bug An unexpected or incorrect behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants