Skip to content

Shuffle order of scanning account storages in calculate_accounts_lt_hash_at_startup#7226

Merged
kskalski merged 11 commits into
anza-xyz:masterfrom
kskalski:ks/generalize_reshuffl
Jul 30, 2025
Merged

Shuffle order of scanning account storages in calculate_accounts_lt_hash_at_startup#7226
kskalski merged 11 commits into
anza-xyz:masterfrom
kskalski:ks/generalize_reshuffl

Conversation

@kskalski
Copy link
Copy Markdown

@kskalski kskalski commented Jul 29, 2025

Problem

We scan storages using rayon work splitting heuristic that treats each collection element as relatively similar in cost / processing time. However account storages have vastly differing sizes and their default order actually has clusters of large vs small storages.
This causes work to be split into a unit batch that apparently has mostly large storages and may run for >100s longer than other theads that sit idle without being able to pick more work.

Summary of Changes

  • refactor code used to order storages for placement in snapshot archive to account_storage.rs module as more generic AccountStoragesOrderer util wrapper
  • add an additional strategy for simple randomized order of storages (doesn't require magic numbers and works well when we only need uniform total size per batch of items)
  • use randomized order in calculate_accounts_lt_hash_at_startup

At current snapshot (slot 356410350) unpacking and verification:

startup_verify_accounts total_us=117597515i verify_accounts_lt_hash_us=117597513i

vs baseline (master for the same data):

startup_verify_accounts total_us=372777563i verify_accounts_lt_hash_us=372777560i

@kskalski kskalski marked this pull request as ready for review July 29, 2025 16:40
Comment thread accounts-db/src/account_storage.rs Outdated

impl<'a> IntoIterator for AccountStoragesOrderer<'a> {
type Item = &'a AccountStorageEntry;
type IntoIter = Box<dyn Iterator<Item = &'a AccountStorageEntry> + 'a>;
Copy link
Copy Markdown
Author

@kskalski kskalski Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy wanted me to implement IntoIterator trait, but then compiler complained I can't use impl Iterator as associated type value, bacause it's unstable :(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that why we need this to be a Boxed dyn?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could call the method something other than into_iter() and then clippy should be OK.

For example one of these:

    pub fn my_iter(&self) -> impl ExactSizeIterator<Item = &'a AccountStorageEntry> + use<'a, '_> {
        self.indices.iter().map(|i| self.storages[*i].as_ref())
    }

    pub fn my_into_iter(self) -> impl ExactSizeIterator<Item = &'a AccountStorageEntry> + use<'a> {
        self.indices.into_iter().map(|i| self.storages[i].as_ref())
    }

And then we can remove the impl IntoIterator until the impl trait is allowed in associated types here.

(or fn into_seq_iter(), etc...)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yap, without impl_trait_in_assoc_type I need to name the iterator type explicitly, other approaches:

  • create a struct that encapsulate the map operation and implement Iterator trait, such that IntoIterator can just use the struct's type here
  • work-around the clippy insistence on having into_iter() as actual IntoIterator implementation, e.g. use a non-standard naming like make_iter
  • a deeper refactor such that we don't use map to create iterator, then we can probably name the type being used as iterator implementation

Similar concern is with IntoParallelIterator, but for it there is no clippy check

@kskalski kskalski requested a review from roryharr July 29, 2025 16:43
@kskalski kskalski changed the title Shuffle order of scanning account stroages in calculate_accounts_lt_hash_at_startup Shuffle order of scanning account storages in calculate_accounts_lt_hash_at_startup Jul 29, 2025
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.2%. Comparing base (d76c437) to head (6c0a181).
⚠️ Report is 2712 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #7226    +/-   ##
========================================
  Coverage    83.1%    83.2%            
========================================
  Files         850      850            
  Lines      369949   369978    +29     
========================================
+ Hits       307739   307849   +110     
+ Misses      62210    62129    -81     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kskalski kskalski requested a review from brooksprumo July 29, 2025 17:03
Comment thread accounts-db/src/account_storage.rs Outdated
Comment thread accounts-db/src/account_storage.rs Outdated
Comment thread accounts-db/src/account_storage.rs Outdated

impl<'a> IntoIterator for AccountStoragesOrderer<'a> {
type Item = &'a AccountStorageEntry;
type IntoIter = Box<dyn Iterator<Item = &'a AccountStorageEntry> + 'a>;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that why we need this to be a Boxed dyn?

Comment thread accounts-db/src/account_storage.rs Outdated
Comment thread accounts-db/src/account_storage.rs Outdated
@kskalski kskalski requested a review from brooksprumo July 29, 2025 19:04
brooksprumo
brooksprumo previously approved these changes Jul 29, 2025
Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Comment thread accounts-db/src/account_storage.rs Outdated
@kskalski kskalski added the CI Pull Request is ready to enter CI label Jul 30, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jul 30, 2025
@kskalski kskalski added the CI Pull Request is ready to enter CI label Jul 30, 2025
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Jul 30, 2025
@kskalski kskalski merged commit 6d8f436 into anza-xyz:master Jul 30, 2025
58 of 61 checks passed
@kskalski kskalski deleted the ks/generalize_reshuffl branch July 30, 2025 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants