Compute accounts data len during generate_index() #21757
brooksprumo merged 5 commits into solana-labs:master
Force-pushed from e36927c to d6366e7.
```rust
            &self.account_indexes,
        );
    }
    accounts_data_len += stored_account.data().len() as u64;
```
I decided on an implementation for generate_index_for_slot() that does not take the Atomic as a parameter; instead, it computes the len locally and returns it at the end as a regular integer. I figured this would be faster, since there is less contention on the Atomic and this function avoids atomic operations entirely.
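A minimal sketch of the trade-off described above, assuming a hypothetical `StoredAccount` that only holds its data bytes (the real type and the real `generate_index_for_slot()` signature live in `AccountsDb` and are not reproduced here). Each per-slot call sums into a plain local `u64` and returns it; the caller then performs one atomic add per slot instead of one per account. `std::thread::scope` with one thread per slot is only an illustration of "slots are indexed concurrently", not how the real code schedules work.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for a stored account; only the data bytes matter here.
struct StoredAccount {
    data: Vec<u8>,
}

impl StoredAccount {
    fn data(&self) -> &[u8] {
        &self.data
    }
}

/// Hypothetical per-slot indexing step: accumulate into a plain local
/// integer (no atomics) and hand the total back to the caller.
fn generate_index_for_slot(accounts: &[StoredAccount]) -> u64 {
    let mut accounts_data_len = 0u64;
    for stored_account in accounts {
        // ... insert the account into the index here ...
        accounts_data_len += stored_account.data().len() as u64;
    }
    accounts_data_len
}

/// Caller side: slots are processed concurrently, and each slot does exactly
/// one atomic read-modify-write on the shared total (not one per account).
fn index_slots(slots: &[Vec<StoredAccount>], total: &AtomicU64) {
    thread::scope(|s| {
        for accounts in slots {
            s.spawn(move || {
                let slot_len = generate_index_for_slot(accounts);
                total.fetch_add(slot_len, Ordering::Relaxed);
            });
        }
    });
}
```

`Ordering::Relaxed` is enough here because only the final sum is read, after all threads have been joined by the scope.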
```rust
// subtract data.len() from accounts_data_len for all old accounts that are in the index twice
if pass == 0 {
```
For the "remove duplicates" block, I first wrapped it in `if pass == 0 {}`.
I also applied the same approach there: compute sums locally without atomics, then reduce once at the end with an atomic.
```rust
let accounts_data_len_from_duplicates = unique_pubkeys
    .into_iter()
    .collect::<Vec<_>>()
    .par_chunks(4096)
    .map(pubkeys_to_accounts_data_len)
    .sum();
accounts_data_len.fetch_sub(accounts_data_len_from_duplicates, Ordering::Relaxed);
```
So the main logic is now a map() and a sum(), which rayon parallelizes. Once that total is computed, only a single, final atomic subtraction is needed.
I tested this impl by printing the accounts data len from generate_index(), then immediately calling Bank::get_total_accounts_stats() (which uses scan_accounts() under the hood), and asserting that the two values were always equal.
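A std-only sketch of the same map()/sum() shape: split the pubkeys into chunks, compute a partial sum per chunk with no shared state, add the partials, then touch the atomic exactly once. `std::thread::scope` stands in for rayon here (the PR itself uses `par_chunks(4096)`), `Pubkey` is a hypothetical `u64` alias, and the per-chunk closure is passed in as a parameter rather than captured from `AccountsDb`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for a `Pubkey`.
type Pubkey = u64;

/// Parallel chunked map + sum, followed by ONE atomic subtraction.
fn subtract_duplicates<F>(
    unique_pubkeys: &[Pubkey],
    chunk_size: usize,
    pubkeys_to_accounts_data_len: F,
    accounts_data_len: &AtomicU64,
) where
    F: Fn(&[Pubkey]) -> u64 + Sync,
{
    let f = &pubkeys_to_accounts_data_len;
    let from_duplicates: u64 = thread::scope(|s| {
        // Spawn every chunk first so they run concurrently...
        let handles: Vec<_> = unique_pubkeys
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || f(chunk)))
            .collect();
        // ...then join and add up the per-chunk partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });
    // The only atomic operation in the whole computation.
    accounts_data_len.fetch_sub(from_duplicates, Ordering::Relaxed);
}
```

The partial sums are plain `u64` values returned from each worker, so the workers never contend on shared memory; this is the same reasoning as for generate_index_for_slot().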
```rust
let pubkeys_to_accounts_data_len = |pubkeys: &[Pubkey]| {
    let mut accounts_data_len_from_duplicates = 0;
    pubkeys.iter().for_each(|pubkey| {
        if let Some(entry) = self.accounts_index.get_account_read_entry(pubkey) {
            let slot_list = entry.slot_list();
            if slot_list.len() < 2 {
                return;
            }
            let mut slot_list = slot_list.clone();
            slot_list.sort_unstable_by(|a, b| a.0.cmp(&b.0));
            assert!(slot_list[0].0 < slot_list[1].0);
            slot_list
                .into_iter()
                .rev()
                .skip(1)
                .for_each(|(slot, account_info)| {
                    let maybe_storage_entry = self
                        .storage
                        .get_account_storage_entry(slot, account_info.store_id);
                    let mut accessor = LoadedAccountAccessor::Stored(
                        maybe_storage_entry
                            .map(|entry| (entry, account_info.offset)),
                    );
                    let loaded_account = accessor.check_and_get_loaded_account();
                    let account = loaded_account.take_account();
                    accounts_data_len_from_duplicates += account.data().len();
                });
        }
    });
    accounts_data_len_from_duplicates as u64
};
```
And this is the closure that computes the accounts data len for a chunk of pubkeys. Again, no atomic operations in here, yay!
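The closure's core rule can be shown without the storage layer: for each pubkey with two or more versions, sort the slot list by slot, skip the newest entry, and add up the data len of all older entries. This is a minimal sketch under assumptions: `Pubkey` and `Slot` are hypothetical `u64` aliases, and the index is modeled as a `HashMap` whose slot list already carries each version's data len (in the real code that len comes from loading the account via `account_info`).

```rust
use std::collections::HashMap;

/// Hypothetical aliases standing in for the real Solana types.
type Pubkey = u64;
type Slot = u64;

/// Simplified index: for each pubkey, (slot, data_len) of every stored version.
type SimpleIndex = HashMap<Pubkey, Vec<(Slot, usize)>>;

/// Sum the data len of every version except the newest, for one chunk of pubkeys.
fn pubkeys_to_accounts_data_len(index: &SimpleIndex, pubkeys: &[Pubkey]) -> u64 {
    let mut accounts_data_len_from_duplicates = 0usize;
    for pubkey in pubkeys {
        // Pubkeys with zero or one version were never double-counted.
        let slot_list = match index.get(pubkey) {
            Some(list) if list.len() >= 2 => list,
            _ => continue,
        };
        let mut slot_list = slot_list.clone();
        slot_list.sort_unstable_by_key(|(slot, _)| *slot);
        // After sorting, the newest version is last: reverse, skip it, sum the rest.
        accounts_data_len_from_duplicates += slot_list
            .iter()
            .rev()
            .skip(1)
            .map(|(_, data_len)| data_len)
            .sum::<usize>();
    }
    accounts_data_len_from_duplicates as u64
}
```

Subtracting this from the sum over every stored version leaves exactly the data len of the newest version of each account, which is what a full scan (as in the get_total_accounts_stats() check) would report.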
Force-pushed from d6366e7 to 0601dde.
```rust
info!(
    "accounts data len: {}, {}",
    accounts_data_len.load(Ordering::Relaxed),
    timer
```
What time are you seeing on, say, a mnb (mainnet-beta) snapshot?
On my GCE box (brooks-dev2), I see 600-650ms.
```rust
    .2
    .iter()
    .for_each(|(slot, account_info)| {
        let maybe_storage_entry = self
```
I wrote this code, but I didn't like it when I wrote it. I assumed there would be a `load_account` function I could call at this level, but all I came up with were these pieces. It might be worth checking whether a function already exists that goes from `AccountInfo` to `AccountSharedData`.
Codecov Report
```diff
@@            Coverage Diff            @@
##           master   #21757     +/-   ##
=========================================
- Coverage    81.6%    81.4%   -0.2%
=========================================
  Files         511      511
  Lines      143320   143535    +215
=========================================
- Hits       116976   116901     -75
- Misses      26344    26634    +290
```
(cherry picked from commit ec7e177)
Problem
To set a cap on the accounts data size, transaction processing needs to know about the current size of the accounts data. This falls to the bank, which doesn't currently know about the size of the accounts data.
Summary of Changes
Compute the accounts data len during `AccountsDb::generate_index()` and pass it to the Bank when deserializing from a snapshot.

Related to issue #21604
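A minimal sketch of the resulting data flow, using hypothetical names throughout: `generate_index()` yields a `u64`, the snapshot-loading path hands it to the Bank's constructor, and a data-size cap (the motivation in the Problem section) then needs only a comparison against that stored value. The real `Bank` constructor and any cap-enforcement API are not part of this excerpt and are not reproduced here.

```rust
/// Hypothetical, heavily simplified Bank holding only the field relevant here.
struct Bank {
    accounts_data_len: u64,
}

impl Bank {
    /// Hypothetical constructor for the snapshot path: the length computed by
    /// generate_index() is passed in, so no separate full scan is needed.
    fn new_from_snapshot(accounts_data_len: u64) -> Self {
        Bank { accounts_data_len }
    }

    /// Hypothetical cap check: would adding `additional_len` bytes exceed `cap`?
    fn would_exceed_cap(&self, additional_len: u64, cap: u64) -> bool {
        self.accounts_data_len.saturating_add(additional_len) > cap
    }
}
```

Passing the value in at construction time reuses work generate_index() already does while visiting every stored account, instead of paying for a second pass.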