Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Compute accounts data len during generate_index()#21757

Merged
brooksprumo merged 5 commits intosolana-labs:masterfrom
brooksprumo:generate-index-accounts-data-len
Dec 10, 2021
Merged

Compute accounts data len during generate_index()#21757
brooksprumo merged 5 commits intosolana-labs:masterfrom
brooksprumo:generate-index-accounts-data-len

Conversation

@brooksprumo
Copy link
Copy Markdown
Contributor

@brooksprumo brooksprumo commented Dec 9, 2021

Problem

To set a cap on the accounts data size, transaction processing needs to know about the current size of the accounts data. This falls to the bank, which doesn't currently know about the size of the accounts data.

Summary of Changes

Compute the accounts data len during AccountsDb::generate_index() and pass it to the Bank when deserializing from a snapshot.

Related to issue #21604

@brooksprumo brooksprumo force-pushed the generate-index-accounts-data-len branch from e36927c to d6366e7 Compare December 9, 2021 23:36
&self.account_indexes,
);
}
accounts_data_len += stored_account.data().len() as u64;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided on an implementation for generate_index_for_slot() that does not pass in the Atomic, but instead computes the len locally and returns it at the end (as a regular integer). I figured this would be faster due to less contention on the Atomic (and avoiding the atomic operations entirely in this function).

Comment on lines +7006 to +7007
// subtract data.len() from accounts_data_len for all old accounts that are in the index twice
if pass == 0 {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "remove duplicates" block, I first put the if pass == 0 {} around it.

I also applied the same tweaks of "compute sums locally without atomics, then reduce at the end with Atomics"

Comment on lines +7047 to +7053
let accounts_data_len_from_duplicates = unique_pubkeys
.into_iter()
.collect::<Vec<_>>()
.par_chunks(4096)
.map(pubkeys_to_accounts_data_len)
.sum();
accounts_data_len.fetch_sub(accounts_data_len_from_duplicates, Ordering::Relaxed);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now this is the main logic, of a map() and a sum(), which rayon handles. Then after that number is totaled up, only need the single/final atomic subtraction.

I tested this impl out by printing the accounts data len from generate_index(), and the immediately calling the Bank::get_total_accounts_stats() afterwards, which uses scan_accounts() under the hood, and asserted the values were always equal.

Comment thread runtime/src/accounts_db.rs Outdated
Comment on lines +7008 to +7038
let pubkeys_to_accounts_data_len = |pubkeys: &[Pubkey]| {
let mut accounts_data_len_from_duplicates = 0;
pubkeys.into_iter().for_each(|pubkey| {
if let Some(entry) = self.accounts_index.get_account_read_entry(pubkey) {
let slot_list = entry.slot_list();
if slot_list.len() < 2 {
return;
}
let mut slot_list = slot_list.clone();
slot_list.sort_unstable_by(|a, b| a.0.cmp(&b.0));
assert!(slot_list[0].0 < slot_list[1].0);
slot_list
.into_iter()
.rev()
.skip(1)
.for_each(|(slot, account_info)| {
let maybe_storage_entry = self
.storage
.get_account_storage_entry(slot, account_info.store_id);
let mut accessor = LoadedAccountAccessor::Stored(
maybe_storage_entry
.map(|entry| (entry, account_info.offset)),
);
let loaded_account = accessor.check_and_get_loaded_account();
let account = loaded_account.take_account();
accounts_data_len_from_duplicates += account.data().len();
});
}
});
accounts_data_len_from_duplicates as u64
};
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is the closure to get the accounts data len from the pubkeys chunk. Again, no atomic operations in here, yay!

@brooksprumo brooksprumo force-pushed the generate-index-accounts-data-len branch from d6366e7 to 0601dde Compare December 9, 2021 23:50
@brooksprumo brooksprumo marked this pull request as ready for review December 10, 2021 00:05
Comment thread runtime/src/accounts_db.rs
Comment thread runtime/src/accounts_db.rs Outdated
Comment thread runtime/src/accounts_db.rs Outdated
info!(
"accounts data len: {}, {}",
accounts_data_len.load(Ordering::Relaxed),
timer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what time are you seeing on, say, a mnb snapshot?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my GCE box (brooks-dev2), I see 600-650ms.

Comment thread runtime/src/accounts_db.rs
.2
.iter()
.for_each(|(slot, account_info)| {
let maybe_storage_entry = self
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this code here, but i didn't like it when I wrote it. I assumed there would be a 'load_account' function I could call at this level, but what I came up with were these pieces. It might be nice to see if there is a function already that goes from account_info -> accountshareddata

@jeffwashington jeffwashington self-requested a review December 10, 2021 17:36
Copy link
Copy Markdown
Contributor

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 10, 2021

Codecov Report

Merging #21757 (88a73da) into master (0224a8b) will decrease coverage by 0.1%.
The diff coverage is 77.2%.

@@            Coverage Diff            @@
##           master   #21757     +/-   ##
=========================================
- Coverage    81.6%    81.4%   -0.2%     
=========================================
  Files         511      511             
  Lines      143320   143535    +215     
=========================================
- Hits       116976   116901     -75     
- Misses      26344    26634    +290     

@brooksprumo brooksprumo merged commit ec7e177 into solana-labs:master Dec 10, 2021
@brooksprumo brooksprumo deleted the generate-index-accounts-data-len branch December 10, 2021 19:28
mergify Bot pushed a commit that referenced this pull request Jan 4, 2022
mergify Bot added a commit that referenced this pull request Jan 4, 2022
(cherry picked from commit ec7e177)

Co-authored-by: Brooks Prumo <brooks@solana.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants