
accounts-db: Benchmark cache evictions #4045

Open
wants to merge 1 commit into master from benchmark-cache-evictions
Conversation

@vadorovsky (Member)

The existing `concurrent_{read,scan}_write` benchmarks are not sufficient for benchmarking eviction or for evaluating which eviction policy performs best, because they never fill the cache, so eviction never happens.

Add a new benchmark that measures concurrent reads and writes on a full cache.


@alessandrod left a comment


Looks great! Left some comments

```diff
@@ -137,7 +137,7 @@ impl ReadOnlyAccountsCache {
         }
     }

-    pub(crate) fn load(&self, pubkey: Pubkey, slot: Slot) -> Option<AccountSharedData> {
+    pub fn load(&self, pubkey: Pubkey, slot: Slot) -> Option<AccountSharedData> {
```


nit: here and in `store`, make these public only under the `dev-context-only-utils` feature. When reading code, there's value in knowing whether something is actually used by the validator or whether it's only exported for tests/benches.

@vadorovsky (Member Author)


Honest dumb question - how?

If this were a regular function, I could simply do something like:

```rust
#[cfg(feature = "dev-context-only-utils")]
pub use load;
```
But for methods, the only way I can imagine is:

```rust
impl ReadOnlyAccountsCache {
    #[cfg(feature = "dev-context-only-utils")]
    pub fn load(&self, pubkey: Pubkey, slot: Slot) -> Option<AccountSharedData> {
        self._load(pubkey, slot)
    }

    #[cfg(not(feature = "dev-context-only-utils"))]
    fn load(&self, pubkey: Pubkey, slot: Slot) -> Option<AccountSharedData> {
        self._load(pubkey, slot)
    }

    fn _load(&self, pubkey: Pubkey, slot: Slot) -> Option<AccountSharedData> {
        // do stuff
    }
}
```

Is there any other less dumb trick I'm missing?

```diff
@@ -0,0 +1,208 @@
+#![feature(test)]
```


nit: some benches have the `bench_` prefix in the filename, some don't. I'd drop it.

```rust
));

// Prepare accounts for the cache fillup.
let pubkeys: Vec<_> = std::iter::repeat_with(solana_sdk::pubkey::new_rand)
```


nit: add `use` imports for `solana_sdk` and `std` at the top; don't use full paths.


```rust
/// Benchmarks the read-only cache eviction mechanism. It does so by performing
/// multithreaded reads and writes on a full cache. Each write triggers
/// eviction. Background reads add more contention.
```
@alessandrod Dec 11, 2024


This is a great bench, but it tests the worst case right? Do we have benches for the cases when we have no evictions, and also for when we have a "normal" amount of evictions? Normal being X every 500ms or so? Something that approximates the current eviction rate on mnb.

We want to test the worst case, and we want to improve perf with higher throughput than current mnb, but we also want to ensure we don't accidentally regress with current load.

@vadorovsky (Member Author)


> This is a great bench, but it tests the worst case right? Do we have benches for the cases when we have no evictions, and also for when we have a "normal" amount of evictions? Normal being X every 500ms or so? Something that approximates the current eviction rate on mnb.

No, we don't. This is the only bench which triggers any eviction at all. I will add a case closer to the current mnb numbers.

@vadorovsky (Member Author) Dec 11, 2024


Well, the "no eviction" case is handled here:

```rust
fn store_accounts_with_possible_contention<F>(bench_name: &str, bencher: &mut Bencher, reader_f: F)
where
    F: Fn(&Accounts, &[Pubkey]) + Send + Copy + 'static,
{
    let num_readers = 5;
    let accounts_db = new_accounts_db(vec![PathBuf::from(
        std::env::var("FARF_DIR").unwrap_or_else(|_| "farf".to_string()),
    )
    .join(bench_name)]);
    let accounts = Arc::new(Accounts::new(Arc::new(accounts_db)));
    let num_keys = 1000;
    let slot = 0;
    let pubkeys: Vec<_> = std::iter::repeat_with(solana_sdk::pubkey::new_rand)
        .take(num_keys)
        .collect();
    let accounts_data: Vec<_> = std::iter::repeat(
        Account {
            lamports: 1,
            ..Default::default()
        }
        .to_account_shared_data(),
    )
    .take(num_keys)
    .collect();
    let storable_accounts: Vec<_> = pubkeys.iter().zip(accounts_data.iter()).collect();
    accounts.store_accounts_cached((slot, storable_accounts.as_slice()));
    accounts.add_root(slot);
    accounts
        .accounts_db
        .flush_accounts_cache_slot_for_tests(slot);
    let pubkeys = Arc::new(pubkeys);
    for i in 0..num_readers {
        let accounts = accounts.clone();
        let pubkeys = pubkeys.clone();
        Builder::new()
            .name(format!("reader{i:02}"))
            .spawn(move || {
                reader_f(&accounts, &pubkeys);
            })
            .unwrap();
    }
    let num_new_keys = 1000;
    bencher.iter(|| {
        let new_pubkeys: Vec<_> = std::iter::repeat_with(solana_sdk::pubkey::new_rand)
            .take(num_new_keys)
            .collect();
        let new_storable_accounts: Vec<_> = new_pubkeys.iter().zip(accounts_data.iter()).collect();
        // Write to a different slot than the one being read from. Because
        // a new account pubkey is written every time, this will compete for
        // the accounts index lock on every store.
        accounts.store_accounts_cached((slot + 1, new_storable_accounts.as_slice()));
    });
}
```

But I'm not a big fan of it, because:

  • It does `iter().collect()` inside the bench.
  • It adds only 1000 accounts.
  • It doesn't use criterion. 😛

So I think it makes sense to bench that case in this file.


I don't have opinions here, up to you and accounts-db peeps to decide whether to improve this bench or consolidate with the new ones you're writing!

```rust
Account {
    lamports: 1,
    // 1 MiB
    data: vec![1; 1024 * 1024],
```


I think you want to do these allocations outside the bench; otherwise the allocation is what takes the most time.


Also, do they have to be 1 MB? This is a contention bench, right? The size of the accounts shouldn't matter.

@vadorovsky (Member Author)


Good point with the allocation, will move outside the bench.

Regarding the size - the size itself doesn't matter, but a bigger size triggers eviction faster. The default eviction threshold is 410 * 1024 * 1024 bytes (410 MiB) and `CACHE_ENTRY_SIZE` is 144 bytes, so if you bench without account data, you need 2985529 accounts to fill up the cache. That's why the currently existing bench isn't anywhere close to filling it - it creates 1000 accounts without data.

I'm open to picking another size that's closer to the average on mnb; I'll think about it today.

@vadorovsky (Member Author)


OK, the average size, calculated in the most silly way, would be:

```python
>>> 265633981325/744678683
356.7095277319762
```

356 bytes, lol. That makes sense, given that most of the accounts on mnb are token accounts. In that case, I'm not sure whether we should go with some size like 256 or 512 bytes, or just skip the data altogether and bench with >2985529 empty accounts. Leaning towards the latter.


We should also bench the 0 data case, as that gives us the worst case in terms of [number of things we store]. So I'd say 0 and something reasonable, 512b or 1K.

```rust
let cache = cache.clone();
let new_pubkeys = new_pubkeys[i].clone();

Builder::new()
```
@alessandrod Dec 11, 2024


I think you should pre-create the writers outside the bench and make them sleep on a std::sync::Barrier (https://doc.rust-lang.org/stable/std/sync/struct.Barrier.html).

Then in the bench you should trigger the barrier, then wait for all the writers to terminate. Otherwise you're adding thread creation time to your bench, which will get slower the more threads you add.


I actually don't think you can use the barrier, because `iter()` will want to repeat this many times. You'll need to get a little creative :P


ah no nevermind, barriers can be used more than once, nice

@vadorovsky force-pushed the benchmark-cache-evictions branch from 8206e16 to 2766162 on December 11, 2024 15:09
@vadorovsky force-pushed the benchmark-cache-evictions branch from 2766162 to 42ca375 on December 11, 2024 15:11