solana-runtime: add ReadOptimizedDashMap #8314
Conversation
@@ -0,0 +1,199 @@
#![allow(dead_code)]
this will go away in the next PR
Force-pushed 7044a2a to e95b51f
Codecov Report
@@ Coverage Diff @@
## master #8314 +/- ##
========================================
Coverage 83.2% 83.2%
========================================
Files 836 837 +1
Lines 367890 368003 +113
========================================
+ Hits 306231 306339 +108
- Misses 61659 61664 +5
bw-solana left a comment
LGTM, but I'll let Starry take a look.
Left one potential suggestion (don't trust me - I might be missing something)
Self { inner }
}

/// Alternative to entry(k).or_insert_with(default) that returns an Arc<V> instead of returning a
technically returns an ROValue<V>, but I understand this is just Arc++
apfitzge left a comment
Thank you for breaking up the big PR, this seems much more manageable to review. I had some suggestions and questions
/// This type is a wrapper around Arc that allows checking whether there are
/// other strong references to the inner value.
#[derive(Debug, Default)]
pub struct ROValue<V> {
I'm not sure I get the purpose of this wrapper. Am I missing something? Usually with these wrappers it'll hide or prevent outside mutation or something, but this gives pub access to &inner anyway?
And this is exactly why I think that splitting PRs is a bad idea 😋
This wrapper is only needed to split PRs, because in a later PR I'm going to make ROValue always use std::sync::Arc even when shuttle is in use, otherwise you get a deadlock in shuttle tests if you yield to the shuttle scheduler while holding a shard lock (this breaks the dashmap assumption that code that holds a shard lock can't re-enter itself).
Then in an even later PR, I'm going to introduce ShuttleMap, which uses shuttle for the shard RwLocks and those can reenter the shuttle scheduler by design.
So yeah, this is basically churn 😋
EDIT:
oh, and the reason for exposing inner() - and another argument against splitting PRs, because you don't see where the code is used - is that otherwise I need to leak this whole monstrosity all the way to the accounts-db/snapshot generator.
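For context, the wrapper described in the quoted doc comment could be sketched roughly like this (a hedged sketch only; the actual field layout and method names in the PR may differ):

```rust
use std::sync::Arc;

// Hypothetical sketch of the ROValue wrapper discussed above.
#[derive(Debug, Default)]
pub struct ROValue<V> {
    inner: Arc<V>,
}

impl<V> ROValue<V> {
    pub fn new(value: V) -> Self {
        Self { inner: Arc::new(value) }
    }

    /// True if any other strong reference to the inner value is outstanding.
    pub fn shared(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Expose the inner Arc, e.g. so the snapshot generator can hold
    /// the value without keeping a shard lock.
    pub fn inner(&self) -> &Arc<V> {
        &self.inner
    }
}

fn main() {
    let v = ROValue::new(42u64);
    assert!(!v.shared()); // only the map holds the value
    let reader = Arc::clone(v.inner()); // a concurrent reader appears
    assert!(v.shared()); // removal must now be refused
    drop(reader);
    println!("shared after drop: {}", v.shared());
}
```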
pub unsafe fn retain(&self, f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
    self.inner.retain(f)
}
Maybe a dumb question. Could we not make this safe wrt concurrent modification if we did something similar to remove_if_not_accessed_and?
DashMap::retain grabs write locks on each shard, so if we just did something like:

pub unsafe fn retain(&self, mut f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
    self.inner.retain(move |k, v| v.shared() || f(k, v))
}

Obviously this could be done in the passed f, but in this way it is forced.
And actually I think if we do this... then nothing is unsafe anymore? It'd become impossible to mutate values that are concurrently dropped, since if this goes through, the only way to drop is if no shared references are out. Not sure this holds with Weak... if not, then we could potentially just make ROValue not give access directly to the Arc, that way being weak is impossible!
If accepting the doc I recommended for iter, we should do that here too.
Yes this is effectively what the caller code does. The reason retain itself doesn't do it is that in slot_deltas.retain() you actually want to remove even if accessed, because that happens effectively all the time when snapshots are getting generated, and it's safe because snapshot generation doesn't mutate anything.
But now I'm thinking I could split this into UnsafeReadOptimizedDashMap and ReadOptimizedDashMap and use the former for slot_deltas and the latter for StatusCache::cache.
I ended up leaving it as ReadOptimizedDashMap, making everything safe, and just leaving retain as unsafe
If accepting the doc I recommended for iter we should do that here too.
does not apply here either
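The "retain if accessed or f" guard discussed above can be illustrated with a self-contained sketch; here HashMap plus Arc stand in for DashMap plus ROValue, and the function name is an assumption, not the PR's API:

```rust
use std::{collections::HashMap, sync::Arc};

// Hypothetical sketch: keep an entry if a reader still holds a strong
// reference, or if the caller's predicate says to keep it. Removal can
// only happen when no outstanding references exist.
fn retain_if_shared_or<K, V>(
    map: &mut HashMap<K, Arc<V>>,
    mut f: impl FnMut(&K, &Arc<V>) -> bool,
) {
    map.retain(|k, v| Arc::strong_count(v) > 1 || f(k, v));
}

fn main() {
    let mut map: HashMap<u32, Arc<&str>> = HashMap::new();
    map.insert(1, Arc::new("has an outstanding reader"));
    map.insert(2, Arc::new("unreferenced"));

    let reader = Arc::clone(&map[&1]); // simulate a concurrent reader

    // Ask to drop everything; only unshared entries actually go.
    retain_if_shared_or(&mut map, |_k, _v| false);

    assert!(map.contains_key(&1)); // protected by `reader`
    assert!(!map.contains_key(&2)); // safely removed
    drop(reader);
    println!("remaining entries: {}", map.len());
}
```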
let removed = map.remove_if_not_accessed(&1).unwrap();
assert!(removed.is_none());
}
}
May be good to have a shuttle test for the drop protection of values when shared, particularly if we rely on that for safety!
yes, this is tested in the next PR in StatusCache itself and I was lazy to add the same test here, but I'll stop being lazy I guess 😋
I added tests for things that can run concurrently.
I didn't add a shuttle test for remove_if_not_accessed because the regular tests already test what we need to test: given an outstanding ref, a key is not removed. A shuttle test doesn't make sense for that since the point of shuttle would be scheduling so that at least some of the time there are no outstanding refs (if drop happens before remove).
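The invariant those regular tests cover (an outstanding ref blocks removal) can be modeled with a bare Arc; names here are hypothetical and the map is reduced to a single Option slot:

```rust
use std::sync::Arc;

// Sketch of remove_if_not_accessed-style removal: a value is only
// removable once no outstanding strong refs to it exist.
fn try_remove<V>(slot: &mut Option<Arc<V>>) -> Option<Arc<V>> {
    // Removable only when the slot itself holds the sole strong ref.
    let removable = matches!(slot.as_ref(), Some(v) if Arc::strong_count(v) == 1);
    if removable { slot.take() } else { None }
}

fn main() {
    let mut slot = Some(Arc::new(42u64));
    let outstanding = Arc::clone(slot.as_ref().unwrap());
    assert!(try_remove(&mut slot).is_none()); // blocked by outstanding ref
    drop(outstanding);
    assert!(try_remove(&mut slot).is_some()); // now removable
    println!("slot empty: {}", slot.is_none());
}
```

This is deterministic, which is why a shuttle schedule exploration adds little here: the outcome depends only on whether the outstanding ref was dropped before the removal attempt.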
add retain_if_accessed_or, make everything else safe
Force-pushed e95b51f to fa8ca94
Another bit extracted from #3796.
Not used yet, follow-up PRs will plug it into the status cache.
This is a wrapper around DashMap that minimizes the time shard locks are held.
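As a rough illustration of that idea (not the actual implementation), a single RwLock<HashMap> can stand in for DashMap's per-shard locks: get clones the Arc under the lock and releases it immediately, so readers never hold a lock while using the value. All names below are assumptions.

```rust
use std::{
    collections::HashMap,
    hash::Hash,
    sync::{Arc, RwLock},
};

// Hypothetical sketch of the read-optimized pattern described above.
pub struct ReadOptimizedMap<K, V> {
    inner: RwLock<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash, V> ReadOptimizedMap<K, V> {
    pub fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, k: K, v: V) {
        self.inner.write().unwrap().insert(k, Arc::new(v));
    }

    /// The read lock is held only long enough to clone the Arc.
    pub fn get(&self, k: &K) -> Option<Arc<V>> {
        self.inner.read().unwrap().get(k).cloned()
    }
}

fn main() {
    let map = ReadOptimizedMap::new();
    map.insert("slot_deltas", 1234u64);
    let v = map.get(&"slot_deltas").unwrap(); // lock already released here
    println!("{}", *v);
}
```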