
solana-runtime: add ReadOptimizedDashMap #8314

Merged

alessandrod merged 3 commits into anza-xyz:master from alessandrod:status-ro-map on Oct 6, 2025

Conversation


@alessandrod alessandrod commented Oct 2, 2025

Another bit extracted from #3796.

Not used yet, follow up PRs will plug it into the status cache.

This is a wrapper around DashMap that minimizes the time shard locks are held.
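The core idea can be illustrated with a minimal, single-shard sketch (hypothetical names, not the PR's API): values are stored behind an Arc, so a reader holds the lock only long enough to clone the Arc and can then use the value without blocking writers. The real type wraps DashMap, which shards the lock across buckets.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};

/// Hypothetical single-shard sketch of a read-optimized map: the lock guards
/// only the HashMap itself, never the caller's use of a value.
pub struct ReadOptimizedMap<K, V> {
    shard: RwLock<HashMap<K, Arc<V>>>,
}

impl<K: Hash + Eq, V> ReadOptimizedMap<K, V> {
    pub fn new() -> Self {
        Self {
            shard: RwLock::new(HashMap::new()),
        }
    }

    pub fn insert(&self, key: K, value: V) {
        // The write lock is held only for the duration of the HashMap insert.
        self.shard.write().unwrap().insert(key, Arc::new(value));
    }

    pub fn get(&self, key: &K) -> Option<Arc<V>> {
        // The read lock is released as soon as the Arc clone returns; the
        // caller can then read the value for as long as it likes.
        self.shard.read().unwrap().get(key).cloned()
    }
}
```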

@alessandrod alessandrod requested a review from jstarry October 2, 2025 07:16
@@ -0,0 +1,199 @@
#![allow(dead_code)]
alessandrod (Author)

this will go away in the next PR

@codecov-commenter

codecov-commenter commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 88.49558% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (7b4f131) to head (1d7a6a6).

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #8314    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         836      837     +1     
  Lines      367890   368003   +113     
========================================
+ Hits       306231   306339   +108     
- Misses      61659    61664     +5     


@bw-solana bw-solana left a comment


LGTM, but I'll let Starry take a look.

Left one potential suggestion (don't trust me - I might be missing something)

Comment thread runtime/src/read_optimized_dashmap.rs Outdated
Self { inner }
}

/// Alternative to entry(k).or_insert_with(default) that returns an Arc<V> instead of returning a

technically returns an ROValue<V>, but I understand this is just Arc++

Comment thread runtime/src/read_optimized_dashmap.rs
@apfitzge apfitzge self-requested a review October 3, 2025 18:51

@apfitzge apfitzge left a comment


Thank you for breaking up the big PR, this seems much more manageable to review. I had some suggestions and questions.

/// This type is a wrapper around Arc that allows checking whether there are
/// other strong references to the inner value.
#[derive(Debug, Default)]
pub struct ROValue<V> {

I'm not sure I get the purpose of this wrapper. Am I missing something? Usually with these wrappers it'll hide or prevent outside mutation or something, but this gives pub access to &inner anyway?


@alessandrod alessandrod Oct 4, 2025


And this is exactly why I think that splitting PRs is a bad idea 😋

This wrapper is only needed to split PRs, because in a later PR I'm going to make ROValue always use std::sync::Arc even when shuttle is in use, otherwise you get a deadlock in shuttle tests if you yield to the shuttle scheduler while holding a shard lock (this breaks the dashmap assumption that code that holds a shard lock can't re-enter itself).

Then in an even later PR, I'm going to introduce ShuttleMap, which uses shuttle for the shard RwLocks and those can reenter the shuttle scheduler by design.

So yeah, this is basically churn 😋

EDIT:

oh, and the reason for exposing inner() - and another reason for not splitting PRs because you don't see where code is used - is that otherwise I need to leak this whole monstrosity all the way to the accounts-db/snapshot generator.
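The wrapper described above can be sketched roughly as follows (a hypothetical std-only sketch; method names beyond those quoted in the thread are assumptions): an Arc whose wrapper can report whether other strong references to the inner value are outstanding, plus the `inner()` escape hatch mentioned above.

```rust
use std::sync::Arc;

/// Hypothetical sketch of an Arc wrapper in the spirit of ROValue: it can
/// report whether other strong references to the inner value exist, which
/// lets a map decide when it is safe to remove an entry.
#[derive(Debug, Default)]
pub struct ROValue<V> {
    inner: Arc<V>,
}

impl<V> ROValue<V> {
    pub fn new(value: V) -> Self {
        Self {
            inner: Arc::new(value),
        }
    }

    /// True while a clone of the inner Arc is held outside this wrapper.
    pub fn shared(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Escape hatch: expose the inner Arc so downstream callers don't need
    /// to know about the wrapper type.
    pub fn inner(&self) -> &Arc<V> {
        &self.inner
    }
}
```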

Comment thread runtime/src/read_optimized_dashmap.rs Outdated
Comment thread runtime/src/read_optimized_dashmap.rs
Comment on lines +106 to +108
pub unsafe fn retain(&self, f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
self.inner.retain(f)
}

Maybe a dumb question. Could we not make this safe wrt concurrent modification if we did something similar to remove_if_not_accessed_and?

DashMap::retain grabs write locks on each shard, so if we just did something like:

    pub unsafe fn retain(&self, mut f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
        self.inner.retain(move |k, v| !v.shared() && f(k, v))
    }

Obviously this could be done in the passed f, but in this way it is forced.
And actually I think if we do this...then nothing is unsafe anymore? It'd become impossible to mutate values that are concurrently dropped, since if this goes through, the only way to drop is if no shared references are out. Not sure this holds with Weak...if not then we could potentially just make ROValue not give access directly to the Arc, that way being weak is impossible!


If accepting the doc I recommended for iter we should do that here too.

alessandrod (Author)

Yes this is effectively what the caller code does. The reason retain itself doesn't do it is that in slot_deltas.retain() you actually want to remove even if accessed, because that happens effectively all the time when snapshots are getting generated, and it's safe because snapshot generation doesn't mutate anything.

But now I'm thinking I could split this into UnsafeReadOptimizedDashMap and ReadOptimizedDashMap and use the former for slot_deltas and the latter for StatusCache::cache.
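The remove_if_not_accessed contract being discussed can be sketched like this (a hypothetical std-only sketch, not the PR's code): an entry is removed only when the map holds the sole strong reference, so a value can never be dropped out from under a concurrent reader.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical sketch of a map whose removal respects outstanding readers.
pub struct SafeMap<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Hash + Eq, V> SafeMap<K, V> {
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(HashMap::new()),
        }
    }

    pub fn insert(&self, key: K, value: V) {
        self.inner.lock().unwrap().insert(key, Arc::new(value));
    }

    pub fn get(&self, key: &K) -> Option<Arc<V>> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    /// Remove the entry only if the map holds the sole strong reference,
    /// i.e. no reader currently has an outstanding clone of the value.
    pub fn remove_if_not_accessed(&self, key: &K) -> Option<Arc<V>> {
        let mut guard = self.inner.lock().unwrap();
        match guard.get(key) {
            Some(v) if Arc::strong_count(v) == 1 => guard.remove(key),
            _ => None,
        }
    }
}
```

An unconditional retain, by contrast, skips the strong-count check, which is why the thread treats it as the unsafe escape hatch.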

alessandrod (Author)

I ended up leaving it ReadOptimizedDashMap, making everything safe, and just leaving retain as unsafe

alessandrod (Author)

If accepting the doc I recommended for iter we should do that here too.

does not apply here either

let removed = map.remove_if_not_accessed(&1).unwrap();
assert!(removed.is_none());
}
}

May be good to have a shuttle test for the drop protection of values when shared, particularly if we rely on that for safety!

alessandrod (Author)

yes, this is tested in the next PR in StatusCache itself and I was lazy to add the same test here, but I'll stop being lazy I guess 😋

alessandrod (Author)

I added tests for things that can run concurrently.

I didn't add a shuttle test for remove_if_not_accessed because the regular tests already test what we need to test: given an outstanding ref, a key is not removed. A shuttle test doesn't make sense for that since the point of shuttle would be scheduling so that at least some of the time there are no outstanding refs (if drop happens before remove).

Not used yet, follow up PRs will plug it into the status cache.
add retain_if_accessed_or, make everything else safe
@alessandrod alessandrod merged commit 1baf20e into anza-xyz:master Oct 6, 2025
43 checks passed
4 participants