solana-runtime: add ReadOptimizedDashMap #8314
Conversation
@@ -0,0 +1,199 @@
#![allow(dead_code)]
this will go away in the next PR
Force-pushed 7044a2a to e95b51f
Codecov Report
@@ Coverage Diff @@
## master #8314 +/- ##
========================================
Coverage 83.2% 83.2%
========================================
Files 836 837 +1
Lines 367890 368003 +113
========================================
+ Hits 306231 306339 +108
- Misses 61659 61664 +5
bw-solana left a comment
LGTM, but I'll let Starry take a look.
Left one potential suggestion (don't trust me - I might be missing something)
Self { inner }
}

/// Alternative to entry(k).or_insert_with(default) that returns an Arc<V> instead of returning a
technically returns an ROValue<V>, but I understand this is just Arc++
apfitzge left a comment
Thank you for breaking up the big PR, this seems much more manageable to review. I had some suggestions and questions
/// This type is a wrapper around Arc that allows checking whether there are
/// other strong references to the inner value.
#[derive(Debug, Default)]
pub struct ROValue<V> {
I'm not sure I get the purpose of this wrapper. Am I missing something? Usually with these wrappers it'll hide or prevent outside mutation or something, but this gives pub access to &inner anyway?
And this is exactly why I think that splitting PRs is a bad idea 😋
This wrapper is only needed to split PRs, because in a later PR I'm going to make ROValue always use std::sync::Arc even when shuttle is in use, otherwise you get a deadlock in shuttle tests if you yield to the shuttle scheduler while holding a shard lock (this breaks the dashmap assumption that code that holds a shard lock can't re-enter itself).
Then in an even later PR, I'm going to introduce ShuttleMap, which uses shuttle for the shard RwLocks and those can reenter the shuttle scheduler by design.
So yeah, this is basically churn 😋
EDIT:
oh, and the reason for exposing inner() - and another argument against splitting PRs, because you don't see where the code is used - is that otherwise I need to leak this whole monstrosity all the way to the accounts-db/snapshot generator.
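For context, the wrapper described in the quoted doc comment could be sketched roughly like this (a hedged sketch only; the actual field layout and method names in the PR may differ):

```rust
use std::sync::Arc;

// Hypothetical sketch of the ROValue wrapper discussed above.
#[derive(Debug, Default)]
pub struct ROValue<V> {
    inner: Arc<V>,
}

impl<V> ROValue<V> {
    pub fn new(value: V) -> Self {
        Self { inner: Arc::new(value) }
    }

    /// True if any other strong reference to the inner value is outstanding.
    pub fn shared(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Expose the inner Arc, e.g. so the snapshot generator can hold
    /// the value without keeping a shard lock.
    pub fn inner(&self) -> &Arc<V> {
        &self.inner
    }
}

fn main() {
    let v = ROValue::new(42u64);
    assert!(!v.shared()); // only the map holds the value
    let reader = Arc::clone(v.inner()); // a concurrent reader appears
    assert!(v.shared()); // removal must now be refused
    drop(reader);
    println!("shared after drop: {}", v.shared());
}
```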
pub unsafe fn retain(&self, f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
    self.inner.retain(f)
}
Maybe a dumb question. Could we not make this safe wrt concurrent modification if we did something similar to remove_if_not_accessed_and?
DashMap::retain grabs write locks on each shard, so if we just did something like:

pub unsafe fn retain(&self, mut f: impl FnMut(&K, &mut ROValue<V>) -> bool) {
    self.inner.retain(move |k, v| v.shared() || f(k, v))
}

Obviously this could be done in the passed f, but in this way it is forced.
And actually I think if we do this... then nothing is unsafe anymore? It'd become impossible to mutate values that are concurrently dropped, since if this goes through, the only way to drop is if no shared references are out. Not sure this holds with Weak... if not, then we could potentially just make ROValue not give access directly to the Arc, that way being weak is impossible!
If accepting the doc I recommended for iter, we should do that here too.
Yes this is effectively what the caller code does. The reason retain itself doesn't do it is that in slot_deltas.retain() you actually want to remove even if accessed, because that happens effectively all the time when snapshots are getting generated, and it's safe because snapshot generation doesn't mutate anything.
But now I'm thinking I could split this into UnsafeReadOptimizedDashMap and ReadOptimizedDashMap and use the former for slot_deltas and the latter for StatusCache::cache.
I ended up leaving it as ReadOptimizedDashMap, making everything safe, and just leaving retain as unsafe
If accepting the doc I recommended for iter we should do that here too.
does not apply here either
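The "retain if accessed or f" guard discussed above can be illustrated with a self-contained sketch; here HashMap plus Arc stand in for DashMap plus ROValue, and the function name is an assumption, not the PR's API:

```rust
use std::{collections::HashMap, sync::Arc};

// Hypothetical sketch: keep an entry if a reader still holds a strong
// reference, or if the caller's predicate says to keep it. Removal can
// only happen when no outstanding references exist.
fn retain_if_shared_or<K, V>(
    map: &mut HashMap<K, Arc<V>>,
    mut f: impl FnMut(&K, &Arc<V>) -> bool,
) {
    map.retain(|k, v| Arc::strong_count(v) > 1 || f(k, v));
}

fn main() {
    let mut map: HashMap<u32, Arc<&str>> = HashMap::new();
    map.insert(1, Arc::new("has an outstanding reader"));
    map.insert(2, Arc::new("unreferenced"));

    let reader = Arc::clone(&map[&1]); // simulate a concurrent reader

    // Ask to drop everything; only unshared entries actually go.
    retain_if_shared_or(&mut map, |_k, _v| false);

    assert!(map.contains_key(&1)); // protected by `reader`
    assert!(!map.contains_key(&2)); // safely removed
    drop(reader);
    println!("remaining entries: {}", map.len());
}
```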
let removed = map.remove_if_not_accessed(&1).unwrap();
assert!(removed.is_none());
}
}
May be good to have a shuttle test for the drop protection of values when shared, particularly if we rely on that for safety!
yes, this is tested in the next PR in StatusCache itself and I was lazy to add the same test here, but I'll stop being lazy I guess 😋
I added tests for things that can run concurrently.
I didn't add a shuttle test for remove_if_not_accessed because the regular tests already test what we need to test: given an outstanding ref, a key is not removed. A shuttle test doesn't make sense for that since the point of shuttle would be scheduling so that at least some of the time there are no outstanding refs (if drop happens before remove).
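The invariant those regular tests cover (an outstanding ref blocks removal) can be modeled with a bare Arc; names here are hypothetical and the map is reduced to a single Option slot:

```rust
use std::sync::Arc;

// Sketch of remove_if_not_accessed-style removal: a value is only
// removable once no outstanding strong refs to it exist.
fn try_remove<V>(slot: &mut Option<Arc<V>>) -> Option<Arc<V>> {
    // Removable only when the slot itself holds the sole strong ref.
    let removable = matches!(slot.as_ref(), Some(v) if Arc::strong_count(v) == 1);
    if removable { slot.take() } else { None }
}

fn main() {
    let mut slot = Some(Arc::new(42u64));
    let outstanding = Arc::clone(slot.as_ref().unwrap());
    assert!(try_remove(&mut slot).is_none()); // blocked by outstanding ref
    drop(outstanding);
    assert!(try_remove(&mut slot).is_some()); // now removable
    println!("slot empty: {}", slot.is_none());
}
```

This is deterministic, which is why a shuttle schedule exploration adds little here: the outcome depends only on whether the outstanding ref was dropped before the removal attempt.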
add retain_if_accessed_or, make everything else safe
Force-pushed e95b51f to fa8ca94
Another bit extracted from #3796.
Not used yet, follow-up PRs will plug it into the status cache.
This is a wrapper around DashMap that minimizes the time shard locks are held.
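As a rough illustration of that idea (not the actual implementation), a single RwLock<HashMap> can stand in for DashMap's per-shard locks: get clones the Arc under the lock and releases it immediately, so readers never hold a lock while using the value. All names below are assumptions.

```rust
use std::{
    collections::HashMap,
    hash::Hash,
    sync::{Arc, RwLock},
};

// Hypothetical sketch of the read-optimized pattern described above.
pub struct ReadOptimizedMap<K, V> {
    inner: RwLock<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash, V> ReadOptimizedMap<K, V> {
    pub fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, k: K, v: V) {
        self.inner.write().unwrap().insert(k, Arc::new(v));
    }

    /// The read lock is held only long enough to clone the Arc.
    pub fn get(&self, k: &K) -> Option<Arc<V>> {
        self.inner.read().unwrap().get(k).cloned()
    }
}

fn main() {
    let map = ReadOptimizedMap::new();
    map.insert("slot_deltas", 1234u64);
    let v = map.get(&"slot_deltas").unwrap(); // lock already released here
    println!("{}", *v);
}
```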