Faster status cache #3796
Conversation
I strongly think you should consider other options and yield on this approach. Finding others to review and merge this code feels subversive to me. If others are ok with the complexity and comfortable with the idea of needing to debug this code in the future, then feel free to move forward, though. I think the goal here is to reduce the time spent on lock contention, and moving to ahash / removing results both help in that regard.
why would it be subversive? We don’t have to agree on everything.
That is not the goal. The goal is to remove the contention, not reduce it. I’m fully in favor of doing ahash, I didn’t do it in this PR because it’s orthogonal. Same with the result.
Also it’s worth noting that frankendancer has been running this on mnb for months, I’ve run it for months on my node, and firedancer uses a similar approach. I acknowledge that the code is more complex and I’m happy to discuss concerns with it as I’ve been doing so far, but it’s not like I’m being unreasonable and trying to be subversive 😅
Hmm ok, maybe to rephrase... the goal can be to remove contention entirely, but any progress towards reducing it is also productive. If we get to a point where incrementally reducing contention further (i.e. to zero) results in a lot of complexity and maintenance, then we should be mindful of the tradeoffs.
Now I'm confused. You did apply

Removing result reduces contention as well because it's one less hash (of the signature) under the write lock. I think it's better to do such a change before making the status cache more complex, not after.
I really shouldn't be on github from my phone 😅 When I was young and naive and started this change I had everything nicely split into a bunch of commits, including ahash before the rwlock removal. Then as I rebased the PR 600 times I got tired of having to resolve conflicts and ended up squashing it all together. I'm happy to take ahash out and land it before this change tho, I'll do that next.
    K: Eq + Hash,
    S: BuildHasher,
{
    shards: Vec<RwLock<HashMap<K, V, S>>>,
Looking at the internals of DashMap (what the previous implementation used), which has:

shards: Box<[CachePadded<RwLock<HashMap<K, V>>>]>,

Would we benefit from cache-padding the inner write locks?

I don't think Box vs Vec will change much, just type guarantees we don't re-size at runtime... but meh, that's obvious from the code imo.
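For context, here is a minimal sketch of what cache-padding the shards could look like, assuming the crossbeam-utils `CachePadded` wrapper; the `PaddedShards` name and constructor are illustrative, not the PR's actual type:

```rust
use crossbeam_utils::CachePadded;
use std::{
    collections::HashMap,
    hash::{BuildHasher, Hash},
    sync::RwLock,
};

// Illustrative sharded map: each shard's lock gets its own cache line, so a
// write lock taken on one shard doesn't false-share with its neighbours.
struct PaddedShards<K, V, S> {
    shards: Box<[CachePadded<RwLock<HashMap<K, V, S>>>]>,
}

impl<K: Eq + Hash, V, S: BuildHasher + Default> PaddedShards<K, V, S> {
    fn with_shards(num_shards: usize) -> Self {
        let shards = (0..num_shards)
            .map(|_| CachePadded::new(RwLock::new(HashMap::default())))
            .collect::<Vec<_>>()
            .into_boxed_slice();
        Self { shards }
    }
}
```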
this is only used in tests to provide shuttle with context switch points, we don't really use this in production, so I didn't spend any time making the code good.
Basically shuttle intercepts anything that may put a thread to sleep or cause thread synchronization, and explicitly handles the state transition according to a given scheduler (right now I'm using the random scheduler in the tests).
This code is needed to simulate a bunch of interleaving sequences in our code and see if anything breaks.
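For anyone unfamiliar with shuttle, here is a minimal sketch of that test pattern using a toy shared map rather than the real status cache (the test name and body are illustrative): `check_random` explores many randomly scheduled interleavings of the closure, and every shuttle lock operation is a potential context-switch point.

```rust
#[cfg(feature = "shuttle-test")]
#[test]
fn shuttle_toy_interleaving() {
    use shuttle::{sync::RwLock, thread};
    use std::{collections::HashMap, sync::Arc};

    // Explore 1_000 randomly scheduled interleavings of the closure below.
    shuttle::check_random(
        || {
            let map = Arc::new(RwLock::new(HashMap::new()));
            let writer = {
                let map = Arc::clone(&map);
                thread::spawn(move || {
                    // Writer: shuttle may pre-empt before or after taking the lock.
                    map.write().unwrap().insert("blockhash", 42);
                })
            };
            // Reader running concurrently with the writer.
            let _ = map.read().unwrap().get("blockhash").copied();
            writer.join().unwrap();
        },
        1_000,
    );
}
```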
@@ -1,118 +1,195 @@
// Use shuttle's synchronization primitives when cfg(feature = "shuttle-test") is set, otherwise use std.
can we put these imports down with the other conditional imports?
it's easier to see what is changing based on the shuttle-test flag if they are close to each other.
lmao for some reason rustfmt put it there if I didn't add strategic newlines
nope, looks like rustfmt wants it at the top or CI fails
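For reference, a sketch of the kind of cfg-gated import swap being discussed (the function and paths below are illustrative): the same call sites compile against shuttle's primitives under `shuttle-test` and against std otherwise, which is why keeping the two `#[cfg]` blocks next to each other makes the diff easier to read.

```rust
// Use shuttle's synchronization primitives when cfg(feature = "shuttle-test")
// is set, otherwise use std.
#[cfg(feature = "shuttle-test")]
use shuttle::sync::RwLock;
#[cfg(not(feature = "shuttle-test"))]
use std::sync::RwLock;

// Call sites are identical either way; only the lock implementation differs.
fn guarded_len(values: &RwLock<Vec<u64>>) -> usize {
    values.read().unwrap().len()
}
```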
// Safety:
// - we explicitly check that the blockhash isn't referenced by other threads.
// Checking Arc::strong_count() is safe because retain() holds a write lock on
// the shard and get_or_insert_with() calls Arc::clone() holding a read lock.
🤓 there's an edge case we might mention here about Weak.
If another thread downgraded to Weak, it's possible that the strong_count below returns 1, they upgrade, then it gets removed.
We're not fucking around with Weak but might add a comment here that we do not do that.
Sometimes I wish there was a strong-only Arc
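A minimal sketch of the invariant being discussed, with illustrative names rather than the PR's actual retain code: the check is only sound if no `Weak` references to these Arcs are ever created, since a racing `Weak::upgrade` could observe a strong count of 1 and still resurrect the entry after it is removed.

```rust
use std::{
    collections::HashMap,
    hash::Hash,
    sync::{Arc, RwLock},
};

// Values are shared out of the shard via Arc clones taken under a read lock.
type Shard<K, V> = RwLock<HashMap<K, Arc<V>>>;

fn drop_unreferenced_entries<K: Eq + Hash, V>(shard: &Shard<K, V>) {
    // Holding the write lock excludes get_or_insert_with(), which clones the Arc
    // under a read lock, so strong_count == 1 proves the map holds the only
    // reference (provided no Weak handles exist anywhere).
    let mut map = shard.write().unwrap();
    map.retain(|_key, value| Arc::strong_count(value) > 1);
}
```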
Some notes I took while re-re-re-viewing the code

clean_slot_entries(slot):

So effectively in clean_slot_entries(), the only thing to be mindful of is

root_slot_deltas():

I don't think that there are any race concerns at all here.

add_root()/purge_roots()

Since MAX_PROCESSING_AGE = MAX_RECENT_BLOCKHASHES / 2, by the time we

Additionally nonces have the nonce account data check in place that avoids replay, so I think

insert(blockhash, tx, slot)

So effectively, what can happen concurrently:

Because of the serialization forced by account locks, check_transactions(tx) and insert(tx) can
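For reference, the two constants mentioned above relate as follows (the exact values mirror solana-sdk's clock module and should be treated as an assumption of this note, not something stated in the thread): a blockhash stops being accepted for new transactions at half the age at which it falls out of the recent-blockhashes window.

```rust
// Shown only to make the note above concrete.
const MAX_RECENT_BLOCKHASHES: usize = 300;
const MAX_PROCESSING_AGE: usize = MAX_RECENT_BLOCKHASHES / 2; // 150
```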
Yeah, I think you described this race condition correctly and the new method

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

This pull request was closed because it has been stale for 7 days with no activity.
Remove the global RwLock around the status cache, and introduce more granular RwLocks per-blockhash and per-slot. Additionally, change the internal hash tables from std HashMap to DashMap, so that operations at the blockhash and slot level can be done while holding only read locks.
Pull request overview
This PR refactors the runtime status cache to reduce contention by removing the global RwLock and introducing concurrent/sharded data structures (DashMap + per-entry Arc-based access), aiming to speed up transaction status checks and updates.
Changes:
- Replaced the status cache’s global lock + HashMap layout with ReadOptimizedDashMap/DashMap/DashSet and per-blockhash/per-slot granularity.
- Updated snapshot (de)serialization paths and bank call sites to use the new non-RwLock status cache API.
- Introduced a concurrent append-only vector (ConcurrentVec via boxcar, see the sketch below) and added the boxcar dependency and the DashMap serde feature.
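A rough sketch of what a concurrent append-only vector on top of the boxcar crate can look like; the `ConcurrentVec` name mirrors the wrapper mentioned above, but the shape shown here is an assumption, not the PR's implementation.

```rust
// boxcar::Vec allows appends through a shared reference and never moves
// existing elements, so readers can iterate while writers keep pushing.
struct ConcurrentVec<T> {
    inner: boxcar::Vec<T>,
}

impl<T> ConcurrentVec<T> {
    fn new() -> Self {
        Self {
            inner: boxcar::Vec::new(),
        }
    }

    // Returns the index the value was stored at; note it takes &self, not &mut self.
    fn push(&self, value: T) -> usize {
        self.inner.push(value)
    }

    fn iter(&self) -> impl Iterator<Item = &T> {
        self.inner.iter().map(|(_index, value)| value)
    }
}
```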
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| runtime/src/status_cache.rs | Core refactor: concurrent cache + slot delta structures, new insertion/clearing/purging logic, new ConcurrentVec. |
| runtime/src/bank.rs | Bank now holds Arc<BankStatusCache> directly (no RwLock); updated status-cache call sites. |
| runtime/src/bank/check_transactions.rs | Uses the new status cache API directly (no read lock). |
| runtime/src/snapshot_controller.rs | Uses new status cache API when preparing snapshot requests. |
| runtime/src/snapshot_bank_utils.rs | Uses new status cache API for snapshot restore/create (some call sites still need updating). |
| runtime/src/serde_snapshot/status_cache.rs | Adjusted snapshot serialization/deserialization for new Status representation. |
| runtime/src/read_optimized_dashmap.rs | Tweaks to wrapper; adds a test-only helper around retain behavior. |
| core/tests/snapshots.rs | Updated one status cache usage to the new API (other call sites still need updating). |
| runtime/benches/status_cache.rs | Bench updates for new API surface (no lock guard). |
| runtime/Cargo.toml | Adds boxcar dependency to runtime crate. |
| Cargo.toml | Adds boxcar to workspace deps; enables DashMap serde feature. |
| Cargo.lock | Lockfile update for new dependency. |
| programs/sbf/Cargo.lock | Lockfile update for new dependency in SBF workspace. |
.status_cache
    .read()
    .unwrap()
    .roots()
    .iter()
    .cloned()
    .sorted();
There is still a bank.status_cache.read().unwrap().root_slot_deltas() usage later in this file (in the fastboot snapshot test). With Bank::status_cache no longer behind an RwLock, this remaining call site should be updated to bank.status_cache.root_slot_deltas() to keep the test compiling.
// #[test]
// fn test_shuttle_purge_nonce_overlap_dfs() {
//     shuttle::check_dfs(do_test_shuttle_purge_nonce_overlap, PURGE_DFS_ITERATIONS);
// }
The DFS variant of the purge nonce overlap shuttle test has been commented out. Since this PR changes status cache concurrency behavior, disabling this stress test reduces coverage for race conditions around add_root()/purging. Consider keeping it enabled (or gating it behind an opt-in flag) and tracking the shuttle runtime crash separately.
Suggested change:

#[cfg(feature = "shuttle-dfs-tests")]
#[test]
fn test_shuttle_purge_nonce_overlap_dfs() {
    shuttle::check_dfs(do_test_shuttle_purge_nonce_overlap, PURGE_DFS_ITERATIONS);
}
// Safety: we just initialized the whole key_slice above
let mut forks = txs.entry(unsafe { key_slice.assume_init() }).or_default();
forks.push((slot, res.clone()));
key_slice.assume_init() is called here and then called again later when passing key_slice into add_to_slot_delta(). assume_init() moves the [u8; CACHED_KEY_SIZE] out of the MaybeUninit, so using it twice is undefined behavior (second call reads moved/uninitialized memory). Fix by materializing the initialized array exactly once (e.g., assign to a local let key_slice = ...;) and then reuse/copy it for both insertions.
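A sketch of the fix being suggested, with an assumed `CACHED_KEY_SIZE` and a hypothetical helper name: the initialized array is produced by exactly one `assume_init()` call and bound to a local, which callers can then copy freely (it is a `Copy` array) for both insertions.

```rust
use std::mem::MaybeUninit;

const CACHED_KEY_SIZE: usize = 20; // assumed; use the crate's actual constant

fn build_key(tx_key: &[u8]) -> [u8; CACHED_KEY_SIZE] {
    assert!(tx_key.len() >= CACHED_KEY_SIZE);
    let mut key_slice = MaybeUninit::<[u8; CACHED_KEY_SIZE]>::uninit();
    unsafe {
        // Fully initialize the buffer before the single assume_init() below.
        std::ptr::copy_nonoverlapping(
            tx_key.as_ptr(),
            key_slice.as_mut_ptr() as *mut u8,
            CACHED_KEY_SIZE,
        );
        key_slice.assume_init()
    }
}
```

Callers would then bind the result once, e.g. `let key = build_key(sig_bytes);`, and pass the same `key` to both the `txs.entry(...)` insertion and `add_to_slot_delta()`.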
// FIXME: this can happen with RPC doing get_status or whatever
panic!(
    "slot {slot} is being cleared while another thread is inserting new entries"
);
Panicking on remove_if_not_accessed() failure will crash the node whenever another thread is holding an Arc to this slot's delta map (this includes readers, not just inserters). Since concurrent access is possible in production (e.g., snapshot/RPC readers), this should be handled gracefully (retry, skip, or defer clearing) instead of panic!.
Suggested change:

// FIXME: this can happen with RPC doing get_status or whatever. Another thread may
// be holding an `Arc` to this slot's delta map (including read-only access).
// In that case, we skip clearing this slot for now to avoid crashing the node.
#[cfg(test)]
panic!(
    "slot {slot} is being cleared while another thread is inserting or accessing entries"
);
#[cfg(not(test))]
return;
verify_slot_deltas(slot_deltas.as_slice(), &bank)?;

-bank.status_cache.write().unwrap().append(&slot_deltas);
+bank.status_cache.append(&slot_deltas);
This module still contains at least one bank.status_cache.read().unwrap() call in the test helper create_bank_snapshot_from_bank() (later in this file). Since Bank::status_cache is now Arc<BankStatusCache> (not an RwLock), that call site needs to be updated to use bank.status_cache.root_slot_deltas() directly to avoid compilation failures.
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

This pull request was closed because it has been stale for 7 days with no activity.






This PR removes the global RwLock around the status cache, and introduces more granular RwLocks per-blockhash and per-slot. Additionally, it changes the internal hash tables from std HashMap to DashMap, so that operations at the blockhash and slot level can be done while holding only read locks.
This is not the final design of A Performant Status Cache - which, I think, can make check and update go straight to 0 - but it's a good incremental improvement.
Results are pretty good: check_transactions is ~6x faster, and update_transaction_statuses is ~2.5x faster.
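To make the description above concrete, here is a rough sketch, with illustrative types rather than the PR's exact API, of the per-blockhash / per-slot granularity: the outer DashMap shards its own locks, and each blockhash entry is handed out as an Arc clone, so a status check never touches a cache-wide lock.

```rust
use dashmap::DashMap;
use std::sync::{Arc, RwLock};

type Slot = u64;
type Blockhash = [u8; 32];
type TxKey = [u8; 20];

// Per-blockhash state: transaction key -> slots the transaction was seen in.
struct BlockhashStatuses {
    statuses: DashMap<TxKey, Vec<Slot>>,
}

struct ShardedStatusCache {
    // blockhash -> shared per-blockhash entry; callers clone the Arc out while
    // holding only DashMap's per-shard read lock.
    cache: DashMap<Blockhash, Arc<BlockhashStatuses>>,
    // roots is small and rarely written, so a plain RwLock is fine here.
    roots: RwLock<Vec<Slot>>,
}

impl ShardedStatusCache {
    fn get_status(&self, blockhash: &Blockhash, key: &TxKey) -> Option<Vec<Slot>> {
        // Clone the per-blockhash Arc under the shard read lock, then do the
        // per-transaction lookup without any cache-wide lock held.
        let entry = Arc::clone(self.cache.get(blockhash)?.value());
        entry.statuses.get(key).map(|slots| slots.value().clone())
    }
}
```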