
Faster status cache #3796

Closed

alessandrod wants to merge 1 commit into anza-xyz:master from alessandrod:status-cache

Conversation


@alessandrod commented Nov 26, 2024

This PR removes the global RwLock around the status cache and introduces more granular RwLocks per blockhash and per slot. Additionally, it changes the internal hash tables from std HashMap to DashMap, so that operations at the blockhash and slot level can be done while holding only read locks.
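
Roughly, the layout being described looks like this (a sketch only; type and field names are illustrative, not the PR's actual code):

    // Sketch: DashMap shards the outer tables, and each blockhash entry sits
    // behind an Arc, so inserts and lookups briefly take a shard lock plus
    // per-entry state, instead of one global RwLock around everything.
    use dashmap::DashMap;
    use std::sync::{atomic::AtomicU64, Arc};

    type Slot = u64;
    type Hash = [u8; 32];
    const CACHED_KEY_SIZE: usize = 20;

    #[derive(Default)]
    struct BlockhashStatuses<T> {
        // Highest slot in which this blockhash was referenced.
        max_slot: AtomicU64,
        // Per-transaction-key list of (fork slot, execution result).
        statuses: DashMap<[u8; CACHED_KEY_SIZE], Vec<(Slot, T)>>,
    }

    struct StatusCacheSketch<T> {
        // blockhash -> statuses; no global RwLock around the whole map.
        cache: DashMap<Hash, Arc<BlockhashStatuses<T>>>,
        // slot -> (blockhash -> tx keys inserted in that slot).
        slot_deltas: DashMap<Slot, Arc<DashMap<Hash, Vec<[u8; CACHED_KEY_SIZE]>>>>,
    }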

This is not the final design of A Performant Status Cache - which, I think, can make check and update go straight to 0 - but it's a good incremental improvement.

Results are pretty good: check_transactions is ~6x faster, and update_transaction_statuses is ~2.5x faster.

[Screenshots: check_transactions and update_transaction_statuses benchmark results]

alessandrod force-pushed the status-cache branch 14 times, most recently from 571fca8 to 2db0759, on November 27, 2024 05:52
alessandrod force-pushed the status-cache branch 2 times, most recently from d41386e to fc67d2f, on December 9, 2024 09:50
alessandrod marked this pull request as ready for review on December 9, 2024 09:50
alessandrod force-pushed the status-cache branch 2 times, most recently from 2ff1d2b to d9fdc54, on December 9, 2024 09:52
alessandrod changed the title from "[WIP] faster status cache" to "Faster status cache" on December 9, 2024
alessandrod force-pushed the status-cache branch 7 times, most recently from 70974b7 to 4661809, on December 9, 2024 11:22

@alessandrod (Author) commented:

bench-tps against a single node

scheduler before and after

[Screenshots: scheduler metrics before and after]

workers

[Screenshots: worker metrics before and after]

@jstarry commented Sep 11, 2025

I strongly think you should consider other options and yield on this approach. Finding others to review and merge this code feels subversive to me. If others are ok with the complexity and comfortable with the idea of needing to debug this code in the future, then feel free to move forward, though. I think the goal here is to reduce the time spent on lock contention and moving to ahash / removing results both help in that regard.

@alessandrod commented Sep 11, 2025

> I strongly think you should consider other options and yield on this approach. Finding others to review and merge this code feels subversive to me.

Why would it be subversive? We don't have to agree on everything.

> If others are ok with the complexity and comfortable with the idea of needing to debug this code in the future, then feel free to move forward, though. I think the goal here is to reduce the time spent on lock contention and moving to ahash / removing results both help in that regard.

That is not the goal. The goal is to remove the contention, not reduce it. I'm fully in favor of doing ahash; I didn't do it in this PR because it's orthogonal. Same with the result.

@alessandrod (Author) commented:

Also, it's worth noting that Frankendancer has been running this on mnb for months, I've run it for months on my node, and Firedancer uses a similar approach.

I acknowledge that the code is more complex, and I'm happy to discuss concerns with it as I've been doing so far, but it's not like I'm being unreasonable or trying to be subversive 😅

@jstarry commented Sep 11, 2025

> That is not the goal. The goal is to remove the contention, not reduce it.

Hmm ok, maybe to rephrase: the goal can be to remove contention entirely, but any progress towards reducing it is also productive. If we get to a point where incrementally reducing contention further (i.e. to zero) results in a lot of complexity and maintenance, then we should be mindful of the tradeoffs.

> I'm fully in favor of doing ahash; I didn't do it in this PR because it's orthogonal. Same with the result.

Now I'm confused. You did apply ahash in this PR, no?

Removing result reduces contention as well because it's one less hash (of the signature) under write lock. I think it's better to do such a change before making the status cache more complex, not after.

@alessandrod (Author) commented:

> Now I'm confused. You did apply ahash in this PR, no?

I really shouldn't be on github from my phone 😅

When I was young and naive and started this change, I had everything nicely split into a bunch of commits, including ahash before the RwLock removal. Then, as I rebased the PR 600 times, I got tired of having to resolve conflicts and ended up squashing everything together.

I'm happy to take ahash out and land it before this change tho, I'll do that next.

Comment thread on runtime/src/shuttle_map.rs (outdated)

        K: Eq + Hash,
        S: BuildHasher,
    {
        shards: Vec<RwLock<HashMap<K, V, S>>>,

A reviewer commented:

Looking at the internals of DashMap (what the previous implementation used), which has:

    shards: Box<[CachePadded<RwLock<HashMap<K, V>>>]>,

Would we benefit from cache-padding the inner write-locks?

I don't think Box vs Vec will change much, the type just guarantees we don't re-size at runtime... but meh, that's obvious from the code imo.
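
For reference, cache-padding the shard locks would look roughly like this (a sketch using crossbeam_utils::CachePadded; not code from this PR):

    // Pad each shard's lock out to its own cache line so lock traffic on one
    // shard doesn't false-share with adjacent shards.
    use crossbeam_utils::CachePadded;
    use std::{collections::HashMap, hash::BuildHasher, sync::RwLock};

    struct PaddedShards<K, V, S: BuildHasher> {
        shards: Box<[CachePadded<RwLock<HashMap<K, V, S>>>]>,
    }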

@alessandrod (Author) replied:

This is only used in tests to provide shuttle with context-switch points; we don't really use it in production, so I didn't spend any time making the code good.

Basically shuttle intercepts anything that may put a thread to sleep or cause thread synchronization, and explicitly handles the state transition according to a given scheduler (right now I'm using the random scheduler in the tests).

This code is needed to simulate a bunch of interleaving sequences in our code and see if anything breaks.
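
A minimal example of the pattern being described (a hypothetical test, not one from this PR):

    #[cfg(feature = "shuttle-test")]
    mod shuttle_example {
        // shuttle swaps in its own sync primitives so its scheduler can explore
        // a different interleaving at every potential context-switch point.
        use shuttle::{sync::Mutex, thread};
        use std::sync::Arc;

        #[test]
        fn random_scheduler_example() {
            // Run the closure 1000 times, each under a randomly chosen schedule.
            shuttle::check_random(
                || {
                    let counter = Arc::new(Mutex::new(0u64));
                    let c = Arc::clone(&counter);
                    let t = thread::spawn(move || *c.lock().unwrap() += 1);
                    *counter.lock().unwrap() += 1;
                    t.join().unwrap();
                    assert_eq!(*counter.lock().unwrap(), 2);
                },
                1000,
            );
        }
    }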

Comment thread on runtime/src/status_cache.rs (outdated)
@@ -1,118 +1,195 @@
// Use shuttle's synchronization primitives when cfg(feature = "shuttle-test") is set, otherwise use std.

A reviewer commented:

Can we put these imports down with the other conditional imports?

It's easier to see what changes based on the shuttle-test flag if they are close to each other.

@alessandrod (Author) replied:

lmao for some reason rustfmt put it there if I didn't add strategic newlines

@alessandrod (Author) added:

nope, looks like rustfmt wants it at the top or CI fails
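
For reference, the conditional imports in question have roughly this shape (simplified; the exact imported items in the PR differ):

    // Use shuttle's synchronization primitives when cfg(feature = "shuttle-test")
    // is set, otherwise use std.
    #[cfg(feature = "shuttle-test")]
    use shuttle::sync::{Arc, RwLock};
    #[cfg(not(feature = "shuttle-test"))]
    use std::sync::{Arc, RwLock};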

(Several more comment threads on runtime/src/shuttle_map.rs and runtime/src/status_cache.rs were resolved as outdated.)

Comment on lines +342 to +345:

    // Safety:
    // - we explicitly check that the blockhash isn't referenced by other threads.
    // Checking Arc::strong_count() is safe because retain() holds a write lock on
    // the shard and get_or_insert_with() calls Arc::clone() holding a read lock.

A reviewer commented:

🤓 there's an edge case we might mention here about Weak.

If another thread had downgraded to a Weak, it's possible that the strong_count below returns 1, they upgrade, and then the entry gets removed anyway.

We're not fucking around with Weak, but we might add a comment here noting that we don't do that.

Sometimes I wish there was a strong-only Arc
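
The invariant under discussion, sketched (a simplified stand-in, not the PR's code):

    use dashmap::DashMap;
    use std::{hash::Hash, sync::Arc};

    // retain() takes each shard's write lock, and readers clone the Arc only
    // while holding the shard's read lock, so strong_count == 1 inside retain
    // proves no other thread holds a reference. This only works because no
    // Weak handles are ever created: a Weak::upgrade() elsewhere could bump
    // the count right after it is read.
    fn drop_unreferenced_empty<K: Eq + Hash, V>(
        map: &DashMap<K, Arc<V>>,
        is_empty: impl Fn(&V) -> bool,
    ) {
        map.retain(|_, v| Arc::strong_count(v) > 1 || !is_empty(&**v));
    }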

@alessandrod (Author) replied:

done

alessandrod force-pushed the status-cache branch 3 times, most recently from 172b77c to 339d8ec, on September 11, 2025 18:37

@alessandrod (Author) commented:

Some notes I took while re-re-re-viewing the code:

clean_slot_entries(slot):

  • called in the replay loop from dump_then_repair_correct_slots -> purge_unconfirmed_duplicate_slot

  • dump_then_repair_correct_slots is called in the ReplayStage main loop

  • ReplayStage is single threaded, dump_then_repair_correct_slots is called
    after replay_active_banks; replay_active_banks waits for the replay
    scheduler to stop so by the time it returns there are no unified scheduler
    threads replaying entries

  • never called on a slot for which the current validator is leader according
    to leader schedule => can never race with banking

  • never cleans a rooted slot (by definition if it's rooted won't be dumped)

  • it can work concurrently with RPC... but I believe that interactions with
    RPC are already broken when we dump slots as per ReplayStage comment:

    // TODO: What about RPC queries that had already cloned the Bank for this slot
    // and are looking up the signature for this slot?
    root_bank.clear_slot_signatures(slot);

So effectively in clean_slot_entries(), the only thing to be mindful of is
that banking might be working with some of the blockhashes in
status_cache.cache that might be referenced by the txs we're removing from
the dumped slot. We use self.cache.remove_if_not_accessed_and(...is_empty()) to remove
a blockhash, which means that if banking has added (not empty) or is adding
(is accessed) txs we won't remove the blockhash.
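
A simplified stand-in for what remove_if_not_accessed_and(...) has to guarantee (the real implementation lives in ReadOptimizedDashMap; this sketch builds on a plain DashMap):

    use dashmap::DashMap;
    use std::{hash::Hash, sync::Arc};

    // Remove `key` only if no other thread currently holds the entry's Arc
    // ("not accessed") AND the caller's predicate passes (e.g. is_empty()).
    // remove_if holds the shard write lock across both checks and the removal,
    // so a concurrent get_or_insert cannot slip in between them.
    fn remove_if_not_accessed_and<K: Eq + Hash, V>(
        map: &DashMap<K, Arc<V>>,
        key: &K,
        pred: impl FnOnce(&V) -> bool,
    ) -> bool {
        map.remove_if(key, |_, v| Arc::strong_count(v) == 1 && pred(&**v))
            .is_some()
    }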

root_slot_deltas():

  • called from replay handle_votable_bank -> check_and_handle_new_root -> ... -> set_bank_forks_root
  • this too can't race with any other entry replay threads for the same reason as clean_slot_entries
  • it has exclusive access to status_cache.slot_delta[the roots it accesses] since roots can't be dumped
  • it does an Arc::clone of the contents (dashmap) of the roots from
    status_cache.slot_delta[the roots] and gives it to the snapshot thread
  • while the snapshot thread does its thing, purge_root() can remove old roots from
    status_cache.slot_delta, but that's ok: it only removes the values, it doesn't change
    the contents of the values at all (which is what the snapshot thread works with)

I don't think that there are any race concerns at all here.

add_root()/purge_roots()

  • called by replay like 3 lines after root_slot_deltas()
  • needs to be careful not to remove blockhashes touched by replay/banking.

Since MAX_PROCESSING_AGE = MAX_RECENT_BLOCKHASHES / 2, by the time we
purge a root all the blockhashes referenced have expired, except nonces. We
must make sure we don't remove nonces that have been used past the root
we're purging, and we do so with
key_in_use || max_slot.load(Ordering::Relaxed) > min
where min is the root slot we're purging and max_slot is the last slot a
blockhash was referenced in.

Additionally, nonces have the nonce account data check in place that prevents replay, so I think we're all good here.
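
In code form, the retention test during purge boils down to this (names approximate the PR's):

    use std::sync::atomic::{AtomicU64, Ordering};

    // When purging roots <= `min`, keep a blockhash entry if some thread is
    // using it right now, or if it was last referenced in a slot newer than
    // the root being purged (the nonce case described above).
    fn keep_on_purge(key_in_use: bool, max_slot: &AtomicU64, min: u64) -> bool {
        key_in_use || max_slot.load(Ordering::Relaxed) > min
    }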

insert(blockhash, tx, slot)

  • called by the entries replay threads and by banking
  • can't run concurrently with clear_slot_entries(), see above
  • can't touch the same status_cache.slot_delta[slot] as purge_roots()
    because roots can't be inserted into
  • does status_cache.slot_delta.get_or_insert(slot).get_or_insert(blockhash).push(tx) (boxcar vec; see the sketch after this list)
  • does status_cache.cache.get_or_insert(blockhash).txs.entry(tx).forks.push(slot) (regular vec)
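
Why the boxcar vec matters here: boxcar::Vec is append-only and push takes &self, so concurrent inserts under the same slot/blockhash need no write lock. A toy demo (not PR code):

    // boxcar::Vec supports concurrent appends through a shared reference.
    fn boxcar_concurrent_push_demo() {
        use std::sync::Arc;

        let v = Arc::new(boxcar::Vec::new());
        let handles: Vec<_> = (0..4u64)
            .map(|i| {
                let v = Arc::clone(&v);
                // push(&self, _) returns the element's index; no &mut needed.
                std::thread::spawn(move || v.push(i))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(v.count(), 4);
    }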

So effectively, what can happen concurrently:

  • purge_roots() removing blockhashes that replay/banking are inserting under
    (these can only be nonces, see above)
  • replay/banking inserting concurrently under the same blockhashes (very common)
  • check_transactions() called by scheduler/banking workers/replay threads
    while insert() is in progress (also common)

Because of the serialization forced by account locks, check_transactions(tx) and insert(tx) can
never be called concurrently when it matters (load_and_execute_transactions() in replay or
process_and_record_transactions() in the banking workers).

@jstarry commented Sep 15, 2025

> So effectively in clean_slot_entries(), the only thing to be mindful of is
> that banking might be working with some of the blockhashes in
> status_cache.cache that might be referenced by the txs we're removing from
> the dumped slot. We use self.cache.remove_if_not_accessed_and(...is_empty()) to remove
> a blockhash, which means that if banking has added (not empty) or is adding
> (is accessed) txs we won't remove the blockhash.

Yeah, I think you described this race condition correctly and the new method remove_if_not_accessed_and looks like a solid fix without needing a read guard. Awesome!

github-actions bot commented:

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Jan 12, 2026

github-actions bot commented:

This pull request was closed because it has been stale for 7 days with no activity.

github-actions bot closed this on Jan 29, 2026
alessandrod reopened this on Jan 29, 2026
github-actions bot removed the stale label on Jan 30, 2026

Commit message:

Remove the global RwLock around the status cache, and introduce more
granular RwLocks per-blockhash and per-slot. Additionally, change the
internal hash tables from std HashMap to DashMap, so that operations at
the blockhash and slot level can be done while holding only read locks.

Copilot AI left a comment:

Pull request overview

This PR refactors the runtime status cache to reduce contention by removing the global RwLock and introducing concurrent/sharded data structures (DashMap + per-entry Arc-based access), aiming to speed up transaction status checks and updates.

Changes:

  • Replaced the status cache’s global lock + HashMap layout with ReadOptimizedDashMap/DashMap/DashSet and per-blockhash/per-slot granularity.
  • Updated snapshot (de)serialization paths and bank call sites to use the new non-RwLock status cache API.
  • Introduced a concurrent append-only vector (ConcurrentVec via boxcar) and added boxcar/DashMap serde feature to dependencies.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

  • runtime/src/status_cache.rs - Core refactor: concurrent cache + slot delta structures, new insertion/clearing/purging logic, new ConcurrentVec.
  • runtime/src/bank.rs - Bank now holds Arc<BankStatusCache> directly (no RwLock); updated status-cache call sites.
  • runtime/src/bank/check_transactions.rs - Uses the new status cache API directly (no read lock).
  • runtime/src/snapshot_controller.rs - Uses the new status cache API when preparing snapshot requests.
  • runtime/src/snapshot_bank_utils.rs - Uses the new status cache API for snapshot restore/create (some call sites still need updating).
  • runtime/src/serde_snapshot/status_cache.rs - Adjusted snapshot serialization/deserialization for the new Status representation.
  • runtime/src/read_optimized_dashmap.rs - Tweaks to the wrapper; adds a test-only helper around retain behavior.
  • core/tests/snapshots.rs - Updated one status cache usage to the new API (other call sites still need updating).
  • runtime/benches/status_cache.rs - Bench updates for the new API surface (no lock guard).
  • runtime/Cargo.toml - Adds the boxcar dependency to the runtime crate.
  • Cargo.toml - Adds boxcar to workspace deps; enables the DashMap serde feature.
  • Cargo.lock - Lockfile update for the new dependency.
  • programs/sbf/Cargo.lock - Lockfile update for the new dependency in the SBF workspace.


Comment thread on core/tests/snapshots.rs
Comment on lines 316 to 318:

        .status_cache
        .read()
        .unwrap()
        .roots()
        .iter()
        .cloned()
        .sorted();

Copilot AI commented Feb 23, 2026:

There is still a bank.status_cache.read().unwrap().root_slot_deltas() usage later in this file (in the fastboot snapshot test). With Bank::status_cache no longer behind an RwLock, this remaining call site should be updated to bank.status_cache.root_slot_deltas() to keep the test compiling.
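
i.e., the change at that call site is roughly:

  - bank.status_cache.read().unwrap().root_slot_deltas()
  + bank.status_cache.root_slot_deltas()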

Comment on lines +962 to +965:

    // #[test]
    // fn test_shuttle_purge_nonce_overlap_dfs() {
    //     shuttle::check_dfs(do_test_shuttle_purge_nonce_overlap, PURGE_DFS_ITERATIONS);
    // }

Copilot AI commented Feb 23, 2026:

The DFS variant of the purge nonce overlap shuttle test has been commented out. Since this PR changes status cache concurrency behavior, disabling this stress test reduces coverage for race conditions around add_root()/purging. Consider keeping it enabled (or gating it behind an opt-in flag) and tracking the shuttle runtime crash separately.

Suggested change:

  - // #[test]
  - // fn test_shuttle_purge_nonce_overlap_dfs() {
  - //     shuttle::check_dfs(do_test_shuttle_purge_nonce_overlap, PURGE_DFS_ITERATIONS);
  - // }
  + #[cfg(feature = "shuttle-dfs-tests")]
  + #[test]
  + fn test_shuttle_purge_nonce_overlap_dfs() {
  +     shuttle::check_dfs(do_test_shuttle_purge_nonce_overlap, PURGE_DFS_ITERATIONS);
  + }

Comment on lines +302 to +305:

    // Safety: we just initialized the whole key_slice above
    let mut forks = txs.entry(unsafe { key_slice.assume_init() }).or_default();
    forks.push((slot, res.clone()));

Copilot AI commented Feb 23, 2026:

key_slice.assume_init() is called here and then called again later when passing key_slice into add_to_slot_delta(). assume_init() moves the [u8; CACHED_KEY_SIZE] out of the MaybeUninit, so using it twice is undefined behavior (second call reads moved/uninitialized memory). Fix by materializing the initialized array exactly once (e.g., assign to a local let key_slice = ...;) and then reuse/copy it for both insertions.

Comment on lines +124 to +127:

    // FIXME: this can happen with RPC doing get_status or whatever
    panic!(
        "slot {slot} is being cleared while another thread is inserting new entries"
    );

Copilot AI commented Feb 23, 2026:

Panicking on remove_if_not_accessed() failure will crash the node whenever another thread is holding an Arc to this slot's delta map (this includes readers, not just inserters). Since concurrent access is possible in production (e.g., snapshot/RPC readers), this should be handled gracefully (retry, skip, or defer clearing) instead of panic!.

Suggested change:

  - // FIXME: this can happen with RPC doing get_status or whatever
  - panic!(
  -     "slot {slot} is being cleared while another thread is inserting new entries"
  - );
  + // FIXME: this can happen with RPC doing get_status or whatever. Another thread may
  + // be holding an `Arc` to this slot's delta map (including read-only access).
  + // In that case, we skip clearing this slot for now to avoid crashing the node.
  + #[cfg(test)]
  + panic!(
  +     "slot {slot} is being cleared while another thread is inserting or accessing entries"
  + );
  + #[cfg(not(test))]
  + return;

    verify_slot_deltas(slot_deltas.as_slice(), &bank)?;

  - bank.status_cache.write().unwrap().append(&slot_deltas);
  + bank.status_cache.append(&slot_deltas);

Copilot AI commented Feb 23, 2026:

This module still contains at least one bank.status_cache.read().unwrap() call in the test helper create_bank_snapshot_from_bank() (later in this file). Since Bank::status_cache is now Arc<BankStatusCache> (not an RwLock), that call site needs to be updated to use bank.status_cache.root_slot_deltas() directly to avoid compilation failures.

github-actions bot commented:

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Apr 24, 2026

github-actions bot commented May 4, 2026:

This pull request was closed because it has been stale for 7 days with no activity.

github-actions bot closed this on May 4, 2026

Successfully merging this pull request may close these issues: Performance: Status Cache

9 participants