perf: release execution cache lock before waiting for state root validation #22138
Conversation
…dation

Previously, `save_cache` held the `ExecutionCache` mutex for the entire duration of state root validation (~95-131ms), blocking the next block's prewarming from accessing the cache.

Split `save_cache` into three phases:
1. Phase 1 (locked, ~1ms): `insert_state` and optimistically publish the warmed cache
2. Phase 2 (unlocked): wait for `valid_block_rx` without holding the lock, allowing the next block's prewarming to proceed
3. Phase 3 (locked, only if invalid): re-acquire the lock and clear the cache only if it still belongs to this block

Amp-Thread-ID: https://ampcode.com/threads/T-019c5345-8a04-75be-bb4a-cf30b28fe68d
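The three phases can be sketched with minimal stand-in types; `ExecutionCache` and `SavedCache` below are simplified placeholders for illustration, not reth's real definitions:

```rust
use std::sync::{mpsc, Mutex};

// Hypothetical stand-ins for the real cache types.
#[derive(Debug, PartialEq)]
struct SavedCache {
    executed_block_hash: u64,
}

#[derive(Default)]
struct ExecutionCache {
    cache: Mutex<Option<SavedCache>>,
}

impl ExecutionCache {
    fn save_cache(&self, hash: u64, valid_block_rx: mpsc::Receiver<()>) {
        // Phase 1 (locked, cheap): publish the warmed cache optimistically.
        {
            let mut guard = self.cache.lock().unwrap();
            *guard = Some(SavedCache { executed_block_hash: hash });
        } // lock released here

        // Phase 2 (unlocked): block on validation; the next block's
        // prewarming can read the cache concurrently.
        let valid = valid_block_rx.recv().is_ok();

        // Phase 3 (locked, only if invalid): clear the entry only if it
        // still belongs to this block, never wiping a newer block's cache.
        if !valid {
            let mut guard = self.cache.lock().unwrap();
            if guard.as_ref().map(|c| c.executed_block_hash) == Some(hash) {
                *guard = None;
            }
        }
    }
}

fn main() {
    let exec = ExecutionCache::default();

    // Valid block: the validator signals before dropping, cache survives.
    let (tx, rx) = mpsc::channel();
    tx.send(()).unwrap();
    exec.save_cache(0xA, rx);
    assert!(exec.cache.lock().unwrap().is_some());

    // Invalid block: sender dropped without sending, so recv() errors and
    // the still-owned cache entry is cleared.
    let (tx, rx) = mpsc::channel::<()>();
    drop(tx);
    exec.save_cache(0xB, rx);
    assert!(exec.cache.lock().unwrap().is_none());
    println!("ok");
}
```

The key design point is the inner scope in Phase 1: the `MutexGuard` drops at the closing brace, so the `recv()` in Phase 2 never runs while the lock is held.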
96947d7 to a6b34f2
mattsse left a comment
I'm pretty sure I reviewed the same change somewhere else.
This change is problematic because it doesn't guard against re-use for alternative blocks. E.g. we just executed N + 1 == A; what happens when we get a different A' with N as parent?
Now the cache can already point to A, but we haven't adjusted the anchor yet.
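The hazard being raised can be sketched concretely. The types below are purely illustrative (in the real code the "anchor" is the parent block on top of which the warmed cache is valid):

```rust
use std::sync::Mutex;

// Illustrative stand-in: a published cache entry remembers which block's
// execution warmed it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Warmed {
    executed_block: char,
}

fn main() {
    let cache: Mutex<Option<Warmed>> = Mutex::new(None);

    // Block A (child of N) finishes executing; with optimistic publication
    // its cache is visible before state root validation completes.
    *cache.lock().unwrap() = Some(Warmed { executed_block: 'A' });

    // A competing sibling A' (also child of N) arrives. Its prewarming
    // needs state anchored on N, but the cache it observes was warmed by
    // executing A, i.e. it already contains A's writes:
    let observed = *cache.lock().unwrap();
    assert_eq!(observed, Some(Warmed { executed_block: 'A' }));
    // Reusing this cache for A' is the re-use-for-alternative-blocks
    // hazard: the cache points at A while the anchor was not adjusted.
}
```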
I don't think the assessment here is correct.
This change doesn't make things faster overall; it just moves the locking around, and the new placement is problematic.
The measurements are flawed because they include both the cache updates and the wait until the block was validated.
```rust
// Wait for state root validation WITHOUT holding the cache lock.
// This is the key optimization: the original code held the lock across this
// blocking recv(), which blocked the next block's prewarming from accessing
// the cache for ~100ms+.
if valid_block_rx.recv().is_err() {
```
Isn't this unproblematic anyway, because this fires once we have validated the block, and the remaining ops here are cheap?
```rust
/// saved cache, inserts state, and publishes under a brief write lock. This avoids
/// the ~100ms+ lock hold that previously blocked concurrent readers during
/// `valid_block_rx.recv()`.
```
There are no concurrent readers here, because the main exec thread is busy validating the block.
Summary

Traced via Grafana Tempo that `save_cache` is the #1 bottleneck in the `new_payload` pipeline, blocking for ~95-131ms (up to 80% of NP time) while holding the `ExecutionCache` mutex. The root cause is that `valid_block_rx.recv()` (waiting for state root validation) is called inside `update_with_guard`, holding the lock for the entire state root duration. This blocks the next block's prewarming from accessing the cache.

Split `save_cache` from a single locked phase into three phases:
1. Locked (~1ms): create the `SavedCache`, run `insert_state`, and optimistically publish the warmed cache
2. Unlocked: wait on `valid_block_rx` without holding the lock, so the next block's prewarming can start immediately
3. Locked, only if invalid: re-acquire the lock and clear the cache only if it still belongs to this block
Changes

- `save_cache` now releases the `ExecutionCache` lock before blocking on `valid_block_rx.recv()`
- On an invalid block, the cache is cleared only if `executed_block_hash() == hash`, i.e. it still belongs to this block

Expected Impact
Reduces execution cache lock hold time from ~100ms+ to ~1ms per block, unblocking prewarming for subsequent blocks and improving overall NP throughput.