perf(engine): merge pruning into save_blocks write transaction#21886
Closed
Conversation
Previously, `PrewarmCacheTask::save_cache` called `valid_block_rx.recv()` inside `PayloadExecutionCache::update_with_guard`, which holds a write lock. This blocking call could stall all readers and writers of the execution cache indefinitely, causing system-wide contention (observed as 5ms lock-contention warnings in `get_cache_for()`).

This change:

1. Moves all cache-building operations (clone, insert_state, update_metrics) outside the write lock
2. Moves the blocking `recv()` call outside the lock
3. Adds a conditional swap with a parent hash check to prevent race conditions
4. Keeps the usage guard alive until after `recv()` to prevent another thread from clearing the cache we're still using

The lock is now held only for the minimal final assignment.

Amp-Thread-ID: https://ampcode.com/threads/T-019c2db8-0945-765a-979e-99d922bd4791
Previously, after `on_save_blocks` committed blocks (fsync #1), the persistence thread ran pruning in a separate MDBX write transaction with its own commit (fsync #2). During this entire pruning pass, the persistence thread could not process new requests.

Merge pruning into the same write transaction as `save_blocks` by calling `Pruner::run_with_provider()` with the existing `provider_rw` before commit. This eliminates the second fsync entirely: one write transaction, one commit, one fsync per cycle.

Prune errors are caught and logged but do not prevent block persistence. This preserves the existing guarantee that blocks are always committed regardless of the prune outcome.

Based on bench metrics (rf7d8): save p50 = 305ms, prune p50 = 128ms, firing every other save. Prune accounts for 14.9% of total persistence wall time (53s / 356s). This change eliminates ~128ms of redundant fsync latency on every prune cycle.

Amp-Thread-ID: https://ampcode.com/threads/T-019c3183-3b50-7379-8a4b-42f7a68aac22
Rjected (Member) requested changes on Feb 6, 2026:
I do think it would be worth adding helpers for running the pruner with the same provider_rw, so that we only commit once per batch of blocks, would like the cached state / unrelated changes removed though.
Comment on lines 77 to 88 (removed):

```rust
/// Prunes block data before the given block number according to the configured prune
/// configuration.
#[instrument(level = "debug", target = "engine::persistence", skip_all, fields(block_num))]
fn prune_before(&mut self, block_num: u64) -> Result<PrunerOutput, PrunerError> {
    debug!(target: "engine::persistence", ?block_num, "Running pruner");
    let start_time = Instant::now();
    // TODO: doing this properly depends on pruner segment changes
    let result = self.pruner.run(block_num);
    self.metrics.prune_before_duration_seconds.record(start_time.elapsed());
    result
}
```
Member: would like to keep a helper method for this if possible, and the instrumentation
Summary
After `on_save_blocks` committed blocks (fsync #1), the persistence thread ran pruning in a separate MDBX write transaction with its own commit (fsync #2). During this entire pruning pass, the persistence thread could not process new requests. This merges pruning into the same write transaction by calling `Pruner::run_with_provider()` with the existing `provider_rw` before `commit()`, eliminating the second fsync entirely.

Changes

- Run pruning in `on_save_blocks` before `provider_rw.commit()` using the existing `run_with_provider` API
- Remove the `prune_before` method and post-save pruning from the `run()` loop
- Remove the unused `PrunerOutput` import

Expected Impact
Eliminates ~128ms of redundant fsync latency on every prune cycle (fires every other save). Based on bench metrics: prune accounts for 14.9% of total persistence wall time (53s / 356s). Each persistence cycle becomes a single atomic DB update with one fsync instead of two.
Notes
The `PersistenceError::PrunerError` variant is retained for API compatibility even though it is no longer constructed in this code path. The pruner's built-in `delete_limit` and `timeout` bound the additional transaction duration.