fix(engine): defer changeset static file truncation during reorgs #23291
joshieDo wants to merge 8 commits into
During RemoveBlocksAbove, changeset static file prune strategies are now deferred to the next SaveBlocks commit. This prevents truncating memory-mapped files while concurrent readers (payload builders, RPC) may still hold stale handles, which caused a panic in NippyJar from reading garbage offsets on the truncated mmap.

Headers, transactions, and receipts are still truncated immediately. Only changeset prunes are deferred since those are the segments read concurrently during state root computation.

Amp-Thread-ID: https://ampcode.com/threads/T-019d3ffb-22ed-70d4-ad5c-23eeed6f0ad7
Co-authored-by: Amp <amp@ampcode.com>
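The deferral described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `Persistence`, `Segment`, and the field names are hypothetical stand-ins and not reth's actual types, and a "segment" here is reduced to a single highest-block counter.

```rust
/// Block height, kept as a plain integer for this sketch.
type BlockNumber = u64;

/// Toy stand-in for a static-file segment: it only tracks its highest block.
#[derive(Debug, Default)]
struct Segment {
    highest_block: BlockNumber,
}

impl Segment {
    /// Drop everything above `block` (no-op if the segment is already shorter).
    fn truncate_above(&mut self, block: BlockNumber) {
        self.highest_block = self.highest_block.min(block);
    }
}

/// Hypothetical persistence state; names are illustrative, not reth's.
#[derive(Debug, Default)]
struct Persistence {
    headers: Segment,
    changesets: Segment,
    /// Changeset truncation queued by a reorg, applied on the next commit.
    deferred_changeset_prune: Option<BlockNumber>,
}

impl Persistence {
    /// Reorg path: truncate headers immediately, only record the changeset prune.
    fn remove_blocks_above(&mut self, block: BlockNumber) {
        self.headers.truncate_above(block);
        // If a prune is already queued, keep the lower of the two targets.
        self.deferred_changeset_prune = Some(match self.deferred_changeset_prune {
            Some(prev) => prev.min(block),
            None => block,
        });
    }

    /// Commit path: apply any queued changeset prune first, then append.
    fn save_blocks(&mut self, new_tip: BlockNumber) {
        if let Some(block) = self.deferred_changeset_prune.take() {
            self.changesets.truncate_above(block);
        }
        self.headers.highest_block = new_tip;
        self.changesets.highest_block = new_tip;
    }
}
```

In this model, a call to `remove_blocks_above(7)` shortens the header segment right away, while the changeset segment keeps its old length until the next `save_blocks` call consumes the queued prune.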
debug_assert_eq!(
    account, storage,
    "account and storage changeset prunes must target the same block"
);
account.or(storage)
this should be the same block, unsure if a new type for both is necessary
… changeset prune

Adds oldest_reader_txnid() and last_txnid() to the Database trait, backed by MDBX's mi_latter_reader_txnid and mi_recent_txnid. Before applying a deferred changeset prune in SaveBlocks, the persistence thread now spins until all MDBX readers from the reorg era have completed. This ensures no reader holds a stale mmap handle when the changeset files are truncated.

Amp-Thread-ID: https://ampcode.com/threads/T-019d3ffb-22ed-70d4-ad5c-23eeed6f0ad7
Co-authored-by: Amp <amp@ampcode.com>
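A minimal sketch of the wait described above, assuming the oldest reader's transaction id can be polled through a closure. The function names, the `prune_txn` parameter, and the polling interval are hypothetical; only the `is_some_and(|oldest| oldest < prune_txn)` check mirrors the condition shown in the diff under review.

```rust
use std::thread;
use std::time::Duration;

/// A reader is "stale" if it opened before the transaction that recorded the prune.
/// `None` means no reader is currently open.
fn has_stale_reader(oldest_reader_txnid: Option<u64>, prune_txn: u64) -> bool {
    oldest_reader_txnid.is_some_and(|oldest| oldest < prune_txn)
}

/// Poll until no reader older than `prune_txn` remains.
/// Returns how many times the loop had to sleep, which is handy for logging.
/// Note: this sketch has no timeout, so a long-lived reader blocks it indefinitely.
fn wait_for_readers_to_drain(
    oldest_reader_txnid: impl Fn() -> Option<u64>,
    prune_txn: u64,
    poll_interval: Duration,
) -> u32 {
    let mut polls = 0;
    while has_stale_reader(oldest_reader_txnid(), prune_txn) {
        polls += 1;
        thread::sleep(poll_interval);
    }
    polls
}
```

The closure keeps the sketch independent of any database handle; in a real integration it would presumably wrap the `oldest_reader_txnid()` method this commit adds.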
Force-pushed from c6a5cab to d1ecfa7
Amp-Thread-ID: https://ampcode.com/threads/T-019d4435-3bfe-7739-b47f-fbcf6473a16a Co-authored-by: Amp <amp@ampcode.com>
This reverts commit 5036cc4.
Rjected left a comment
mostly nits; based on my understanding of the problem, this fix should work. I think we should add some docs explaining why we would not get any new readers by the time we call save_blocks and wait for any final readers to complete
/// Last transaction ID
#[inline]
pub const fn last_txnid(&self) -> usize {
    self.0.mi_recent_txnid as usize
}
can we document that this is the most recent txn id?
///
/// This only restores the queued prune-on-commit state. Safety still depends on the caller
/// having waited for stale MDBX readers to drain before the next commit executes the prune.
pub(crate) fn requeue_changeset_prunes(&self, last_block: BlockNumber) -> ProviderResult<()> {
can we rename this from requeue to maybe complete or something that mentions that this finalizes / flushes the queued prunes?
/// commit.
///
/// This is used by the deferred reorg-prune flow in the persistence service. Callers must
/// only reapply the returned prune after stale MDBX readers from the pre-unwind era have
Suggested change:
- /// only reapply the returned prune after stale MDBX readers from the pre-unwind era have
+ /// only apply the returned prune after stale MDBX readers from the pre-unwind era have
/// concurrent readers may still hold handles. The prune is applied at the start of the next
/// `SaveBlocks`, after waiting for all MDBX readers from the reorg era to drain.
#[derive(Debug)]
struct DeferredChangesetPrune {
let's please not add private types to top of file :)
.oldest_reader_txnid()
.is_some_and(|oldest| oldest < prune_txn)
is this perhaps too relaxed?
this would also include rpc-related transactions?
and there's a risk that an eth_getProof or adjacent tx will be open for a while?
should we introduce a counter, so that after 3 blocks or so we always do this?
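One possible reading of the counter idea, sketched under stated assumptions: the prune is carried forward while a stale reader is open, but only for a bounded number of commits, after which the caller must resolve it some other way (for example by blocking until readers drain). `QueuedPrune`, `PruneDecision`, and `max_deferrals` are hypothetical names and are not part of this PR.

```rust
/// What to do with a queued prune on the current commit.
#[derive(Debug, PartialEq, Eq)]
enum PruneDecision {
    /// No stale reader is open: truncating now is safe.
    Apply,
    /// A stale reader is open and the budget is not used up: carry the prune forward.
    DeferAgain,
    /// The deferral budget is exhausted: the caller must stop deferring.
    Force,
}

/// Hypothetical bookkeeping for one queued prune.
#[derive(Debug, Default)]
struct QueuedPrune {
    /// How many commits this prune has already been carried across.
    deferrals: u32,
}

impl QueuedPrune {
    /// Decide what to do on this commit, counting each further deferral.
    fn decide(&mut self, stale_reader_open: bool, max_deferrals: u32) -> PruneDecision {
        if !stale_reader_open {
            return PruneDecision::Apply;
        }
        if self.deferrals >= max_deferrals {
            return PruneDecision::Force;
        }
        self.deferrals += 1;
        PruneDecision::DeferAgain
    }
}
```

With `max_deferrals = 3`, a long-running RPC reader can delay the truncation for at most three commits before `Force` is returned. What `Force` should actually do is the open question in this thread, so the sketch leaves it to the caller.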
Defers changeset static file truncation during `RemoveBlocksAbove` to the next `SaveBlocks`, preventing stale mmap panics when concurrent readers hold old memory-mapped handles.
Waits for all MDBX readers from the reorg era to drain (via `oldest_reader_txnid`) before applying the deferred truncation.
Follow-up: #23305 (pins RocksDB snapshot for cross-store read consistency, stacked on this PR)