perf(storage): Static file provider delete_segment should use unwind_queue logic#21880
Closed
perf(storage): Static file provider delete_segment should use unwind_queue logic#21880
Conversation
Make delete_segment and delete_segment_below_block in the static file provider use an unwind queue pattern for consistency with the hot/cold separation architecture. Deletions are now queued and only executed when commit() is called, matching the existing prune_on_commit pattern. - Add delete_queue to StaticFileProviderInner for deferred jar deletions - Refactor delete_jar into delete_jar + delete_jar_no_reindex - delete_segment/delete_segment_below_block now queue instead of immediately deleting - commit() drains the delete queue, resets affected writers, executes deletions, and rebuilds the index once - has_unwind_queued() checks the delete queue - finalize() errors if delete queue is non-empty - Add reset(segment) to StaticFileWriters to clear stale writer handles - Add 3 tests for deferred deletion behavior Amp-Thread-ID: https://ampcode.com/threads/T-019c30ef-1a42-75c9-b61d-81e062acc279
- Use highest expected range end (BTreeMap last key) instead of index.max_block for highest jar exclusion in delete_segment_below_block, fixing incorrect deletion of partially-filled highest jars - Replace existing queue entries for the same segment on re-queue, preventing duplicate deletions if called twice before commit Amp-Thread-ID: https://ampcode.com/threads/T-019c30ef-1a42-75c9-b61d-81e062acc279
…ntics, add 6 tests - Extract collect_headers_and_queue() to deduplicate header loading between delete_segment and delete_segment_below_block - Use scoped blocks instead of drop(indexes) for index reads - Rename commit result variable to first_delete_err with comment clarifying reindex failure takes priority - Use take_while iterator instead of manual loop in delete_segment_below_block - Add tests: finalize guard, empty segment (both APIs), partial highest jar preservation, delete_below+delete_segment interaction, commit with empty queue noop Amp-Thread-ID: https://ampcode.com/threads/T-019c3135-f7cb-7168-bc87-28a27f381a5a
- Failed jar deletions are re-enqueued for retry on next commit instead of being permanently lost. - Rename has_delete_queued to has_queued_deletions. - Remove obvious doc comments, trim test comments. - Add test_write_after_full_deletion and test_delete_segment_then_delete_below_block. Amp-Thread-ID: https://ampcode.com/threads/T-019c314c-645a-7311-905b-52de28ceaefb
…tore #[instrument], improve docs Amp-Thread-ID: https://ampcode.com/threads/T-019c3162-aa34-7393-8368-0d4cb0f379c6
If delete_jar_no_reindex fails midway through the deletion loop, remaining operations are re-inserted into the queue so the next commit() retries them. initialize_index() now always runs regardless of deletion success to keep the in-memory index consistent. Amp-Thread-ID: https://ampcode.com/threads/T-019c3170-799e-757d-ab53-6c1439d32885
b1801ab to
c9cdc25
Compare
Contributor
|
yongkangc
commented
Feb 6, 2026
yongkangc
commented
Feb 6, 2026
yongkangc
commented
Feb 6, 2026
yongkangc
commented
Feb 6, 2026
yongkangc
commented
Feb 6, 2026
- Add docstrings for queue_delete, has_queued_deletions, take_delete_queue, queue_delete_raw, delete_jar_no_reindex, reset, commit, has_unwind_queued, and finalize - Add step-by-step section comments inside commit() for readability - Rename max_block -> expected_end in delete_segment_below_block to clarify that the comparison uses expected block range ends (BTreeMap keys), not actual block numbers, preventing confusion about partially-filled jars - Add comment explaining why highest jar exclusion is correct for partially-filled jars Amp-Thread-ID: https://ampcode.com/threads/T-019c3194-a863-7159-9eed-8263c91452ce
- Add error semantics doc to commit() clarifying that DB changes are already durable when this method runs, and callers must not roll back. - Clarify delete_segment_below_block docs: explicitly document that the highest jar is always preserved even when its blocks are below the threshold, and point to delete_segment for full removal. - Add test for requeue-on-failure path: sabotages a jar's config file to force deletion failure, verifies unprocessed ops are requeued and has_unwind_queued remains true across retries. Amp-Thread-ID: https://ampcode.com/threads/T-019c31f9-d89c-72bd-99bc-7937843012b4
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
delete_segmentanddelete_segment_below_blockpreviously deleted static file jars immediately during the pruner run, beforeDatabaseProvider::commit().This bypassed the unwind commit ordering (DB first, then static files) that ensures crash consistency, a crash after jar deletion but before DB commit left the database referencing jars that no longer existed on disk.
Solution:
This PR defers jar deletions to
commit()time, matching the existingprune_on_commitpattern.has_unwind_queued()now returnstruewhen deletions are queued, routingDatabaseProvider::commit()through the unwind path (DB commits first, then static files are deleted).A single
initialize_index()call replaces the per-jar reindex loop.Expected Impact
Crash-consistent commit ordering for segment deletions, and O(1) reindexing instead of O(n) per-jar reindex during pruning.
Resolves RETH-343