
fix(pruner): prune account and storage changeset static files #21346

Merged
Rjected merged 9 commits into main from dan/pruning-static-file-changesets on Jan 26, 2026

Conversation

@Rjected Rjected (Member) commented Jan 22, 2026

When account/storage changesets are stored in static files (rather than MDBX), the pruner now correctly reads from the static file provider and deletes the corresponding jars. Previously, pruning would silently skip these changesets since they weren't present in the database tables.

Adds an equivalent prune test for static files
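The shape of the fix can be pictured with a minimal model. Everything below is illustrative (`ChangesetSource`, `PruneAction`, and `prune_changesets` are hypothetical names, not reth APIs): the point is that the pruner must dispatch on where the changesets actually live instead of assuming they sit in MDBX tables.

```rust
// Hypothetical minimal model of the fix. Before, the static-file case fell
// through to the database path, found no table rows, and silently pruned
// nothing; now each storage backend gets its own deletion action.

#[derive(Debug, PartialEq)]
enum ChangesetSource {
    Mdbx,
    StaticFiles,
}

#[derive(Debug, PartialEq)]
enum PruneAction {
    DeleteDbRows { through_block: u64 },
    DeleteStaticFileJars { through_block: u64 },
}

fn prune_changesets(source: ChangesetSource, target_block: u64) -> PruneAction {
    match source {
        // Changesets in MDBX: prune rows from the database tables.
        ChangesetSource::Mdbx => PruneAction::DeleteDbRows { through_block: target_block },
        // Changesets in static files: read via the static file provider
        // and delete the corresponding jars instead.
        ChangesetSource::StaticFiles => {
            PruneAction::DeleteStaticFileJars { through_block: target_block }
        }
    }
}

fn main() {
    assert_eq!(
        prune_changesets(ChangesetSource::StaticFiles, 100),
        PruneAction::DeleteStaticFileJars { through_block: 100 }
    );
    assert_eq!(
        prune_changesets(ChangesetSource::Mdbx, 100),
        PruneAction::DeleteDbRows { through_block: 100 }
    );
}
```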

@Rjected Rjected requested a review from shekhirin as a code owner January 22, 2026 21:51
@Rjected Rjected added the C-bug An unexpected or incorrect behavior label Jan 22, 2026
@Rjected Rjected added A-db Related to the database A-static-files Related to static files labels Jan 22, 2026
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Jan 22, 2026
@Rjected Rjected force-pushed the dan/pruning-static-file-changesets branch from e4936c6 to 6b813cc on January 22, 2026 21:57
@Rjected Rjected force-pushed the dan/pruning-static-file-changesets branch from 6b813cc to e718398 on January 22, 2026 21:59
@Rjected Rjected force-pushed the dan/pruning-static-file-changesets branch from 8318fbf to 04adb32 on January 22, 2026 22:16
}
}

impl AccountHistory {
Member
one question is whether rocksdb would make use of this impl as well under AccountHistory

Member Author
I think yes it should


})
}

fn prune_database<Provider>(
Member

i guess this is the legacy implementation? not sure if we wanna add a note legacy as doc string

// / 2`, so 8750 entries. Each entry is `160 bit + 256 bit + 64 bit`, so the total
// size should be up to ~0.5MB + some hashmap overhead. `blocks_since_last_run` is
// additionally limited by the `max_reorg_depth`, so no OOM is expected here.
let mut highest_deleted_storages: FxHashMap<_, _> = FxHashMap::default();
Member

Suggested change:
-let mut highest_deleted_storages: FxHashMap<_, _> = FxHashMap::default();
+let mut highest_deleted_storages = FxHashMap::default();

could we?

Comment on lines 141 to 171
        // Sort highest deleted block numbers by account address and storage key and turn them into
        // sharded keys.
        // We did not use `BTreeMap` from the beginning, because it's inefficient for hashes.
        let highest_sharded_keys = highest_deleted_storages
            .into_iter()
            .sorted_unstable() // Unstable is fine because no equal keys exist in the map
            .map(|((address, storage_key), block_number)| {
                StorageShardedKey::new(
                    address,
                    storage_key,
                    block_number.min(last_changeset_pruned_block),
                )
            });
        let outcomes = prune_history_indices::<Provider, tables::StoragesHistory, _>(
            provider,
            highest_sharded_keys,
            |a, b| a.address == b.address && a.sharded_key.key == b.sharded_key.key,
        )?;
        trace!(target: "pruner", ?outcomes, %done, "Pruned storage history (indices)");

        let progress = limiter.progress(done);

        Ok(SegmentOutput {
            progress,
            pruned: pruned_changesets + outcomes.deleted,
            checkpoint: Some(SegmentOutputCheckpoint {
                block_number: Some(last_changeset_pruned_block),
                tx_number: None,
            }),
        })
    }
Member

the logic looks really similar to the db one, including the comments - not sure if we wanna dedup them in the future. just a note, not this pr tho

Member Author

could probably add a helper

// / 2`, so 8750 entries. Each entry is `160 bit + 256 bit + 64 bit`, so the total
// size should be up to 0.5MB + some hashmap overhead. `blocks_since_last_run` is
// additionally limited by the `max_reorg_depth`, so no OOM is expected here.
let mut highest_deleted_accounts: FxHashMap<_, _> = FxHashMap::default();
Member

Suggested change:
-let mut highest_deleted_accounts: FxHashMap<_, _> = FxHashMap::default();
+let mut highest_deleted_accounts = FxHashMap::default();

could we?

@yongkangc yongkangc (Member) left a comment

makes sense

the logic of account, storage changesets looks really similar to the mdbx ones, not sure if we can dedup them in the future

@github-project-automation github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Jan 22, 2026
@Rjected Rjected force-pushed the dan/pruning-static-file-changesets branch from 2f6f372 to c649acc on January 22, 2026 23:09
}
trace!(target: "pruner", pruned = %pruned_changesets, %done, "Pruned account history (changesets from static files)");

let last_changeset_pruned_block = last_changeset_pruned_block
Collaborator

starting here and below could be deduped with the mdbx implementation. but unsure if then @yongkangc would even use it

Collaborator

same for storage history

Collaborator

oops, has been mentioned already

Member Author

btw added reuse for this ptal

gakonst (Member) commented Jan 23, 2026

[MAJOR] account_history.rs:79 - Significant code duplication between prune_static_files() and prune_database().

Both methods share ~60% identical logic:

  • Lines 89-100 (limiter setup + early return) duplicated at lines 176-187
  • Lines 133-146 (checkpoint calculation + sharded key sorting) duplicated at lines 211-224
  • Lines 147-163 (history index pruning + output construction) duplicated at lines 225-242

Consider extracting shared logic into helper methods:

  1. setup_limiter(input, tables_count) -> (PruneLimiter, Option<SegmentOutput>) for the early return pattern
  2. finalize_prune(highest_deleted_accounts, last_block, done, range_end, limiter) -> SegmentOutput for the common tail

This would make prune_static_files() and prune_database() only differ in their changeset iteration logic (~20 lines each), making the actual difference between static file vs database pruning much clearer.

Action: Should fix
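A rough sketch of what those two helpers could look like, using simplified stand-in types (`Limiter` and `SegmentOutput` here are not reth's real types, and the early-return condition is deliberately simplified):

```rust
// Hypothetical shape of the suggested `setup_limiter` / `finalize_prune`
// split. `Err` from setup carries the early-return output, which matches
// the pattern of bailing out before any changeset iteration.

#[derive(Debug, PartialEq)]
struct SegmentOutput {
    pruned: usize,
    done: bool,
}

struct Limiter {
    deleted_entries_limit: Option<usize>,
}

/// Divide the entry budget across the tables a segment prunes; return the
/// early-exit output if the budget is already exhausted.
fn setup_limiter(mut limiter: Limiter, tables_count: usize) -> Result<Limiter, SegmentOutput> {
    if let Some(limit) = limiter.deleted_entries_limit {
        if limit == 0 {
            return Err(SegmentOutput { pruned: 0, done: false });
        }
        limiter.deleted_entries_limit = Some(limit / tables_count);
    }
    Ok(limiter)
}

/// Common tail: combine changeset and index deletions into one output.
fn finalize_prune(pruned_changesets: usize, pruned_indices: usize, done: bool) -> SegmentOutput {
    SegmentOutput { pruned: pruned_changesets + pruned_indices, done }
}

fn main() {
    let limiter = setup_limiter(Limiter { deleted_entries_limit: Some(100) }, 2).unwrap();
    assert_eq!(limiter.deleted_entries_limit, Some(50));
    assert_eq!(finalize_prune(3, 4, true), SegmentOutput { pruned: 7, done: true });
}
```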

gakonst (Member) commented Jan 23, 2026

[MAJOR] storage_history.rs:79 - Same duplication pattern as account_history.rs.

prune_static_files() (lines 79-171) and prune_database() (lines 173-253) share nearly identical:

  • Limiter setup and early return logic
  • Checkpoint calculation with the done flag adjustment
  • Sharded key sorting and history index pruning
  • Output construction

The only real difference is:

  • Static files: iterating via walk_storage_changeset_range() walker
  • Database: iterating via prune_table_with_range()

Consider a shared helper or a strategy pattern to avoid the duplication.

Action: Should fix
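One way the strategy pattern could look: a shared driver that takes the per-backend changeset walk as a closure, so the static-file and MDBX paths differ only in what they pass in. Types and names below are simplified stand-ins, not reth's:

```rust
// Hypothetical shared driver for the two pruning paths. The real shared
// tail (sharded-key sorting, history index pruning, checkpoint math) is
// elided to a comment; only the structure of the dedup is shown.

struct SegmentOutput {
    pruned: usize,
    checkpoint_block: Option<u64>,
}

/// Shared driver: the only per-backend difference is how changesets are
/// walked, so that step is injected as a closure returning the number of
/// pruned changeset entries.
fn prune_history<F>(last_block: u64, walk_changesets: F) -> SegmentOutput
where
    F: FnOnce() -> usize,
{
    let pruned_changesets = walk_changesets();
    // ...shared sharded-key sorting and index pruning would go here...
    SegmentOutput { pruned: pruned_changesets, checkpoint_block: Some(last_block) }
}

fn main() {
    // Static-file path: pretend the jar walker visited 3 entries. The MDBX
    // path would instead pass a closure wrapping its table-range walk.
    let out = prune_history(42, || 3);
    assert_eq!(out.pruned, 3);
    assert_eq!(out.checkpoint_block, Some(42));
}
```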


gakonst (Member) commented Jan 23, 2026

[MINOR] account_history.rs:89-100 - The limiter adjustment pattern is duplicated 4x across both files.

This exact pattern appears at:

  • account_history.rs:89-100 (prune_static_files)
  • account_history.rs:176-187 (prune_database)
  • storage_history.rs:89-100 (prune_static_files)
  • storage_history.rs:183-194 (prune_database)

Consider extracting to a helper:

fn adjust_limiter_for_tables(
    input: PruneInput,
    tables_to_prune: usize,
) -> Result<PruneLimiter, SegmentOutput> {
    let mut limiter = if let Some(limit) = input.limiter.deleted_entries_limit() {
        input.limiter.set_deleted_entries_limit(limit / tables_to_prune)
    } else {
        input.limiter
    };
    
    if limiter.is_limit_reached() {
        return Err(SegmentOutput::not_done(
            limiter.interrupt_reason(),
            input.previous_checkpoint.map(SegmentOutputCheckpoint::from_prune_checkpoint),
        ));
    }
    Ok(limiter)
}

Action: Should fix (part of the larger dedup effort)

gakonst (Member) commented Jan 23, 2026

Review Summary: Abstraction Quality & Code Duplication

Issues Found

| Severity | File | Line(s) | Issue |
| --- | --- | --- | --- |
| MAJOR | account_history.rs | 79-242 | ~60% code duplication between prune_static_files() and prune_database() |
| MAJOR | storage_history.rs | 79-253 | Same duplication pattern as account_history.rs |
| MAJOR | Both files | - | Cross-file duplication: both files implement nearly identical structures |
| MINOR | account_history.rs | 89-100 | Limiter adjustment pattern duplicated 4x across both files |
| NIT | account_history.rs | 102, 193 | Typo: "it's" should be "is" |
| NIT | account_history.rs | 125 | Comment restates what the code does (same in storage_history.rs:127) |

What's Good

  • Comments explain "why" not "what": The memory size estimation comments and the BTreeMap inefficiency explanation are valuable
  • Naming is clear: highest_deleted_accounts, last_changeset_pruned_block, prune_static_files vs prune_database are all self-documenting
  • No unnecessary abstractions: No single-use traits or premature indirection

Recommendation

The main issue is that the new static file pruning path introduced significant duplication. Per Ousterhout's principles, this creates maintenance burden - any bug fix or enhancement will need to be applied in 4+ places. Consider:

  1. Short-term: Extract shared limiter setup and finalization logic into helper methods
  2. Long-term (optional): A generic HistoryPruner<T> that parameterizes over the table/key types
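A minimal sketch of what such a generic pruner could look like, parameterized over the history key type so account history (address keys) and storage history (address plus slot keys) share the sort-and-cap tail. All types here are illustrative stand-ins, not reth's:

```rust
// Hypothetical HistoryPruner<K>, generic over the history key. It models
// the duplicated tail: track the highest deleted block per key, then sort
// the keys and cap each block at the last pruned changeset block.

use std::collections::HashMap;

struct HistoryPruner<K> {
    // key -> highest block whose changeset entry for that key was deleted
    highest_deleted: HashMap<K, u64>,
}

impl<K: Ord + std::hash::Hash + Eq> HistoryPruner<K> {
    fn new() -> Self {
        Self { highest_deleted: HashMap::new() }
    }

    /// Record a deleted changeset entry, keeping the highest block per key.
    fn record(&mut self, key: K, block: u64) {
        let entry = self.highest_deleted.entry(key).or_insert(block);
        *entry = (*entry).max(block);
    }

    /// Shared tail: sort keys and cap blocks at the last pruned changeset
    /// block, mirroring the `sorted_unstable()` +
    /// `min(last_changeset_pruned_block)` logic in both segments.
    fn sharded_keys(self, last_pruned: u64) -> Vec<(K, u64)> {
        let mut keys: Vec<_> = self
            .highest_deleted
            .into_iter()
            .map(|(k, b)| (k, b.min(last_pruned)))
            .collect();
        keys.sort_unstable();
        keys
    }
}

fn main() {
    let mut pruner = HistoryPruner::new();
    pruner.record("0xbeef", 10);
    pruner.record("0xbeef", 7); // lower block is ignored
    pruner.record("0xabcd", 20); // capped at last_pruned below
    let keys = pruner.sharded_keys(15);
    assert_eq!(keys, vec![("0xabcd", 15), ("0xbeef", 10)]);
}
```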

})
}

fn prune_database<Provider>(
Member

we might want to add a comment here indicating this is for mdbx since we have rocksdb pruning as well?

@Rjected Rjected requested a review from joshieDo January 24, 2026 01:02
@joshieDo joshieDo (Collaborator) left a comment

lgtm

@Rjected Rjected added this pull request to the merge queue Jan 26, 2026
Merged via the queue into main with commit 94235d6 Jan 26, 2026
45 checks passed
@Rjected Rjected deleted the dan/pruning-static-file-changesets branch January 26, 2026 19:37
@github-project-automation github-project-automation bot moved this from In Progress to Done in Reth Tracker Jan 26, 2026

Labels

A-db Related to the database A-static-files Related to static files C-bug An unexpected or incorrect behavior

Projects

Status: Done


4 participants