fix(pruner): prune account and storage changeset static files #21346
Conversation
```rust
    }
}

impl AccountHistory {
```
one question is whether rocksdb would also make use of this impl under AccountHistory
```rust
    })
}

fn prune_database<Provider>(
```
i guess this is the legacy implementation? not sure if we want to add a note in the doc string that it's legacy
```rust
// / 2`, so 8750 entries. Each entry is `160 bit + 256 bit + 64 bit`, so the total
// size should be up to ~0.5MB + some hashmap overhead. `blocks_since_last_run` is
// additionally limited by the `max_reorg_depth`, so no OOM is expected here.
let mut highest_deleted_storages: FxHashMap<_, _> = FxHashMap::default();
```
Suggested change:

```diff
-let mut highest_deleted_storages: FxHashMap<_, _> = FxHashMap::default();
+let mut highest_deleted_storages = FxHashMap::default();
```

could we?
```rust
    // Sort highest deleted block numbers by account address and storage key and turn them into
    // sharded keys.
    // We did not use `BTreeMap` from the beginning, because it's inefficient for hashes.
    let highest_sharded_keys = highest_deleted_storages
        .into_iter()
        .sorted_unstable() // Unstable is fine because no equal keys exist in the map
        .map(|((address, storage_key), block_number)| {
            StorageShardedKey::new(
                address,
                storage_key,
                block_number.min(last_changeset_pruned_block),
            )
        });
    let outcomes = prune_history_indices::<Provider, tables::StoragesHistory, _>(
        provider,
        highest_sharded_keys,
        |a, b| a.address == b.address && a.sharded_key.key == b.sharded_key.key,
    )?;
    trace!(target: "pruner", ?outcomes, %done, "Pruned storage history (indices)");

    let progress = limiter.progress(done);

    Ok(SegmentOutput {
        progress,
        pruned: pruned_changesets + outcomes.deleted,
        checkpoint: Some(SegmentOutputCheckpoint {
            block_number: Some(last_changeset_pruned_block),
            tx_number: None,
        }),
    })
}
```
the logic looks really similar to the db one, including the comments - not sure if we wanna dedup them in the future. just a note, not this pr tho
could probably add a helper
```rust
// / 2`, so 8750 entries. Each entry is `160 bit + 256 bit + 64 bit`, so the total
// size should be up to 0.5MB + some hashmap overhead. `blocks_since_last_run` is
// additionally limited by the `max_reorg_depth`, so no OOM is expected here.
let mut highest_deleted_accounts: FxHashMap<_, _> = FxHashMap::default();
```
Suggested change:

```diff
-let mut highest_deleted_accounts: FxHashMap<_, _> = FxHashMap::default();
+let mut highest_deleted_accounts = FxHashMap::default();
```

could we?
yongkangc left a comment:
makes sense
the logic of the account and storage changesets looks really similar to the mdbx ones, not sure if we can dedup them in the future
```rust
    }
    trace!(target: "pruner", pruned = %pruned_changesets, %done, "Pruned account history (changesets from static files)");

    let last_changeset_pruned_block = last_changeset_pruned_block
```
starting here and below could be deduped with the mdbx implementation, but unsure if @yongkangc would even use it then
same for storage history
oops, has been mentioned already
btw added reuse for this, ptal
[MAJOR] account_history.rs:79 - Significant code duplication: both methods share ~60% identical logic. Consider extracting the shared logic into helper methods. Action: Should fix
[MAJOR] storage_history.rs:79 - Same duplication pattern as account_history.rs. Consider a shared helper or a strategy pattern to avoid the duplication. Action: Should fix
[MINOR] account_history.rs:89-100 - The limiter adjustment pattern is duplicated 4x across both files. Consider extracting it to a helper:

```rust
fn adjust_limiter_for_tables(
    input: PruneInput,
    tables_to_prune: usize,
) -> Result<PruneLimiter, SegmentOutput> {
    let mut limiter = if let Some(limit) = input.limiter.deleted_entries_limit() {
        input.limiter.set_deleted_entries_limit(limit / tables_to_prune)
    } else {
        input.limiter
    };
    if limiter.is_limit_reached() {
        return Err(SegmentOutput::not_done(
            limiter.interrupt_reason(),
            input.previous_checkpoint.map(SegmentOutputCheckpoint::from_prune_checkpoint),
        ));
    }
    Ok(limiter)
}
```

Action: Should fix (part of the larger dedup effort)
Review Summary: Abstraction Quality & Code Duplication

The main issue is that the new static file pruning path introduces significant duplication. Per Ousterhout's principles, this creates a maintenance burden: any bug fix or enhancement will need to be applied in 4+ places.
```rust
    })
}

fn prune_database<Provider>(
```
we might want to add a comment here indicating this is for mdbx since we have rocksdb pruning as well?
When account/storage changesets are stored in static files (rather than MDBX), the pruner now correctly reads from the static file provider and deletes the corresponding jars. Previously, pruning would silently skip these changesets since they weren't present in the database tables.
Adds an equivalent prune test for static files.