Skip to content

Marks old storages as dirty in clean_accounts()#3702

Merged
brooksprumo merged 1 commit intoanza-xyz:masterfrom
brooksprumo:mark-old-slots-as-dirty
Nov 19, 2024
Merged

Marks old storages as dirty in clean_accounts()#3702
brooksprumo merged 1 commit intoanza-xyz:masterfrom
brooksprumo:mark-old-slots-as-dirty

Conversation

@brooksprumo
Copy link
Copy Markdown

@brooksprumo brooksprumo commented Nov 19, 2024

Problem

We do not clean up old storages.

More context: when calculating a full accounts hash, we call mark_old_slots_as_dirty() as a way to ensure we do not forget or miss cleaning up really old storages (i.e. ones that are older than an epoch old). But, when we enable skipping rewrites, we don't want to clean up those old storages, as they'll intentionally be treated as ancient append vecs. So inside mark_old_slots_as_dirty() we conditionally mark old slots as dirty. This is based on the value of ancient_append_vec_offset, which should be None unless ancient append vecs are enabled.

Unfortunately, normal running validators, we end up never marking old slots as dirty, because the ancient append vec offset is always Some. And thus we don't clean up old storages.

Summary of Changes

Mark old storages as dirty in clean_accounts().

We still check if ancient append vecs are enabled, but not with the ancient_append_vec_offset. Instead we look at the skipping rewrites feature gate and the cli arg.

By moving this marking into clean_accounts(), we also decouple it from accounts hash calculation, which is not necessary anymore. This also removes behavioral differences based on if snapshots are enabled or not.

@brooksprumo brooksprumo self-assigned this Nov 19, 2024
@brooksprumo brooksprumo marked this pull request as ready for review November 19, 2024 12:56
@mergify
Copy link
Copy Markdown

mergify Bot commented Nov 19, 2024

Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

@mergify
Copy link
Copy Markdown

mergify Bot commented Nov 19, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

Copy link
Copy Markdown

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

let acceptable_straggler_slot_count = 100;
let old_slot_cutoff =
slot_one_epoch_old.saturating_sub(acceptable_straggler_slot_count);
let (old_storages, old_slots) = self.get_snapshot_storages(..old_slot_cutoff);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was originally worried about the cost of getting these snapshots, but that worry was about 3 years ago. I assume this isn't too expensive to run every clean? clean is only once per 100s or so. I imagine this is fine. Not sure how else you'd get this. I was piggy backing off the storages collected for hash of all accounts previously.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be nice to not fetch the storages unnecessarily. I've been running this on a validator for a few hours now.

Yes, clean normally runs once every ~45-50 seconds (~100 slots). The median time to get the storages here has been 23 milliseconds, with a common range of 21-28 milliseconds.

Another option would be to inspect these old storages less often.

It is nice that once we skip rewrites this will all go away, and that does feel close now.

Copy link
Copy Markdown

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. bummer we introduced this side effect. when we eliminate rewrites, all this will go away.

@brooksprumo brooksprumo merged commit bf33b8c into anza-xyz:master Nov 19, 2024
@brooksprumo brooksprumo deleted the mark-old-slots-as-dirty branch November 19, 2024 17:49
mergify Bot pushed a commit that referenced this pull request Nov 19, 2024
(cherry picked from commit bf33b8c)

# Conflicts:
#	accounts-db/src/accounts_db.rs
#	accounts-db/src/accounts_db/tests.rs
#	runtime/src/bank.rs
mergify Bot pushed a commit that referenced this pull request Nov 19, 2024
(cherry picked from commit bf33b8c)

# Conflicts:
#	accounts-db/src/accounts_db/tests.rs
brooksprumo added a commit that referenced this pull request Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants