Skip to content

Reclaim support for obsolete accounts#6501

Merged
roryharr merged 4 commits intoanza-xyz:masterfrom
roryharr:implement_reclaim_support_for_obsolete_accounts
Jun 17, 2025
Merged

Reclaim support for obsolete accounts#6501
roryharr merged 4 commits intoanza-xyz:masterfrom
roryharr:implement_reclaim_support_for_obsolete_accounts

Conversation

@roryharr
Copy link
Copy Markdown

Problem

Reclaims need to support marking obsolete accounts

Summary of Changes

  • Added enum for whether obsolete accounts should be marked during reclaims or not
  • Marked obsolete accounts during reclaim
  • Added some test coverage

Fixes #

assert_eq!(account_info.0, slot);
let reclaims = [account_info];
accounts_db.remove_dead_accounts(reclaims.iter(), None, true);
num_obsolete_accounts += reclaims.len();
Copy link
Copy Markdown
Author

@roryharr roryharr Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives some coverage of the code for now. Will be more coverage with the full feature.

I think this is sufficient given that this is all the code is really doing right now

reset_accounts,
pubkeys_removed_from_accounts_index,
HandleReclaims::ProcessDeadSlots(&self.clean_accounts_stats.purge_stats),
MarkAccountsObsolete::No,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be marked as they are not getting unreffed (as this is part of clean)

reset_accounts,
&pubkeys_removed_from_accounts_index,
HandleReclaims::ProcessDeadSlots(&self.clean_accounts_stats.purge_stats),
MarkAccountsObsolete::No,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be marked as they are not getting unreffed (as this is part of clean)

@roryharr roryharr force-pushed the implement_reclaim_support_for_obsolete_accounts branch 3 times, most recently from 5d890b3 to fdecf03 Compare June 12, 2025 19:39
Comment thread accounts-db/src/accounts_db.rs Outdated
@roryharr roryharr force-pushed the implement_reclaim_support_for_obsolete_accounts branch from fdecf03 to 2ac7acc Compare June 12, 2025 20:08
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 12, 2025

Codecov Report

❌ Patch coverage is 98.14815% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.8%. Comparing base (69e749e) to head (f851322).
⚠️ Report is 3139 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6501    +/-   ##
========================================
  Coverage    82.8%    82.8%            
========================================
  Files         849      849            
  Lines      379196   379232    +36     
========================================
+ Hits       314029   314154   +125     
+ Misses      65167    65078    -89     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@roryharr roryharr force-pushed the implement_reclaim_support_for_obsolete_accounts branch from 2ac7acc to cf2742e Compare June 12, 2025 21:45
@roryharr roryharr self-assigned this Jun 12, 2025
@roryharr roryharr requested review from HaoranYi and brooksprumo June 12, 2025 23:10
@roryharr roryharr marked this pull request as ready for review June 12, 2025 23:10
&HashSet::default(),
// this callsite does NOT process dead slots
HandleReclaims::DoNotProcessDeadSlots,
MarkAccountsObsolete::No,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't effect this PR much but:
There are two options I'm testing right now.

Original Path: Remove this entire populate reclaims during upsert path as it isn't used in current code at all. Add a new call to populate reclaims during do_flush_slot_cache. In this path reclaims are populated when determining whether to flush or purge accounts in do_flush_slot_cache

Possible New Direction: Re-use the populate reclaims during upsert to find the obsolete accounts, and then switch this call to mark obsolete accounts

It has as a small effect on this PR: In the Original Path this callsite of populate reclaims gets removed, and in the Possible New Direction it gets modified to MarkAccountsObsolete, but I don't see any reason to block this PR based on that.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to see: This is store_accounts_custom

@brooksprumo
Copy link
Copy Markdown

Reclaims need to support marking obsolete accounts

I was under the impression that flush was going to handle marking obsolete accounts. Can you add more info/help me understand why reclaims needs to support marking obsolete accounts too, please?

@roryharr
Copy link
Copy Markdown
Author

roryharr commented Jun 13, 2025

Reclaims need to support marking obsolete accounts

I was under the impression that flush was going to handle marking obsolete accounts. Can you add more info/help me understand why reclaims needs to support marking obsolete accounts too, please?

This is actually the only path where accounts are marked obsolete currently. I'm still testing different scenarios for marking zero lamport single refs obsolete (potentially clean could loop through all slots that are newly older than the last full snapshot and transition them all to obsolete), but haven't been successful yet. It also isn't required for the base implementation, as they still get cleaned up since they are marked as zero lamport single refs.

I have two versions of the implementation that i'm testing out, but both involve reclaims as reclaims provides the functionality required, which is:

  1. Remove accounts from the store.
  • This could be done in other ways (for example: have shrink work on alive_bytes + obsolete_bytes), but there doesn't seem to be a need to differentiate. obsolete bytes/accounts are dead bytes/accounts
  1. Determine if the account is valid for shrink
  2. Mark fully dead slots and remove if possible.
  • This could be deferred to clean, but flush already supports removing dead_slots in the existing implementation

@brooksprumo
Copy link
Copy Markdown

I'm still testing different scenarios for marking zero lamport single refs obsolete

I think it would be preferable to not require special handling for zero lamport single ref accounts.


I have two versions of the implementation that i'm testing out, but both involve reclaims as reclaims provides the functionality required, which is:

  1. Remove accounts from the store.

By "removing accounts" do you mean writing a new storage without dead accounts? aka shrink? Or do you mean modifying the AccountStorageEntry::count_and_status field? Fwiw, we used to reclaim whole AccountStorageEntry/AppendVecs, but we don't anymore. I believe there still is code for this purpose, but could now be cleaned up.

  • This could be done in other ways (for example: have shrink work on alive_bytes + obsolete_bytes), but there doesn't seem to be a need to differentiate. obsolete bytes/accounts are dead bytes/accounts

Yeah, I agree. I think the idea is that the obsolete bytes/accounts will be used to determine if a storage should be shrunk or not.

  1. Determine if the account is valid for shrink

"valid", meaning if the account is obsolete and the "obsoleted slot" is now old enough where we can shrink the storage and throw away this account?

  1. Mark fully dead slots and remove if possible.

Is there special handling required for obsolete accounts in fully dead slots?

@roryharr
Copy link
Copy Markdown
Author

roryharr commented Jun 13, 2025

I'll reply in a few different pieces if that's ok!

I'm still testing different scenarios for marking zero lamport single refs obsolete

I think it would be preferable to not require special handling for zero lamport single ref accounts.

It's not special handling really. Currently zero lamport single refs are never marked obsolete. This is OK since they eventually get visited by clean. A future improvement could be adding a step to clean that marks all zero lamport accounts as obsolete when they are older than the last full snapshot. I'm not sure if this will work yet though

One caveat: They are marked obsolete if a newer account with the same pubkey is created and treated like an obsolete account just like any other account

@roryharr
Copy link
Copy Markdown
Author

I have two versions of the implementation that i'm testing out, but both involve reclaims as reclaims provides the functionality required, which is:

  1. Remove accounts from the store.

By "removing accounts" do you mean writing a new storage without dead accounts? aka shrink? Or do you mean modifying the AccountStorageEntry::count_and_status field? Fwiw, we used to reclaim whole AccountStorageEntry/AppendVecs, but we don't anymore. I believe there still is code for this purpose, but could now be cleaned up.

In this case i just mean store.remove_accounts, which is modifying the count_and_status field

@roryharr
Copy link
Copy Markdown
Author

  1. Determine if the account is valid for shrink

"valid", meaning if the account is obsolete and the "obsoleted slot" is now old enough where we can shrink the storage and throw away this account?

The call in remove_dead_accounts:

if Self::is_shrinking_productive(&store)
                            && self.is_candidate_for_shrink(&store)
                        {
                            // Checking that this single storage entry is ready for shrinking,
                            // should be a sufficient indication that the slot is ready to be shrunk
                            // because slots should only have one storage entry, namely the one that was
                            // created by `flush_slot_cache()`.
                            new_shrink_candidates.insert(*slot);
                        }

@roryharr
Copy link
Copy Markdown
Author

  1. Mark fully dead slots and remove if possible.

Is there special handling required for obsolete accounts in fully dead slots?

It depends what you mean, but there might be, I'm still debugging an issue in this space:
I'm finding when snapshots are made, occasionally there are fully dead slots (as in no valid accounts left) after filtering.

These should've been picked up and removed by clean prior to the snapshot, but aren't for some reason, which I haven't been able to reproduce in a unit test. As a work around in testing right now I'm just rebuilding them as zero length files, which oddly seems to work as long as I ignore AppendVecError::FileSizeTooSmall(file_size).

I'm not sure what the root cause is yet, so I'm not sure what the solution will look like.

@roryharr
Copy link
Copy Markdown
Author

You can also look at https://github.com/roryharr/agave-sandbox/pull/3/files to see the remaining changes. I would recommend just looking at the commented pieces of accounts_db.rs. All the other changes are just test changes, or changes that are unlikely to be merged.

Comment thread accounts-db/src/accounts_db.rs Outdated
@roryharr roryharr requested a review from HaoranYi June 16, 2025 18:26
HaoranYi
HaoranYi previously approved these changes Jun 16, 2025
Copy link
Copy Markdown

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@brooksprumo
Copy link
Copy Markdown

One caveat: They are marked obsolete if a newer account with the same pubkey is created and treated like an obsolete account just like any other account

Makes sense. As then they are no longer single-ref.


I'm still debugging an issue in this space: I'm finding when snapshots are made, occasionally there are fully dead slots (as in no valid accounts left) after filtering.

This issue occurs with this PR, or future work?

Comment thread accounts-db/src/accounts_db.rs Outdated
@brooksprumo brooksprumo self-requested a review June 16, 2025 21:54
@roryharr
Copy link
Copy Markdown
Author

One caveat: They are marked obsolete if a newer account with the same pubkey is created and treated like an obsolete account just like any other account

Makes sense. As then they are no longer single-ref.

I'm still debugging an issue in this space: I'm finding when snapshots are made, occasionally there are fully dead slots (as in no valid accounts left) after filtering.

This issue occurs with this PR, or future work?

Future PR, definitely!. Just pointing out that there may be some special code related to obsolete accounts in fully dead slots to deal with that.

Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Just a last question.

Comment thread accounts-db/src/accounts_db.rs
Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@roryharr roryharr merged commit cbd3eb5 into anza-xyz:master Jun 17, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants