Skip to content

Adds Offset to callback on scan_accounts() family#6744

Merged
brooksprumo merged 1 commit intoanza-xyz:masterfrom
brooksprumo:accounts/scan-with-offset
Jun 26, 2025
Merged

Adds Offset to callback on scan_accounts() family#6744
brooksprumo merged 1 commit intoanza-xyz:masterfrom
brooksprumo:accounts/scan-with-offset

Conversation

@brooksprumo
Copy link
Copy Markdown

Problem

When building the accounts index, if secondary indexes are on, we scan each account storage file twice. This is bad, performance wise. The second scan is needed to get each account's data, which secondary indexes need.

In theory we should be able to switch on if secondary indexes are enabled or not, and then decide how to scan the storage. E.g. if secondary indexes are on, scan once with account data. If secondary indexes are off, scan once without account data.

The problem this PR is solving is that there isn't a scan function that returns both account data and all the info the index needs. IOW, when secondary indexes are on, there's not a scan function available for us to use.

Summary of Changes

We observe that the only missing piece of information while scanning with account data is the offset of the account within the file. We also observe that the implementations of the scan functions do already have the offset information! So, we can bubble that information up, which will allow the next PR to use if for speeding up building the accounts index.

Tl;dr: Add an Offset field to the scan_accounts() family of functions.

@brooksprumo brooksprumo self-assigned this Jun 25, 2025
Comment on lines +323 to +326
pub fn scan_accounts(
&self,
callback: impl for<'local> FnMut(Offset, StoredAccountInfo<'local>),
) {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the main change. Note how callback now takes an Offset parameter. See scan_accounts_without_data() above, it is the same there now too.

Copy link
Copy Markdown

@roryharr roryharr Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to just add it to StoredAccountInfo/StoredAccountInfoWithoutData instead?
It's the return type from the scan function and from what i can tell the other uses are get_stored_account_callback family of calls which supply the value, so it's not unknown/irrelevant information in any of those locations.

Copy link
Copy Markdown
Author

@brooksprumo brooksprumo Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally I did put the offset inside. I ended up pulling it out because of the get_stored_account_callback() functions. It didn't sit well with me that we called the function with the offset, then we also returned the offset inside the StoredAccountInfo. Are they guaranteed to be equal? Do I need to assert they are equal? Could it ever change? Do I always need to check the code every time?

Not having the offset in the StoredAccountInfo felt cleaner, and also bypassed all those questions/issues.

Other thing was that I intended the StoredAccountInfo types to ideally mirror the Account/AccountSharedData types. Meaning, only the actual account field and not the storage implementation. I was kinda against adding Offset at all, since we are effectively exposing the storage internals, but we do rely on this info for the index. The other alternative would be to add another scan_index() function that returned the account data too. I'd prefer we don't add more scan functions if we don't need to. With the current impl, I think we can remove scan_index() entirely.

This seemed like the best compromise to me.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the next step will then be changing generate_index to use scan_accounts and scan_accounts_no_data? Does that mean this will end up removing scan_index entirely?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the next step will then be changing generate_index to use scan_accounts and scan_accounts_no_data? Does that mean this will end up removing scan_index entirely?

I'd like to do that, yes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less scan functions is great!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #6783 removes scan_index().

mut callback: impl for<'local> FnMut(Offset, StoredAccountInfo<'local>),
) {
self.scan_accounts_stored_meta(|stored_account_meta| {
let offset = stored_account_meta.offset();
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How convenient! We already have the offset, so no additional work needed to surface this to the caller.

if secondary {
// scan storage a second time to update the secondary index
storage.accounts.scan_accounts(|stored_account| {
storage.accounts.scan_accounts(|_offset, stored_account| {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual scan call when building the accounts index when secondary indexes are used. The next PR will leverage the new Offset parameter to combine this scan with the other scan a few lines up.

@brooksprumo brooksprumo marked this pull request as ready for review June 25, 2025 22:09
@brooksprumo brooksprumo requested review from HaoranYi and roryharr June 25, 2025 22:09
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 86.04651% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.4%. Comparing base (e682463) to head (468a5ab).
Report is 13 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6744   +/-   ##
=======================================
  Coverage    83.4%    83.4%           
=======================================
  Files         850      850           
  Lines      377619   377628    +9     
=======================================
+ Hits       315000   315008    +8     
- Misses      62619    62620    +1     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread accounts-db/src/tiered_storage/hot.rs
Comment thread accounts-db/src/tiered_storage/hot.rs
Copy link
Copy Markdown

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details on the problem description.

Overall looks good to me. Just a few suggestions. Those can be done in this pr or a follow up pr.

@brooksprumo brooksprumo merged commit b979967 into anza-xyz:master Jun 26, 2025
28 checks passed
@brooksprumo brooksprumo deleted the accounts/scan-with-offset branch June 26, 2025 16:52
@mergify
Copy link
Copy Markdown

mergify Bot commented Jun 26, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify Bot pushed a commit that referenced this pull request Jun 26, 2025
(cherry picked from commit b979967)

# Conflicts:
#	accounts-db/src/tiered_storage/hot.rs
#	accounts-db/src/tiered_storage/readable.rs
brooksprumo added a commit that referenced this pull request Jul 9, 2025
…#6744) (#6755)

* Adds `Offset` to callback on scan_accounts() family (#6744)

(cherry picked from commit b979967)

# Conflicts:
#	accounts-db/src/tiered_storage/hot.rs
#	accounts-db/src/tiered_storage/readable.rs

* resolves merge conflicts

---------

Co-authored-by: Brooks <brooks@anza.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants