Adds Offset to callback on scan_accounts() family#6744
Adds Offset to callback on scan_accounts() family#6744brooksprumo merged 1 commit intoanza-xyz:masterfrom
Offset to callback on scan_accounts() family#6744Conversation
| pub fn scan_accounts( | ||
| &self, | ||
| callback: impl for<'local> FnMut(Offset, StoredAccountInfo<'local>), | ||
| ) { |
There was a problem hiding this comment.
Here's the main change. Note how callback now takes an Offset parameter. See scan_accounts_without_data() above, it is the same there now too.
There was a problem hiding this comment.
Is there a reason not to just add it to StoredAccountInfo/StoredAccountInfoWithoutData instead?
It's the return type from the scan function and from what i can tell the other uses are get_stored_account_callback family of calls which supply the value, so it's not unknown/irrelevant information in any of those locations.
There was a problem hiding this comment.
Yeah, originally I did put the offset inside. I ended up pulling it out because of the get_stored_account_callback() functions. It didn't sit well with me that we called the function with the offset, then we also returned the offset inside the StoredAccountInfo. Are they guaranteed to be equal? Do I need to assert they are equal? Could it ever change? Do I always need to check the code every time?
Not having the offset in the StoredAccountInfo felt cleaner, and also bypassed all those questions/issues.
Other thing was that I intended the StoredAccountInfo types to ideally mirror the Account/AccountSharedData types. Meaning, only the actual account field and not the storage implementation. I was kinda against adding Offset at all, since we are effectively exposing the storage internals, but we do rely on this info for the index. The other alternative would be to add another scan_index() function that returned the account data too. I'd prefer we don't add more scan functions if we don't need to. With the current impl, I think we can remove scan_index() entirely.
This seemed like the best compromise to me.
There was a problem hiding this comment.
I assume the next step will then be changing generate_index to use scan_accounts and scan_accounts_no_data? Does that mean this will end up removing scan_index entirely?
There was a problem hiding this comment.
I assume the next step will then be changing generate_index to use scan_accounts and scan_accounts_no_data? Does that mean this will end up removing scan_index entirely?
I'd like to do that, yes.
| mut callback: impl for<'local> FnMut(Offset, StoredAccountInfo<'local>), | ||
| ) { | ||
| self.scan_accounts_stored_meta(|stored_account_meta| { | ||
| let offset = stored_account_meta.offset(); |
There was a problem hiding this comment.
How convenient! We already have the offset, so no additional work needed to surface this to the caller.
| if secondary { | ||
| // scan storage a second time to update the secondary index | ||
| storage.accounts.scan_accounts(|stored_account| { | ||
| storage.accounts.scan_accounts(|_offset, stored_account| { |
There was a problem hiding this comment.
This is the actual scan call when building the accounts index when secondary indexes are used. The next PR will leverage the new Offset parameter to combine this scan with the other scan a few lines up.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6744 +/- ##
=======================================
Coverage 83.4% 83.4%
=======================================
Files 850 850
Lines 377619 377628 +9
=======================================
+ Hits 315000 315008 +8
- Misses 62619 62620 +1 🚀 New features to boost your workflow:
|
|
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
Problem
When building the accounts index, if secondary indexes are on, we scan each account storage file twice. This is bad, performance wise. The second scan is needed to get each account's data, which secondary indexes need.
In theory we should be able to switch on if secondary indexes are enabled or not, and then decide how to scan the storage. E.g. if secondary indexes are on, scan once with account data. If secondary indexes are off, scan once without account data.
The problem this PR is solving is that there isn't a scan function that returns both account data and all the info the index needs. IOW, when secondary indexes are on, there's not a scan function available for us to use.
Summary of Changes
We observe that the only missing piece of information while scanning with account data is the offset of the account within the file. We also observe that the implementations of the scan functions do already have the offset information! So, we can bubble that information up, which will allow the next PR to use if for speeding up building the accounts index.
Tl;dr: Add an
Offsetfield to thescan_accounts()family of functions.