Skip to content

Reuse buffered reader with overflow across full accounts storage scans#7080

Merged
kskalski merged 22 commits into
anza-xyz:masterfrom
kskalski:ks/share_bufr
Aug 7, 2025
Merged

Reuse buffered reader with overflow across full accounts storage scans#7080
kskalski merged 22 commits into
anza-xyz:masterfrom
kskalski:ks/share_bufr

Conversation

@kskalski
Copy link
Copy Markdown

@kskalski kskalski commented Jul 22, 2025

Problem

Currently full accounts scan creates buffered reader with overflow capacity for each storage and discards it after that storage is done processing. This also drops any additional state or work that might have been done by the reader implementation - none is done by current BufferedReader, but it prevents us from using multi-file read-ahead implementation of FileBufRead there.

Summary of Changes

  • remove &File from mandatory args of BufferedReader constructore, use Option<&file> in reader state
  • add set_file(file: &File, read_limit: usize) function to FileBufRead trait - this activates the file to be read from and resets buffer state and file offset
  • update scan iterations structure to create single buffered reader per work unit (usually this is dynamically defined by rayon par_iter) and pass it around to each storage scan

Reusing the reader saves some allocations made when reading large accounts data, but it is in-frequent enough that this PR is performance neutral:

master
rebuild bank from snapshots took 186.3s
startup_verify_accounts total_us=113899487i verify_accounts_lt_hash_us=113899484i

pr
rebuild bank from snapshots took 183.0s
startup_verify_accounts total_us=113460880i verify_accounts_lt_hash_us=113460878i

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 91.52542% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (228732f) to head (5e764a9).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #7080     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         796      796             
  Lines      361923   361996     +73     
=========================================
+ Hits       301346   301394     +48     
- Misses      60577    60602     +25     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kskalski kskalski force-pushed the ks/share_bufr branch 4 times, most recently from f984c6a to 6c0a181 Compare July 30, 2025 08:45
@mergify
Copy link
Copy Markdown

mergify Bot commented Jul 30, 2025

⚠️ The sha of the head commit of this PR conflicts with #7226. Mergify cannot evaluate rules on this PR. ⚠️

HaoranYi
HaoranYi previously approved these changes Jul 31, 2025
Copy link
Copy Markdown

@HaoranYi HaoranYi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Reusing buffers instead of creating new ones for each storage seems reasonable.

Comment thread runtime/src/bank/accounts_lt_hash.rs Outdated
Comment thread accounts-db/src/account_storage.rs Outdated
Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread accounts-db/src/accounts_file.rs Outdated
Comment thread accounts-db/src/accounts_db/tests.rs Outdated
Comment thread accounts-db/src/accounts_db/geyser_plugin_utils.rs Outdated
Comment thread accounts-db/src/accounts_db/geyser_plugin_utils.rs Outdated
Comment thread accounts-db/src/accounts_db.rs Outdated
Comment thread accounts-db/src/accounts_db.rs Outdated
@kskalski kskalski force-pushed the ks/share_bufr branch 2 times, most recently from 730ed5f to df6f178 Compare August 7, 2025 13:48
@brooksprumo brooksprumo self-requested a review August 7, 2025 13:53
Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! One nit then :shipit:

Comment thread accounts-db/src/accounts_db.rs Outdated
@brooksprumo
Copy link
Copy Markdown

Also, if you can avoid force-pushing/rebasing, it makes reviewing easier.

Here's an example. With the force-push, GitHub doesn't remember what commits I've already seen, so they are all marked as new:
Screenshot 2025-08-07 at 4 02 32 PM

This PR wasn't too bad, but it can make some reviews much harder, as then I have to go back and double check more stuff to ensure nothing else was changed. My default is to not force-push/rebase unless absolutely necessary (e.g. to resolve a merge conflict, or pick up a change from another PR that this PR depends on).

Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@kskalski
Copy link
Copy Markdown
Author

kskalski commented Aug 7, 2025

Also, if you can avoid force-pushing/rebasing, it makes reviewing easier.

True, I try to avoid it, but with some of these changes conflicts pop up quite often and today there was a really painful issue with CI, so I rebased on pick up a fix and let the CI pass.

@kskalski kskalski merged commit 5e4c0cb into anza-xyz:master Aug 7, 2025
40 checks passed
@kskalski kskalski deleted the ks/share_bufr branch August 8, 2025 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants