Skip to content

Move append_vec buffered reader overflow handling to a struct#7058

Merged
kskalski merged 9 commits into
anza-xyz:masterfrom
kskalski:ks/buf_overflow
Jul 25, 2025
Merged

Move append_vec buffered reader overflow handling to a struct#7058
kskalski merged 9 commits into
anza-xyz:masterfrom
kskalski:ks/buf_overflow

Conversation

@kskalski
Copy link
Copy Markdown

@kskalski kskalski commented Jul 21, 2025

Problem

Ownership, handling and capacity management of overflow buffer for filling contiguous account data during full accounts-db scan is interleaved between buffered_reader and append_vec modules.
This:

  • is a bit hard to read and to verify optimal handling of overflow capacity
  • prevents reader from being shared across calls to scan_accounts_stored_meta
  • pollutes the API that provides contiguous slices of account bytes

Summary of Changes

  • new struct BufReaderWithOverflow is added that encapsulates the dynamically allocated overflow buffer with capacity limits
  • overflow buffer capacity is additionally capped by file size
  • fill_buf_required_or_overflow function is removed and ContiguousBufFileRead trait is renamed to RequiredLenBufRead
  • FileBufRead trait is separated from ContiguousBufFileRead to independently provide file offset tracking (this is also going to support multi-file navigation needed to share buffer across files)
  • overflow buffer filling functionality is moved to BufReaderWithOverflow and BufferedReader's Read impl, which fills provided buffer with existing valid bytes + read directly from file into provided buf

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 21, 2025

Codecov Report

Attention: Patch coverage is 98.36066% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.2%. Comparing base (82d3802) to head (10a2f0a).
Report is 23 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7058   +/-   ##
=======================================
  Coverage    83.2%    83.2%           
=======================================
  Files         853      853           
  Lines      374799   374882   +83     
=======================================
+ Hits       311897   311982   +85     
+ Misses      62902    62900    -2     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kskalski kskalski marked this pull request as ready for review July 21, 2025 17:26
@kskalski kskalski requested review from brooksprumo and cpubot July 21, 2025 17:47
@brooksprumo
Copy link
Copy Markdown

Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I still plan to do another pass.

Also, what's the plan/next steps for this code? Are there further refactorings coming?

Comment thread accounts-db/src/append_vec.rs Outdated
Comment thread accounts-db/src/buffered_reader.rs Outdated
Comment thread accounts-db/src/buffered_reader.rs
@brooksprumo brooksprumo self-requested a review July 22, 2025 14:28
@kskalski
Copy link
Copy Markdown
Author

We are getting close to the target shape.
The next PR on top of this one is finally able to reuse the buffer across multiple storages scan: #7080

I know I've been going back and forth with this code and traits, but it's probably an easier to understand way than dumping both refactor and functional changes into a single PR.

Attention: Patch coverage is 81.11888% with 27 lines in your changes missing coverage. Please review.

Are these code paths missing test coverage?

* https://app.codecov.io/gh/anza-xyz/agave/pull/7058#7f852b49f6e37e268e884d837dac1d84-R184

* https://app.codecov.io/gh/anza-xyz/agave/pull/7058#7f852b49f6e37e268e884d837dac1d84-R278

* https://app.codecov.io/gh/anza-xyz/agave/pull/7058#7f852b49f6e37e268e884d837dac1d84-R300

* https://app.codecov.io/gh/anza-xyz/agave/pull/7058#7f852b49f6e37e268e884d837dac1d84-R348

Right, those code-paths are not used in prod and are there mostly for satisfying trait inheritance, but they should be correct - I added a test that mixes all 3 kinds of access patterns (Read, BufRead and RequiredLenBufRead), though I'm not sure how to access a new coverage report with the new changes...

@kskalski
Copy link
Copy Markdown
Author

kskalski commented Jul 22, 2025

And the further work will be to implement FileBufRead using io_uring multi-file read-ahead reader (needs rebase and updates, but the basic functionality is already in #6878), which paired with BufReaderWithOverflow can be used for scanning full accounts storage data with background IO.

Copy link
Copy Markdown

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@kskalski went over the PR with me on zoom. Thanks!

@kskalski kskalski merged commit 7baf884 into anza-xyz:master Jul 25, 2025
41 checks passed
@kskalski kskalski deleted the ks/buf_overflow branch July 25, 2025 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants