Capture ability of BufferedReader to provide contiguous min len buffers as a trait#6921
kskalski merged 10 commits into anza-xyz:master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #6921    +/-   ##
=======================================
  Coverage    83.2%    83.2%
=======================================
  Files         853      853
  Lines      375116   375172    +56
=======================================
+ Hits       312208   312273    +65
+ Misses      62908    62899     -9

One note here: maybe we should also capture this code https://github.com/anza-xyz/agave/pull/6921/files#diff-47c90ea28df1e8cf4668e31f3096d7cd870014a5ba791767bdd789e047a63515R1089-R1110 into a dedicated trait function, e.g. something like

I changed the approach: the new trait functions take the required data length as a parameter, and some refactoring makes the accounts scan use only the trait instead of occasionally reading directly from the file.
bb6e20f to 6c6ab1f

Reduced the diff a bit more and I think this is fine to review now. Next step will be to share buffered reader (as
👀 ok will take a look
Do we (and then why do we) want to actually share the buffered reader?

I think sharing the buffered reader is beneficial even if only to get rid of the overflow buffer re-allocation for each storage: the buffer can be allocated once per thread and reused for the whole scan. Ideally the shared state can be captured as a single trait (it will need to differ a bit from the one in this PR, since the overflow fn should be hidden as an internal detail). That way we can swap the implementation for an io_uring reader without even touching all those scan functions / files.
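
To illustrate the allocation argument above, here is a minimal sketch of reusing one scratch buffer across many storages. The names `scan_storage` and `scan_all` are hypothetical and do not come from the agave codebase, and the "storages" are plain in-memory byte vectors rather than files.

```rust
/// Hypothetical stand-in for scanning one storage: stages its bytes in the
/// shared scratch buffer and returns how many bytes were processed.
fn scan_storage(data: &[u8], overflow: &mut Vec<u8>) -> usize {
    overflow.clear(); // drops old contents but keeps the allocation
    overflow.extend_from_slice(data);
    overflow.len()
}

/// Scans all storages with one scratch buffer allocated up front.
/// Returns the total bytes processed and whether the buffer kept its
/// original capacity (i.e. never had to grow).
fn scan_all(storages: &[Vec<u8>], max_len: usize) -> (usize, bool) {
    let mut overflow: Vec<u8> = Vec::with_capacity(max_len);
    let initial_capacity = overflow.capacity();
    let mut total = 0;
    for storage in storages {
        total += scan_storage(storage, &mut overflow);
    }
    (total, overflow.capacity() == initial_capacity)
}
```

As long as no storage needs more than `max_len` bytes, the buffer is allocated once and only cleared between storages.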
brooksprumo left a comment
Ok, made a first pass. Will do another.
    ///
    /// Returns `Err(io::ErrorKind::UnexpectedEof)` if the end of file is reached
    /// before the required number of bytes is available.
    fn fill_buf_required(&mut self, required_len: usize) -> io::Result<&[u8]>;

IMO the names for both of these functions are kinda weird. I don't have a good recommendation though, so I'm fine with leaving them as-is for now. We can always rename later if/when we have a better alternative.
705c5c2 to 02cd2a0

There were some bugs in how the capacity of the overflow buffer was managed. They're fixed now, and I tested by generating the index with and without a secondary index (thus exercising both ways the buffered reader is used).
Yeah, there is going to be some refinement in which methods we will need; in the next PR the overflow buffer will be encapsulated into a wrapper struct and
Do we have testing gaps then?
Good question, which I was also discussing with alessandro. It seems CI tests don't exercise large accounts-db data (in terms of bytes and number of variable-sized files). The bug here was due to unsafe code and memory corruption, but I agree it should be caught by automated tests. A separate question is whether to add a command to the CLI tool that would make it possible to process snapshots and accounts-db separately from running a validator.
I see. Yeah, if it requires large accounts-db, that'll be harder to do in a unit test.

We should have at least one test here that exercises the overflow buffer via a max size data account and variable sized files:
agave/accounts-db/src/append_vec.rs, lines 1684 to 1728 in b4ba976
Was the bug behavior you encountered not triggered by this? Is the bug currently present in

The nature of the bug is that Hm... actually, this bug exists already in the old code on master, not sure why it wasn't causing trouble so far, but basically it is a silent memory corruption.

Looking closer at master's code, it is a bit misleading as it calculates additional reserve using capacity, it actually keeps the invariant that
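
As general background for the capacity discussion (this is not a reconstruction of the actual bug or of master's code): `Vec::reserve(additional)` counts `additional` from the vector's current length, not from its current capacity. A small helper that guarantees a total capacity, with the hypothetical name `ensure_total_capacity`, might look like this:

```rust
/// Hypothetical helper: makes sure `buf` can hold `required_len` bytes in
/// total, so later writes up to that length do not reallocate.
fn ensure_total_capacity(buf: &mut Vec<u8>, required_len: usize) {
    // `reserve` takes the number of *additional* elements beyond `len()`,
    // so the amount is computed from the length, not from the capacity.
    let additional = required_len.saturating_sub(buf.len());
    buf.reserve(additional);
}
```

After the call, `buf.capacity()` is at least `required_len` and the existing contents are untouched, which matters when unsafe code later writes into the spare capacity.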

updated
brooksprumo left a comment
Please wait for approval from @cpubot before merging.
Problem
As part of a larger change to provide an io_uring replacement for BufferedReader, I want to capture its functionality and API as well-defined trait(s).
Some uses of BufferedReader are also mixed up with extra reads done through read_into_buffer to overflow buffers; that could be better encapsulated in the BufferedReader impl and trait.

Summary of Changes
This PR adds ContiguousBufFileRead, which covers the following functionality:
- fill_buf_required
- fill_buf_required_or_overflow
- get_file_offset()
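
A minimal sketch of how such a trait can be used, under stated assumptions: the trait name `ContiguousRead`, the `consume` method, the return type of `get_file_offset`, and the in-memory `SliceReader` are all hypothetical simplifications. Only the `fill_buf_required` signature is taken from the diff quoted in the review. `fill_buf_required_or_overflow` is left out because its signature is not shown in this conversation.

```rust
use std::io;

/// Hypothetical, simplified stand-in for the trait described in this PR.
trait ContiguousRead {
    /// Returns at least `required_len` contiguous bytes, or `UnexpectedEof`
    /// if the end of the data is reached first (signature from the diff).
    fn fill_buf_required(&mut self, required_len: usize) -> io::Result<&[u8]>;
    /// Marks `amt` bytes as consumed (assumed, mirrors `BufRead::consume`).
    fn consume(&mut self, amt: usize);
    /// Offset of the next unread byte (assumed return type).
    fn get_file_offset(&self) -> usize;
}

/// Toy in-memory reader; a real implementation would read from a file.
struct SliceReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ContiguousRead for SliceReader<'a> {
    fn fill_buf_required(&mut self, required_len: usize) -> io::Result<&[u8]> {
        let available = &self.data[self.pos..];
        if available.len() < required_len {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        Ok(available)
    }

    fn consume(&mut self, amt: usize) {
        // Clamp so `pos` never moves past the end of the data.
        self.pos = (self.pos + amt).min(self.data.len());
    }

    fn get_file_offset(&self) -> usize {
        self.pos
    }
}

/// Caller code depends only on the trait, so the reader behind it can change.
fn read_u32_le(reader: &mut impl ContiguousRead) -> io::Result<u32> {
    let bytes = reader.fill_buf_required(4)?;
    let value = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    reader.consume(4);
    Ok(value)
}
```

Because `read_u32_le` is written against the trait only, a different reader (for example one backed by io_uring) could be substituted without changing the caller, which is the motivation given in the Problem section.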