ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2) #6949
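For context on the title: the Rust standard library documents that seeking a `BufReader` always discards its internal buffer, even when the target position would fall inside it. The short program below illustrates that documented behavior using only `std`. The function name `seek_discards_buffer` is made up for this sketch.

```rust
use std::io::{BufReader, Cursor, Read, Result, Seek, SeekFrom};

/// Returns (bytes buffered before the seek, bytes buffered after it,
/// logical position reported by the seek).
fn seek_discards_buffer() -> Result<(usize, usize, u64)> {
    let data: Vec<u8> = (0u8..16).collect();
    // Capacity 8: the first small read pulls 8 bytes into the buffer.
    let mut reader = BufReader::with_capacity(8, Cursor::new(data));

    let mut one = [0u8; 1];
    reader.read_exact(&mut one)?; // consumes 1 of the 8 buffered bytes
    let before = reader.buffer().len();

    // A seek that lands on the current position still empties the buffer,
    // so the next read has to go back to the inner reader.
    let pos = reader.seek(SeekFrom::Current(0))?;
    let after = reader.buffer().len();

    Ok((before, after, pos))
}

fn main() -> Result<()> {
    let (before, after, pos) = seek_discards_buffer()?;
    println!("buffered before seek: {before}, after seek: {after}, position: {pos}");
    Ok(())
}
```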
I would like to add a test on a Parquet file with many rows, because my first commit passed all tests but was buggy... Where should I add this file?
The short answer is here, but you should mention it on the mailing list first. I believe there is reluctance to add binary data. The preference is to make progress on the writer implementation so that any test files can be generated; there has been some initial progress on this here. If there is reluctance to add the file, you can comment out the test and open a JIRA to follow up once the writer is implemented, I guess.
In the end I did not go for the end-to-end test with a Parquet file, as this is more of a general concern. I did add a "byte-level" test, though, to maintain good coverage! Hope the macOS CI will not go nuts this time ;-)
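The PR's actual byte-level test is not reproduced here. As a hedged sketch of the general idea, a test can drain a reader using uneven chunk sizes so that reads straddle buffer boundaries, then compare the result with the expected bytes. `read_with_uneven_chunks` is a hypothetical helper, and `std::io::BufReader` with a tiny capacity stands in for the reader under test.

```rust
use std::io::{BufReader, Cursor, Read, Result};

/// Hypothetical test helper: drains `reader` with a repeating pattern of
/// chunk sizes so that reads start and end at many different buffer offsets.
fn read_with_uneven_chunks<R: Read>(reader: &mut R, sizes: &[usize]) -> Result<Vec<u8>> {
    // A zero-sized chunk would look like EOF, and an empty pattern cannot cycle.
    assert!(!sizes.is_empty() && sizes.iter().all(|&s| s > 0));
    let mut out = Vec::new();
    for size in sizes.iter().cycle() {
        let mut chunk = vec![0u8; *size];
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of input
        }
        out.extend_from_slice(&chunk[..n]);
    }
    Ok(out)
}

fn main() -> Result<()> {
    // 1000 bytes: many times larger than the reader's buffer.
    let data: Vec<u8> = (0..=255u8).cycle().take(1000).collect();
    // A capacity of 7 rarely lines up with the chunk sizes below.
    let mut reader = BufReader::with_capacity(7, Cursor::new(data.clone()));
    let read_back = read_with_uneven_chunks(&mut reader, &[1, 2, 3, 5, 8, 13])?;
    assert_eq!(read_back, data, "bytes differ from the source");
    println!("ok: {} bytes", read_back.len());
    Ok(())
}
```

Mixing chunk sizes smaller and larger than the capacity exercises both the buffered path and the path where a large read may bypass the buffer.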
sunchao left a comment
Sorry, just noticed this! LGTM with a few nits.
rust/parquet/src/util/io.rs (outdated)
This is not super useful, as it requires people to jump into `BufReader` to check the documentation. Maybe add comments explaining what these two fields are for?
rust/parquet/src/util/io.rs (outdated)
nit: maybe we can extract this into a util function?
I'll work on your comments today. What about the problem we are trying to fix here? Do you agree with the benefits of this fix? Also, I'm not sure why a
Yes, I think it is beneficial to avoid dropping buffers with
Originally we designed it this way so that we could concurrently read multiple column chunks after obtaining a file handle from a single row group. Since the file handle is shared between these readers, we wanted to provide thread safety on top of it.
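A minimal sketch of the sharing pattern described above, assuming the shared handle is wrapped in `Arc<Mutex<_>>` and each column-chunk reader tracks its own offset. `RangeReader` and `read_ranges_concurrently` are hypothetical names, and an in-memory `Cursor` stands in for the file; this is not the parquet crate's code.

```rust
use std::io::{Cursor, Read, Result, Seek, SeekFrom};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical reader over the byte range `[pos, end)` of a handle that is
/// shared, behind a mutex, with other readers (e.g. one per column chunk).
struct RangeReader<R> {
    inner: Arc<Mutex<R>>,
    pos: u64, // next absolute offset this reader will read
    end: u64, // one past the last offset of this reader's range
}

impl<R: Read + Seek> Read for RangeReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let len = buf.len().min((self.end - self.pos) as usize);
        if len == 0 {
            return Ok(0);
        }
        // Seek and read while holding the lock, so no other reader can move
        // the shared cursor between the two calls.
        let mut inner = self.inner.lock().unwrap();
        inner.seek(SeekFrom::Start(self.pos))?;
        let n = inner.read(&mut buf[..len])?;
        self.pos += n as u64;
        Ok(n)
    }
}

/// Reads each `(start, end)` range on its own thread through one shared handle.
fn read_ranges_concurrently(data: Vec<u8>, ranges: &[(u64, u64)]) -> Vec<Vec<u8>> {
    let shared = Arc::new(Mutex::new(Cursor::new(data)));
    let handles: Vec<_> = ranges
        .iter()
        .map(|&(start, end)| {
            let mut reader = RangeReader { inner: Arc::clone(&shared), pos: start, end };
            thread::spawn(move || {
                let mut out = Vec::new();
                reader.read_to_end(&mut out).expect("read failed");
                out
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().expect("thread panicked")).collect()
}

fn main() {
    let data: Vec<u8> = (0u8..100).collect();
    for chunk in read_ranges_concurrently(data, &[(0, 10), (50, 60), (90, 100)]) {
        println!("{:?}", chunk);
    }
}
```

Because every read starts with a `seek`, a `BufReader` placed inside the mutex would have its buffer discarded on each call, which is the cost the PR title refers to.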
This should have been caught by tests; a test on a large column is needed.
I'm not sure I understand how this could be the concern of the
Is file thread-safe? It's not obvious when reading the docs. Plus, here the type parameter
I completely agree! What I am saying is that the layer that makes the handle thread-safe cannot be the
But this question of multi-threading seems to be a whole other concern :-) I've made the nit changes you proposed and improved the tests.
Merged. Thanks @rdettai!
It seems that Rust linting was failing here (and is now failing on master)?
I'll fix it shortly, @jorisvandenbossche.
Oops, sorry, my bad. Thanks @nevi-me!
My bad, I thought it was an issue with the linting CI!
A fix for ARROW-7681 that does not rely on nightly Rust (as opposed to #6280).
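One way to keep read-ahead without nightly-only APIs is for the source to own its buffer and seek the inner reader only when that buffer runs dry. The sketch below is a hypothetical illustration of that idea, not the code merged in this PR: `BufferedRange`, its fields, and the `seeks` counter are assumptions made for the example.

```rust
use std::io::{Cursor, Read, Result, Seek, SeekFrom};

/// Hypothetical reader over `[start, end)` of `inner` that keeps its own
/// read-ahead buffer and only seeks `inner` when that buffer is empty.
struct BufferedRange<R> {
    inner: R,
    start: u64,     // absolute offset of the next byte to hand out
    end: u64,       // absolute offset one past the last readable byte
    buf: Vec<u8>,   // bytes read ahead from `inner`
    buf_pos: usize, // index in `buf` of the next unread byte
    buf_cap: usize, // number of valid bytes currently in `buf`
    seeks: usize,   // how many times `inner` was seeked (for illustration)
}

impl<R: Read + Seek> BufferedRange<R> {
    fn new(inner: R, start: u64, end: u64, capacity: usize) -> Self {
        BufferedRange { inner, start, end, buf: vec![0; capacity], buf_pos: 0, buf_cap: 0, seeks: 0 }
    }

    /// Returns the unread part of `buf`, refilling it first if it is empty.
    fn fill_buf(&mut self) -> Result<&[u8]> {
        if self.buf_pos >= self.buf_cap {
            // The only place where the inner reader is repositioned.
            self.inner.seek(SeekFrom::Start(self.start))?;
            self.seeks += 1;
            let want = self.buf.len().min((self.end - self.start) as usize);
            self.buf_cap = self.inner.read(&mut self.buf[..want])?;
            self.buf_pos = 0;
        }
        Ok(&self.buf[self.buf_pos..self.buf_cap])
    }
}

impl<R: Read + Seek> Read for BufferedRange<R> {
    fn read(&mut self, out: &mut [u8]) -> Result<usize> {
        if self.start >= self.end {
            return Ok(0); // end of range: no need to touch `inner`
        }
        let available = self.fill_buf()?;
        let n = available.len().min(out.len());
        out[..n].copy_from_slice(&available[..n]);
        self.buf_pos += n;
        self.start += n as u64;
        Ok(n)
    }
}

/// Drains `reader` using fixed-size `chunk` reads.
fn read_in_chunks<R: Read>(reader: &mut R, chunk: usize) -> Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut tmp = vec![0u8; chunk];
    loop {
        let n = reader.read(&mut tmp)?;
        if n == 0 {
            break;
        }
        out.extend_from_slice(&tmp[..n]);
    }
    Ok(out)
}

fn main() -> Result<()> {
    let data: Vec<u8> = (0u8..100).collect();
    // 80 bytes read 3 at a time, but only 5 refills of 16 bytes each.
    let mut src = BufferedRange::new(Cursor::new(data), 10, 90, 16);
    let bytes = read_in_chunks(&mut src, 3)?;
    println!("read {} bytes with {} seeks", bytes.len(), src.seeks);
    Ok(())
}
```

In a shared-handle setting `inner` would sit behind a lock, and seeking before each refill keeps reads correct even if another reader moved the cursor in the meantime; the small reads in between are served from `buf` without touching the handle.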