feat(parquet): Add boolean rle decoder for Parquet #11282
jkhaliqi wants to merge 1 commit into facebookincubator:main
@yingsu00 can you also take a look at this PR? Thank you!
The magic number 4 is used multiple times. Please make it a static const here and use an appropriate name for it.
You don't need super:: here. There is no ambiguity here.
Nit: Do we need this comment now that the 4 is no longer a magic number?
Right, removing the comment since it is no longer necessary. Thank you!
This should be a constexpr.
Updated to constexpr, thank you!
The function itself doesn't need to be constexpr. It is the if condition that should be constexpr:
if constexpr (hasNulls)
That way, if the template argument is false, the body of this if statement is not generated.
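To illustrate the point about `if constexpr`, here is a minimal sketch. The function `countNonNull` and its parameters are hypothetical and are not taken from the PR; only the `hasNulls` template argument mirrors the review comment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not Velox code): counts the non-null rows among the
// first numRows entries. hasNulls is a compile-time template argument.
template <bool hasNulls>
int32_t countNonNull(const std::vector<bool>& isNull, int32_t numRows) {
  if constexpr (hasNulls) {
    // This branch is only instantiated for countNonNull<true>.
    int32_t count = 0;
    for (int32_t i = 0; i < numRows; ++i) {
      if (!isNull[i]) {
        ++count;
      }
    }
    return count;
  } else {
    // For countNonNull<false> the null check above is discarded at compile
    // time, so isNull is never read.
    return numRows;
  }
}
```

With this shape, `countNonNull<false>` compiles down to returning `numRows`, while the function itself stays an ordinary (non-constexpr) template.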
Shouldn't this be RleBpDecoder::skip(numValues) to disambiguate the function from this->skip(numValues)?
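The disambiguation being asked for can be sketched as follows. `BaseDecoder` and `BooleanDecoder` are hypothetical stand-ins rather than the real Velox classes, and the two-argument `skip` signature is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the base decoder; not the Velox RleBpDecoder.
struct BaseDecoder {
  void skip(uint64_t numValues) { position += numValues; }
  uint64_t position = 0;
};

struct BooleanDecoder : BaseDecoder {
  // Declaring a member named skip hides every BaseDecoder::skip overload
  // for unqualified lookup inside this class.
  void skip(int32_t numValues, int32_t current) {
    lastRow = current + numValues;
    // Qualifying the call with the base class name makes it explicit which
    // skip is meant, without relying on a Java-style `super` alias.
    BaseDecoder::skip(static_cast<uint64_t>(numValues));
  }
  int32_t lastRow = 0;
};
```

Writing the base class name at the call site keeps the intent visible to readers who do not know that a `super` alias was defined elsewhere in the class.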
Let's also initialize it to 0.
Why is the size of the vector 20?
Sorry, I was using this output buffer for some other testing. Since it's not being used anymore, I will delete this line of code.
Is there no problem if toSkip > 0 when we have already advanced current by 1 on line 97? I suppose this has something to do with what the visitor represents? This might need a comment to explain why this is OK.
Or maybe a comment on how the algorithm works when the read occurs.
This needs to be named numBytes_.
Same comment as above about using super vs. the base class name to disambiguate.
Now that you've replaced super with the actual base class, we don't need this anymore. I did not see that you defined super here, which explains why it was working before. But someone not familiar with Java would be confused, so it is better to be explicit.
Thank you, removed this line!
Let's replace std::to_string with the fmt-style formatting that VELOX_FAIL provides, like so:
VELOX_FAIL("Received invalid length : {} (corrupt data page?)", len);
Please do this for all occurrences.
Thank you, updated!
len is not modified here, so we don't need a reference.
Updated, thank you!
What is not clear to me is why you modify the base class member here. You have your own bufferStart_, so why not process and modify that one using the base class methods (which you already do to some degree)?
The base class member of the same name is initialized in the constructor. But because you declared a new, never-used member of the same name on line 118, you need to refer to the base class explicitly here to reach the inherited member.
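The shadowing effect described here can be reproduced in a few lines. `Base` and `Derived` below are hypothetical classes; only the member name `bufferStart_` comes from the review comment.

```cpp
// Hypothetical classes illustrating data member shadowing; not Velox code.
struct Base {
  explicit Base(const char* start) : bufferStart_(start) {}
  const char* bufferStart_;
};

struct Derived : Base {
  explicit Derived(const char* start) : Base(start) {}

  // Redeclaring the name hides Base::bufferStart_ inside Derived. This copy
  // is never set by the constructor, so it stays nullptr.
  const char* bufferStart_ = nullptr;

  // Unqualified lookup finds the Derived member, which is still nullptr.
  bool unqualifiedIsSet() const { return bufferStart_ != nullptr; }

  // Only the qualified name reaches the member the Base constructor set.
  bool qualifiedIsSet() const { return Base::bufferStart_ != nullptr; }
};
```

Removing the redundant declaration in the derived class makes the unqualified name refer to the inherited member again, so the explicit qualification is no longer needed.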
Good catch, I forgot to remove my own bufferStart_, which was being used for something else. I cleaned up the code by removing it, thank you!
@majetideepak should we have another pass?
@jkhaliqi can you clarify why we cannot use the existing
@majetideepak Seems to be too many errors while using
@jkhaliqi if the fastPath is not supported for bool type, we can skip that via
majetideepak left a comment
@jkhaliqi Two minor comments. This looks good!
Co-authored-by: Minhan Cao <minhan.duc.cao@gmail.com>
@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kevinwilfong merged this pull request in 9b2ad44.
RLE/BP is an encoding for Boolean values in Parquet version 2 files.
https://parquet.apache.org/docs/file-format/data-pages/encodings/
Fixes: #10943
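For readers unfamiliar with the encoding, the following is a minimal standalone sketch of decoding RLE/bit-packed hybrid data at bit width 1, based on the Parquet encodings page linked above. It is not the decoder added in this PR: the function `decodeBooleanRle` and the constant `kLengthPrefixBytes` are hypothetical names, and the sketch assumes the encoded runs are preceded by a 4-byte little-endian length.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed size of the little-endian length prefix that precedes the runs.
constexpr size_t kLengthPrefixBytes = 4;

// Hypothetical helper: decodes numValues booleans from an RLE/bit-packed
// hybrid buffer with bit width 1.
std::vector<bool> decodeBooleanRle(
    const std::vector<uint8_t>& page, size_t numValues) {
  if (page.size() < kLengthPrefixBytes) {
    throw std::runtime_error("missing length prefix");
  }
  uint32_t length = 0;
  for (size_t i = 0; i < kLengthPrefixBytes; ++i) {
    length |= static_cast<uint32_t>(page[i]) << (8 * i);
  }
  if (length > page.size() - kLengthPrefixBytes) {
    throw std::runtime_error("invalid length (corrupt data page?)");
  }
  size_t pos = kLengthPrefixBytes;
  const size_t end = pos + length;

  std::vector<bool> out;
  out.reserve(numValues);
  while (out.size() < numValues) {
    // Each run starts with a ULEB128 varint header.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= end || shift > 63) {
        throw std::runtime_error("truncated run header");
      }
      const uint8_t byte = page[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    if (header & 1) {
      // Bit-packed run: header >> 1 groups of 8 values, LSB first. At bit
      // width 1 each group occupies exactly one byte.
      const uint64_t numGroups = header >> 1;
      for (uint64_t g = 0; g < numGroups && out.size() < numValues; ++g) {
        if (pos >= end) {
          throw std::runtime_error("truncated bit-packed run");
        }
        const uint8_t bits = page[pos++];
        for (int b = 0; b < 8 && out.size() < numValues; ++b) {
          out.push_back(((bits >> b) & 1) != 0);
        }
      }
    } else {
      // RLE run: header >> 1 repetitions of one value stored in one byte.
      const uint64_t runLength = header >> 1;
      if (pos >= end) {
        throw std::runtime_error("truncated RLE run");
      }
      const bool value = (page[pos++] & 1) != 0;
      for (uint64_t i = 0; i < runLength && out.size() < numValues; ++i) {
        out.push_back(value);
      }
    }
  }
  return out;
}
```

As a usage example, the buffer `{0x04, 0x00, 0x00, 0x00, 0x10, 0x01, 0x03, 0x02}` holds an RLE run of eight `true` values followed by one bit-packed group; asking for 10 values yields eight `true`, then `false`, then `true`.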