-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parquet page stats for float{16, 32, 64} #10982
Conversation
FYI there appears to be another PR open for this #10972 |
oh thx, haven't checked since yesterday... commented in there, happy both to continue with this one / close in favour of the other one. No further commits into this PR in my queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice -- thank you again @tmi
make_data_page_stats_iterator!( | ||
MinFloat16DataPageStatsIterator, | ||
|x: &PageIndex<FixedLenByteArray>| { x.min.clone() }, | ||
Index::FIXED_LEN_BYTE_ARRAY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly, basically caused by there being no f16 index in https://arrow.apache.org/rust/src/parquet/file/page_index/index.rs.html#74 ... I was thinking of mapping to f16 first earlier on, to make it cleaner... but this turned out to be the simplest I came up with.
#10972 (comment) is I think the best way to make this better eventually
@@ -614,6 +614,94 @@ async fn test_int_8() { | |||
.run(); | |||
} | |||
|
|||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 very nice tests
@@ -614,6 +614,94 @@ async fn test_int_8() { | |||
.run(); | |||
} | |||
|
|||
#[tokio::test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is mostly a duplicate of test_float16
just that one has Check::RowGroup
vs Check::Both
... maybe one or the other can be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tshauck this is one of the things that were bugging me a bit, the other one being https://github.com/apache/datafusion/pull/10982/files#diff-7110f4709c105a18ef74a212396444d62052179a735d148fb62470a8b157fb40R749-R763 -- both are very repetitive
however, I didn't want to get overeager, only to realize later than the abstractions chosen was not the right one. Perhaps the best way forward would be to address #10952 first (which may also have its own "float16"-like tricky case), and then getting the correct macros for testing, index handling, etc. Speaking of that, would you like to take that one? I'd be happy to review then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good plan to me -- let's merge this PR and then work on #10952
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #11000 to track duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple comments if we want to go w/ this one around test duplication.
} | ||
|
||
#[tokio::test] | ||
async fn test_float_64() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a similar test_float64
in this file that could similarly be updated or removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmi
Thanks this looks good - especially figuring out how to deal with f16.
Thanks again @tmi @marvinlanhenke and @tshauck |
Which issue does this PR close?
Closes #10951.
Rationale for this change
Just extends the existing functionality for parquet page data stats --
int
cases were covered previously, I just addfloat
cases in the same fashion.What changes are included in this PR?
FixedByteLength
of 2 bytes, which doesn't behave like a primitive type. In particular, I needed to add a.clone()
at a particular place, which required going fromident
toexpr
in a macro, affecting all cases. I'm very open to suggestions here -- I haven't come up with a better idea myself yet.Are these changes tested?
Yes, I extended the existing test coverage to the newly added cases.
Are there any user-facing changes?
No.