Fix Parquet predicate pushdown for smallint, tinyint #12408
nezihyigitbasi merged 6 commits into prestodb:master
Force-pushed from 838ed6f to 06001ed.
@arhimondr @wenleix do you have some cycles to take a look at this one?
@nezihyigitbasi I can only get to this next week.
Force-pushed from 06001ed to 8a75d69.
highker left a comment:
Minor comments: this would be much easier to review if we:
- merge the 1st and 2nd commits (the logic is to move the meat of the 2nd commit into the 1st one)
- merge the 3rd and 4th commits (the function removal should land together)
`if (effectivePredicateDomain != null && effectivePredicateDomain.intersect(domain).isNone()) {`
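The condition above prunes a row group only when the column has an effective predicate and that predicate cannot overlap the values the file stats say are present. A minimal sketch of that check, assuming a hypothetical `RangeDomain` that models a Presto `Domain` as a single closed integer range (real Domains also handle multiple ranges, nulls, and non-integer types):

```java
final class RowGroupPruning {
    // Hypothetical stand-in for a Presto Domain: a closed range [low, high],
    // which contains no values ("none") when low > high.
    static final class RangeDomain {
        final long low;
        final long high;

        RangeDomain(long low, long high) {
            this.low = low;
            this.high = high;
        }

        boolean isNone() {
            return low > high;
        }

        RangeDomain intersect(RangeDomain other) {
            return new RangeDomain(Math.max(low, other.low), Math.min(high, other.high));
        }
    }

    // Returns true when the row group can be skipped. A null predicate means
    // "no predicate on this column", so nothing can be pruned for it.
    static boolean canSkip(RangeDomain effectivePredicateDomain, long statsMin, long statsMax) {
        RangeDomain domain = new RangeDomain(statsMin, statsMax);
        return effectivePredicateDomain != null
                && effectivePredicateDomain.intersect(domain).isNone();
    }
}
```

For example, with a predicate range of [100, 200], a row group whose stats report [0, 50] can be skipped, while one reporting [150, 300] must be read.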
I would suggest keeping the original commit as is and then, if needed, addressing the nits in a separate commit.
arhimondr left a comment:
I didn't review the code itself, as I don't have much context on Parquet, but I did do the human diff.
Use known Presto Type instead of reconstructing
This commit looks different in both sources. Is this a merge resolution artifact, or was it changed deliberately? If this is a deliberate change, could you please extract it into a separate commit?
Remove redundant `else`
nit: the original commit had a space here.
This is a refactoring commit. It should not change code behavior in any case, except where the newly introduced short-circuiting prevents an exception from being thrown.
This commit changes code behavior in the unlikely edge case where:
1. the `Domain` constructed from file stats is empty (`isNone()`),
2. we did not skip the file/stripe based on the `numberOfRows == 0` condition, and
3. there is no effective predicate for the column.
Previously, the code would construct the `Domain` from file stats and realize that it `isNone()`. Now, the `Domain` construction is not attempted, because the code short-circuits when there is no effective predicate for a column. This change is based on the assumption that the `Domain` constructed from file stats cannot be `isNone()` in any practical situation.
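The before/after difference can be sketched in isolation. This is a hypothetical model, not the Presto code: it assumes the old path skipped the row group whenever the stats `Domain` was empty, models an empty stats domain as `min > max`, and counts stats-domain constructions to show the short-circuit:

```java
final class ShortCircuitSketch {
    // Hypothetical simplified Domain: a closed range [low, high], empty when low > high.
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            this.low = low;
            this.high = high;
        }

        boolean isNone() {
            return low > high;
        }

        Range intersect(Range other) {
            return new Range(Math.max(low, other.low), Math.min(high, other.high));
        }
    }

    // Counts stats-domain constructions so the two versions can be compared.
    static int statsDomainBuilds = 0;

    static Range buildStatsDomain(long min, long max) {
        statsDomainBuilds++;
        return new Range(min, max);
    }

    // Before (assumed): the stats domain is always built, and an empty one
    // prunes the row group even when the column has no predicate.
    static boolean canSkipBefore(Range predicate, long min, long max) {
        Range stats = buildStatsDomain(min, max);
        if (stats.isNone()) {
            return true;
        }
        return predicate != null && predicate.intersect(stats).isNone();
    }

    // After: with no effective predicate, return early and never build the stats domain.
    static boolean canSkipAfter(Range predicate, long min, long max) {
        if (predicate == null) {
            return false;
        }
        return predicate.intersect(buildStatsDomain(min, max)).isNone();
    }
}
```

When a predicate exists, an empty stats domain still makes the intersection empty, so both versions agree. They differ only when there is no predicate and the stats domain is empty, which matches the edge case listed in the commit message.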
When creating a `Domain` from file stats, we need to know the actual Presto `Type` of the column, because the types must match for `Domain#intersect`. Instead of reconstructing the type from the type in the file, which is not always possible because Parquet does not distinguish between INTEGER, SMALLINT, etc., use the `Type` passed with the effective predicate.
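To illustrate why the column type matters, the sketch below uses hypothetical `PrestoType` and `TypedRange` classes (not Presto's API) and models the type check in `Domain#intersect` as an `IllegalArgumentException`. It assumes a reconstruction step that can only map an INT32 column to INTEGER, since TINYINT, SMALLINT and INTEGER share the Parquet INT32 physical type:

```java
import java.util.Objects;

final class TypeAwareDomainSketch {
    // Hypothetical stand-ins for Presto integer types stored as Parquet INT32.
    enum PrestoType { INTEGER, SMALLINT, TINYINT }

    // Hypothetical simplified Domain: a type plus a closed range [low, high].
    static final class TypedRange {
        final PrestoType type;
        final long low;
        final long high;

        TypedRange(PrestoType type, long low, long high) {
            this.type = Objects.requireNonNull(type);
            this.low = low;
            this.high = high;
        }

        boolean isNone() {
            return low > high;
        }

        // Models the requirement that both domains have the same type.
        TypedRange intersect(TypedRange other) {
            if (type != other.type) {
                throw new IllegalArgumentException("Mismatched types: " + type + " vs " + other.type);
            }
            return new TypedRange(type, Math.max(low, other.low), Math.min(high, other.high));
        }
    }

    // Old approach (assumed): reconstruct the type from the file's INT32 column.
    static boolean canSkipReconstructed(TypedRange predicate, long min, long max) {
        TypedRange stats = new TypedRange(PrestoType.INTEGER, min, max);
        return predicate.intersect(stats).isNone(); // throws for SMALLINT/TINYINT predicates
    }

    // New approach: reuse the type carried by the effective predicate.
    static boolean canSkipWithPredicateType(TypedRange predicate, long min, long max) {
        TypedRange stats = new TypedRange(predicate.type, min, max);
        return predicate.intersect(stats).isNone();
    }
}
```

With a SMALLINT predicate, the reconstructed-type path fails on the type check, while the predicate-type path can prune a row group whose stats fall entirely outside the predicate range.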
Force-pushed from 8a75d69 to 7a6dffd.
Removed those additional (minor/refactoring) changes.
Backports trinodb/trino#131
The original author is @findepi.
I couldn't retain authorship while applying the patch, but I added the original author to all commit messages.