Skip to content

Fix Parquet predicate pushdown for smallint, tinyint#12408

Merged
nezihyigitbasi merged 6 commits intoprestodb:masterfrom
nezihyigitbasi:backport-131
Mar 28, 2019
Merged

Fix Parquet predicate pushdown for smallint, tinyint#12408
nezihyigitbasi merged 6 commits intoprestodb:masterfrom
nezihyigitbasi:backport-131

Conversation

@nezihyigitbasi
Copy link
Contributor

@nezihyigitbasi nezihyigitbasi commented Mar 1, 2019

Backports trinodb/trino#131
Original author is @findepi couldn't retain authorship while applying the patch, but added the original author to all commit messages.

@nezihyigitbasi
Copy link
Contributor Author

@arhimondr @wenleix do you have some cycles to take a look at this one?

@arhimondr
Copy link
Member

@nezihyigitbasi I can get to this only next week.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments: it would be much easier to review if we:

  • merge the 1st and 2nd commits (the logic is to move the meat from the 2nd commit to the 1st one)
  • merge the 3rd and 4th commits (the function removal should come together)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (effectivePredicateDomain != null && effectivePredicateDomain.intersect(domain).isNone()) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to keep the original commit as is, and than if needed address the nits in a separate commit

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as abovr

@arhimondr
Copy link
Member

arhimondr commented Mar 14, 2019

Use known Presto Type instead of reconstructing

This commit looks different in both sources. Is this a merge resolution artifact? Or was it changed deliberately?

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review the code itself, as i don't have too much context in Parquet. Did the human diff though.

Use known Presto Type instead of reconstructing

This commit looks different in both sources. Is this a merge resolution artifact? Or was it changed deliberately? If this is deliberate change, could you please extract it to a separate commit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant else

nit: original commit had a space here

findepi added 6 commits March 18, 2019 10:48
This is refactoring commit. It is supposed not to change code behavior
regardless of anything, unless introduced short-circuiting prevents an
exception from being thrown.
This commit changes code behavior in the unlikely edge case where:

1. Domain constructed from file stats is empty (`isNone()`)
2. we did not skip the file/stripe based on `numberOfRows == 0`
   condition.
3. there is no effective predicate for a column

Previously, the code would construct realize the `Domain` constructed
from file stats `isNone()`. Now, the `Domain` construction will not be
attempted, as the code short-circuits when there no effective predicate
for a column. This change is based on assumption that the `Domain`
constructed from file stats cannot `isNone()` in any practical
situation.
When creating `Domain` from file stats, we need to know actual Presto
Type of the column, as Types must match for `Domain#intersect`.

Instead of reconstructing the type based on the type in the file, which
is not always possible, as Parquet does not distinguish between INTEGER,
SMALLINT etc, use the Type passed with effective predicate.
@nezihyigitbasi
Copy link
Contributor Author

This commit looks different in both sources. Is this a merge resolution artifact? Or was it changed deliberately? If this is deliberate change, could you please extract it to a separate commit?

Removed those additional (minor/refactoring) changes.

@nezihyigitbasi nezihyigitbasi merged commit 29b1a37 into prestodb:master Mar 28, 2019
@nezihyigitbasi nezihyigitbasi deleted the backport-131 branch March 28, 2019 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants