Parquet Dictionary Predicate Pushdown Fixes #1846
Merged
martint merged 2 commits into trinodb:master on Oct 24, 2019
martint
approved these changes
Oct 23, 2019
Member
martint
left a comment
Thanks! I'll merge it once the tests pass
findepi
reviewed
Oct 23, 2019
presto-parquet/src/main/java/io/prestosql/parquet/predicate/PredicateUtils.java
Outdated
59a85e6 to 9a8bd7a
findepi
approved these changes
Oct 23, 2019
Member
"Early" is no longer as clear as it was, since there is no "normal"
Member
nit: ideally code cleanup should go in separate commit
Member
nit: ideally code cleanup should go in separate commit
Member
Author
Test failure is from a stuck container. I don't think it's related to the changes, but I haven't had a chance to look harder yet.
Commit 0f7982b refactored ParquetPredicateUtils.getDictionaries from getDictionariesByColumnOrdinal, removing a nested loop iteration but accidentally leaving in a break statement. The effect has been that at most 1 dictionary was returned from getDictionaries, limiting the effectiveness of predicate pushdown on dictionaries.
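For readers unfamiliar with this kind of regression, here is a minimal sketch of how a leftover break can cap the result at one entry. It is not the actual ParquetPredicateUtils code: the class StrayBreakSketch, both method names, and the column-name-to-flag map are hypothetical stand-ins chosen only to show the control flow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the real code base.
final class StrayBreakSketch {
    // Buggy shape: the break used to exit an inner loop that has since been removed,
    // so it now ends the only remaining loop after the first dictionary is collected.
    static List<String> collectWithStrayBreak(Map<String, Boolean> columns) {
        List<String> dictionaries = new ArrayList<>();
        for (Map.Entry<String, Boolean> column : columns.entrySet()) {
            if (column.getValue()) { // true means the column is dictionary encoded
                dictionaries.add(column.getKey());
                break; // leftover from the removed inner loop
            }
        }
        return dictionaries;
    }

    // Fixed shape: same loop without the break, so every dictionary is collected.
    static List<String> collectAll(Map<String, Boolean> columns) {
        List<String> dictionaries = new ArrayList<>();
        for (Map.Entry<String, Boolean> column : columns.entrySet()) {
            if (column.getValue()) {
                dictionaries.add(column.getKey());
            }
        }
        return dictionaries;
    }

    public static void main(String[] args) {
        Map<String, Boolean> columns = new LinkedHashMap<>();
        columns.put("a", true);
        columns.put("b", false);
        columns.put("c", true);
        System.out.println(collectWithStrayBreak(columns)); // prints [a]
        System.out.println(collectAll(columns));            // prints [a, c]
    }
}
```

With only one dictionary available, a predicate on any other dictionary-encoded column cannot be checked against its dictionary, which is why the pushdown was less effective.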
No more parquet dictionaries need to be read once a dictionary predicate pushdown check succeeds.
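The short-circuit idea can be sketched as below, assuming a "successful" check means one column's dictionary contains no value that its predicate accepts, so the whole block can be skipped. The class ShortCircuitSketch, the methods reader and canSkipBlock, and the read counter are hypothetical and exist only for illustration; the real reader and predicate types differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from the real code base.
final class ShortCircuitSketch {
    static int dictionaryReads = 0; // counts simulated dictionary page reads

    // Wraps dictionary values in a lazy reader so each read can be counted.
    static Supplier<List<Integer>> reader(List<Integer> values) {
        return () -> {
            dictionaryReads++;
            return values;
        };
    }

    // Returns true as soon as one column's dictionary has no value matching its
    // predicate; dictionaries of the remaining columns are never read.
    static boolean canSkipBlock(
            Map<String, Supplier<List<Integer>>> readers,
            Map<String, Predicate<Integer>> predicates) {
        for (Map.Entry<String, Supplier<List<Integer>>> entry : readers.entrySet()) {
            Predicate<Integer> predicate = predicates.get(entry.getKey());
            if (predicate == null) {
                continue; // no predicate on this column, so its dictionary is not needed
            }
            List<Integer> dictionary = entry.getValue().get(); // the costly read
            if (dictionary.stream().noneMatch(predicate)) {
                return true; // block can be filtered, stop reading dictionaries
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Supplier<List<Integer>>> readers = new LinkedHashMap<>();
        readers.put("x", reader(Arrays.asList(1, 2, 3)));
        readers.put("y", reader(Arrays.asList(20)));
        Map<String, Predicate<Integer>> predicates = new HashMap<>();
        predicates.put("x", v -> v > 10);
        predicates.put("y", v -> v > 10);
        System.out.println(canSkipBlock(readers, predicates)); // prints true
        System.out.println(dictionaryReads);                   // prints 1
    }
}
```

Reading dictionaries lazily inside the loop, rather than collecting them all up front, is what lets the check stop after the first column that rules the block out.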
9a8bd7a to ecba77c
Parquet dictionary pushdown was refactored in prestodb/presto#6892 to remove a nested loop iteration but accidentally left the inner loop break statement behind. This meant that dictionary predicate pushdown would read at most 1 dictionary.

In addition to fixing the pushdown behavior, this PR adds support for checking the dictionary pushdown on each column, skipping additional dictionary reads once the block can already be filtered.