Parquet Dictionary Predicate Pushdown Fixes #13594
Merged
zhenxiao merged 2 commits into prestodb:master on Oct 24, 2019
Commit 0f7982b refactored ParquetPredicateUtils.getDictionaries from getDictionariesByColumnOrdinal, removing a nested loop iteration but accidentally leaving in a break statement. The effect has been that at most 1 dictionary was returned from getDictionaries, limiting the effectiveness of predicate pushdown on dictionaries.
Once a dictionary predicate pushdown check succeeds, meaning the block can already be filtered, no further Parquet dictionaries need to be read.
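The effect of the stray `break` can be reproduced in isolation. The following is a minimal sketch, not the actual `ParquetPredicateUtils` code: the class name, the method names, and the use of a `Map<String, Boolean>` to stand in for column chunk metadata are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the dictionary collection loop; not Presto's real types.
final class StrayBreakSketch {
    // Buggy shape: the break left over from the removed inner loop
    // terminates the only remaining loop after the first dictionary.
    static Map<String, String> getDictionariesBuggy(Map<String, Boolean> dictionaryEncodedByColumn) {
        Map<String, String> dictionaries = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> column : dictionaryEncodedByColumn.entrySet()) {
            if (column.getValue()) {
                // Placeholder for reading the real dictionary page.
                dictionaries.put(column.getKey(), "dictionary:" + column.getKey());
                break; // leftover statement: stops after one dictionary
            }
        }
        return dictionaries;
    }

    // Fixed shape: every dictionary-encoded column contributes its dictionary.
    static Map<String, String> getDictionariesFixed(Map<String, Boolean> dictionaryEncodedByColumn) {
        Map<String, String> dictionaries = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> column : dictionaryEncodedByColumn.entrySet()) {
            if (column.getValue()) {
                dictionaries.put(column.getKey(), "dictionary:" + column.getKey());
            }
        }
        return dictionaries;
    }

    public static void main(String[] args) {
        Map<String, Boolean> columns = new LinkedHashMap<>();
        columns.put("country", true);
        columns.put("amount", false);
        columns.put("status", true);
        System.out.println(getDictionariesBuggy(columns).size()); // prints 1
        System.out.println(getDictionariesFixed(columns).size()); // prints 2
    }
}
```

With two dictionary-encoded columns, the buggy variant returns only the first dictionary, so a predicate on the second column can never be evaluated against its dictionary.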
@pettyjamesm does this fix #13457?
Contributor
Author
Unlikely. The query referenced in that issue has no predicates, and the bug this PR fixes was introduced in 0.174.
zhenxiao (Collaborator) approved these changes on Oct 24, 2019 and left a comment:
nice catch, @pettyjamesm
This was referenced Nov 11, 2019
Port of trinodb/trino#1846.
Parquet dictionary pushdown was refactored in #6892 to remove a nested loop iteration but accidentally left the inner loop break statement behind. This meant that dictionary predicate pushdown would read at most 1 dictionary.
In addition to fixing the pushdown behavior, this PR checks the dictionary predicate one column at a time and skips any remaining dictionary reads once the block can already be filtered.
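The per-column check with early exit can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `EarlyExitDictionarySketch`, `blockMayMatch`, the equality-only predicates, and the `Function`-based dictionary reader are all hypothetical names and simplifications.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical model: each predicate is "column = value", and a dictionary
// is the set of distinct values stored for that column in the block.
final class EarlyExitDictionarySketch {
    // Returns false as soon as one column's dictionary rules the block out,
    // so dictionaries for the remaining columns are never read.
    static boolean blockMayMatch(
            Map<String, String> equalityPredicates,
            Function<String, Set<String>> dictionaryReader) {
        for (Map.Entry<String, String> predicate : equalityPredicates.entrySet()) {
            // Stands in for the costly dictionary page read.
            Set<String> dictionary = dictionaryReader.apply(predicate.getKey());
            if (!dictionary.contains(predicate.getValue())) {
                return false; // block is already filtered; stop reading
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dictionaries = new LinkedHashMap<>();
        dictionaries.put("country", Set.of("US", "CA"));
        dictionaries.put("status", Set.of("open", "closed"));

        AtomicInteger reads = new AtomicInteger();
        Function<String, Set<String>> reader = column -> {
            reads.incrementAndGet(); // count simulated dictionary reads
            return dictionaries.get(column);
        };

        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("country", "FR");
        predicates.put("status", "open");

        System.out.println(blockMayMatch(predicates, reader)); // prints false
        System.out.println(reads.get()); // prints 1
    }
}
```

Because the `country` dictionary already excludes the block, the `status` dictionary is never loaded. Collecting all dictionaries up front would have cost two reads for the same answer.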