Skip to content

Parquet Dictionary Predicate Pushdown Fixes#1846

Merged
martint merged 2 commits intotrinodb:masterfrom
pettyjamesm:parquet-predicate-fix
Oct 24, 2019
Merged

Parquet Dictionary Predicate Pushdown Fixes#1846
martint merged 2 commits intotrinodb:masterfrom
pettyjamesm:parquet-predicate-fix

Conversation

@pettyjamesm
Copy link
Copy Markdown
Member

Parquet dictionary pushdown was refactored in prestodb/presto#6892 to remove a nested loop iteration but accidentally left the inner loop break statement behind. This meant that dictionary predicate pushdown would read at most 1 dictionary.

In addition to fixing the pushdown behavior, this PR adds support for checking the dictionary pushdown on each column skipping additional dictionary reads once the block can already be filtered.

@cla-bot cla-bot bot added the cla-signed label Oct 23, 2019
Copy link
Copy Markdown
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll merge it once the tests pass

@martint martint self-assigned this Oct 23, 2019
@pettyjamesm pettyjamesm force-pushed the parquet-predicate-fix branch from 59a85e6 to 9a8bd7a Compare October 23, 2019 18:44
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Early" is no longer as clear as it was, since there is no "normal"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ideally code cleanup should go in separate commit

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ideally code cleanup should go in separate commit

@pettyjamesm
Copy link
Copy Markdown
Member Author

Test failure is from a stuck container, don’t think it’s related to the changes but I haven’t had a chance to look harder yet.

Commit 0f7982b refactored ParquetPredicateUtils.getDictionaries from
getDictionariesByColumnOrdinal, removing a nested loop iteration but
accidentally leaving in a break statement. The effect has been that
at most 1 dictionary was returned from getDictionaries, limiting the
effectiveness of predicate pushdown on dictionaries.
No more parquet dictionaries need to be read once a dictionary
predicate pushdown check succeeds.
@pettyjamesm pettyjamesm force-pushed the parquet-predicate-fix branch from 9a8bd7a to ecba77c Compare October 24, 2019 15:54
@martint martint merged commit ca87e6b into trinodb:master Oct 24, 2019
@pettyjamesm pettyjamesm deleted the parquet-predicate-fix branch October 24, 2019 19:35
@martint martint mentioned this pull request Oct 31, 2019
5 tasks
@martint martint added this to the 324 milestone Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants