-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Implement predicate push down for parquet dereference column in Iceberg #17133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement predicate push down for parquet dereference column in Iceberg #17133
Conversation
d296d47 to
b9ae609
Compare
b9ae609 to
bf77929
Compare
|
Fixing product test: "2023-04-28T05:52:04.9369644Z tests | 2023-04-28 11:37:04 INFO: FAILURE / io.trino.tests.product.iceberg.TestIcebergSparkCompatibility.testIdBasedFieldMapping [PARQUET, 2] (Groups: iceberg_jdbc, profile_specific_tests, iceberg, iceberg_rest) took 3.7 seconds |
bf77929 to
2323605
Compare
|
I fixed the bug and submitted the PR for another check |
1194880 to
234a076
Compare
...rg/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypePredicatePushDown.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertNoDataRead is currently a bit misleading because it builds on processedInputDataSize and not physicalInputDataSize
However this change relieves the engine of dealing with additional computations when the data leaves the parquet reader. Very good catch @leetcode-1533
No change requested in this PR
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
👋 @leetcode-1533 @findinpath @findepi - this PR has become inactive. If you're still interested in working on it, please let us know. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
234a076 to
77e5049
Compare
...g/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypesPredicatePushDown.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
...g/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetComplexTypesPredicatePushDown.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 5 ?
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java
Outdated
Show resolved
Hide resolved
77e5049 to
38c58a3
Compare
|
Thank you @leetcode-1533 for this contribution. |
|
Thanks @raunaqmorarka and @findinpath for finishing it up |
Description
From https://trino.io/blog/2020/08/14/dereference-pushdown.html: "Another future improvement will be the pushdown of predicates on subfields for data stored in Parquet format. Although the pruning of nested fields occurs with Parquet, the predicates are not yet pushed down into the reader."
This PR enables Parquet page source to use statistics for nested fields in the iceberg connector.
Additional context and related issues
Related ORC commit: 5069a55
Fixes #9928
Hive change PR: #15163
Release notes
() This is not user-visible or docs only and no release notes are required.
() Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: