Implement Dereference pushdown for the Iceberg connector #8129
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
why do we need to store projected columns here?
That was based on the Hive implementation. Seems like the only place it's used is to avoid duplicating work if the same projections are given to the Metadata multiple times? #7360
It's also used for testing, but maybe we can skip it
> Seems like the only place it's used is to avoid duplicating work if the same projections are given to the Metadata multiple times
This is actually important. The contract is that if a call to applyProjection is a no-op, it should return Optional.empty().
#7750 (comment)
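A minimal sketch of the no-op contract mentioned in this thread. The `TableHandle` class and the string column names are hypothetical stand-ins, not the Trino SPI types or the code in this PR; the point is only that remembering the projected columns in the handle lets a repeated call detect that nothing changed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are not the Trino SPI types.
final class ProjectionContractSketch
{
    // A table handle that remembers which columns were already pushed down.
    static final class TableHandle
    {
        final String table;
        final Set<String> projectedColumns;

        TableHandle(String table, Set<String> projectedColumns)
        {
            this.table = table;
            this.projectedColumns = projectedColumns;
        }
    }

    // Returns Optional.empty() when the requested projections are already
    // recorded in the handle, so the caller can tell the call was a no-op.
    static Optional<TableHandle> applyProjection(TableHandle handle, List<String> requested)
    {
        Set<String> newProjections = new LinkedHashSet<>(requested);
        if (handle.projectedColumns.equals(newProjections)) {
            return Optional.empty();
        }
        return Optional.of(new TableHandle(handle.table, newProjections));
    }
}
```

Calling `applyProjection` twice with the same list yields a new handle the first time and `Optional.empty()` the second time.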
I think the optimization in #7360 could also be achieved without looking at the tablehandle. but don't mind keeping it as is given other connectors do the same.
Force-pushed: f04e2ab to 75dba60
Finally had a chance to get back to this PR. I added the PageSource-level projection adapter for the case where two column handles overlap like
I added a second commit here with support for Parquet Iceberg tables. It was a little different because the OrcReader accepts nested OrcColumns and knows how to read them back, but parts of the ParquetReader column descriptions/types needed to be set up from the root of the column.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergColumnHandle.java
`>=` or `>`? Is x a parent of x?
Hmm, for the one place this is used it is important that x is a parent of x. Could probably inline this method though, I thought it was going to be more re-usable than it was.
Collections.indexOfSubList is an option to avoid comparing sizes, but I'd leave it up to you. Also, we should just inline this.
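A small sketch of the two variants discussed in this thread, assuming a dereference path is represented as a `List<Integer>` (the class and method names here are hypothetical, not the PR's code). Both treat a path as its own parent, which is the `>=` behaviour described above.

```java
import java.util.Collections;
import java.util.List;

final class DereferencePathSketch
{
    // True when `parent` is a prefix of `child`. A path counts as its own parent.
    static boolean isParentOf(List<Integer> parent, List<Integer> child)
    {
        return parent.size() <= child.size()
                && child.subList(0, parent.size()).equals(parent);
    }

    // Same check without the explicit size comparison: indexOfSubList returns -1
    // when `parent` does not occur in `child` (including when it is longer),
    // and 0 exactly when `parent` occurs at the start of `child`.
    static boolean isParentOfViaIndex(List<Integer> parent, List<Integer> child)
    {
        return Collections.indexOfSubList(child, parent) == 0;
    }
}
```

The `indexOfSubList` variant scans for the first occurrence anywhere in the list, so it can do more work than the prefix comparison on long non-matching paths; for short dereference chains the difference is unlikely to matter.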
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
This is preexisting, but the check for empty is not obvious to me. Should we rather check whether we are missing a mapping for some columns?
I think the assumption is that if the Id field exists for one column it must exist for all of them. I haven't seen a case where they don't exist yet.
Can we add a checkState asserting that it holds? So the result map is either empty or its size matches fileColumns?
Well, columns can be missing if they've been added since this data file was created.
If an old table is migrated to Iceberg without rewriting data (which is going to be the most common scenario initially), files won't contain ids at all.
We should probably refactor this to inspect one field to see if it has an id, and set a flag. If true, everything is expected to have an id. That'd be easier to reason about than looking at the emptiness of a map. But it doesn't need to be in scope here.
> if an old table is migrated to iceberg without rewriting data (which is going to be most common scenario initially), files won't contain ids at all.
Are there any tests for that situation? Would like to add a projected test case
I added some migrated cases to the Spark compatibility tests. @phd3 are you familiar with the Iceberg schema.name-mapping.default property? Looks like a more robust way we could be mapping column names for migrated tables.
If we had the full Iceberg schema here as suggested in #8754 I think we could get the mapping from schema.findField(String name)
Is that possible at this point? It feels like we are rebuilding it above if it was empty.
It looks like it builds a backup mapping by name for the columns where the id is empty
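One way the "inspect one field and set a flag" idea from this thread could look, as a rough sketch only. `FileColumn`, `mapByFieldId`, and the `tableIdsByName` lookup are hypothetical names, not the connector's code: the first file column decides whether the file carries field ids, every other column must agree, and files without ids (for example migrated tables) fall back to matching by name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical, simplified types; not the Trino or Iceberg classes.
final class FieldIdMappingSketch
{
    static final class FileColumn
    {
        final String name;
        final OptionalInt fieldId;

        FileColumn(String name, OptionalInt fieldId)
        {
            this.name = name;
            this.fieldId = fieldId;
        }
    }

    // Maps Iceberg field ids to file columns. The first column decides whether
    // the file carries ids at all; after that every column must agree.
    static Map<Integer, FileColumn> mapByFieldId(List<FileColumn> fileColumns, Map<String, Integer> tableIdsByName)
    {
        Map<Integer, FileColumn> result = new HashMap<>();
        if (fileColumns.isEmpty()) {
            return result;
        }
        boolean fileHasIds = fileColumns.get(0).fieldId.isPresent();
        for (FileColumn column : fileColumns) {
            if (column.fieldId.isPresent() != fileHasIds) {
                throw new IllegalStateException("Mixed presence of field ids: " + column.name);
            }
            if (fileHasIds) {
                result.put(column.fieldId.getAsInt(), column);
            }
            else {
                // File written without ids: fall back to matching by name.
                Integer id = tableIdsByName.get(column.name);
                if (id != null) {
                    result.put(id, column);
                }
            }
        }
        return result;
    }
}
```

Table columns added after the data file was written simply have no entry in the result, so the map may still be smaller than the table schema.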
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
Is there a chance of a naming conflict here, i.e. getting the same projectedColumnName for different handles? It should not be possible if path elements and names cannot contain dots. It is a corner case, but can you still verify whether that is possible?
Looks like there's a validation for this in the Iceberg schema code:
trino:default> CREATE TABLE foo ("a.b" INT, a ROW (b INT));
Query 20210707_163341_00003_xkr2b failed: Invalid schema: multiple fields for name a.b: 1 and 3
org.apache.iceberg.exceptions.ValidationException: Invalid schema: multiple fields for name a.b: 1 and 3
I missed this comment, so I checked this myself.
The code should carry a comment informing the reader about such limitations/assumptions wherever we rely on them.
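To make the assumption concrete, here is a sketch of how a dot-joined name could collide if the schema allowed both a top-level column named `a.b` and a field `b` inside a row column `a`. The `projectedColumnName` helper below is a hypothetical illustration, not the method in this PR; per the error output above, Iceberg's schema validation rejects such a table, which is what rules the collision out.

```java
import java.util.List;

final class ProjectedNameSketch
{
    // Joins the base column name and the dereference path with dots.
    // Assumption relied on: Iceberg rejects schemas where two fields resolve
    // to the same dotted name, so the result is unique per column handle.
    static String projectedColumnName(String baseName, List<String> path)
    {
        if (path.isEmpty()) {
            return baseName;
        }
        return baseName + "." + String.join(".", path);
    }
}
```

Without that validation, `projectedColumnName("a", List.of("b"))` and `projectedColumnName("a.b", List.of())` would both produce `a.b`.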
Force-pushed: c1ed4b3 to 32d6705
Rebased for merge conflicts
Force-pushed: 32d6705 to d043e30
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
if we end up projecting all of root.getNestedColumns(), each of them fully, could we return fullyProjectedLayout?
(OK to address in follow-up)
It seems we're doing tree traversal here (effectively).
wonder whether we can improve representation of dereferences so that we don't need to copy all the elements when recursing (OK to address in follow-up)
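One possible shape for the follow-up suggested here, as a sketch under assumptions: dereference paths are `List<Integer>` field indexes, and the `Node` type is hypothetical rather than anything in the ORC reader. Recursing with a depth index walks each path in place instead of copying the remaining elements at every level, and a node marked fully projected covers everything beneath it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical representation; not the ORC reader's projected layout classes.
final class DereferenceTreeSketch
{
    static final class Node
    {
        final Map<Integer, Node> children = new TreeMap<>();
        boolean fullyProjected;
    }

    // Builds a tree of projected fields from a list of dereference paths.
    static Node build(List<List<Integer>> paths)
    {
        Node root = new Node();
        for (List<Integer> path : paths) {
            insert(root, path, 0);
        }
        return root;
    }

    // Walks the path by index, so no sub-list is copied while recursing.
    private static void insert(Node node, List<Integer> path, int depth)
    {
        if (node.fullyProjected) {
            return; // an ancestor already projects everything below it
        }
        if (depth == path.size()) {
            node.fullyProjected = true;
            node.children.clear(); // narrower projections are now redundant
            return;
        }
        Node child = node.children.computeIfAbsent(path.get(depth), key -> new Node());
        insert(child, path, depth + 1);
    }
}
```

With this shape, an empty path marks the root itself as fully projected; detecting that every nested field of a column ended up fully projected would additionally need the column's field count, which this sketch does not model.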
Consistently use the OrcColumn name between the FieldMapper and FieldLayouts in the Orc reader. This fixes the case where a field has been renamed and the Trino type name does not match the Orc column name. This is tested in following commit using Iceberg projection pushdown.
Force-pushed: d043e30 to 3aa8126
Had to rebase to resolve merge conflicts, but the last set of comments is addressed in the last fixup commit. Thanks
Test failures are in
please make sure there is an issue for this
phd3 left a comment
Thanks for your work on this @alexjo2144.
With Iceberg, pushdown on the metadata side by itself gives us huge benefits because of pruning at split generation time. However, I think there are some missing pieces on the page source side, as mentioned in the comments. Please let me know if there's an issue with my understanding.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
are we intentionally not pushing predicates into parquet reader? (line 591)
Probably just because it's relatively new. The change allowing for passing a Predicate to ParquetReader was just merged in August. If it's useful here, I'd rather do it in followup, since it's related to all the column index changes to the parquet reader and I don't really understand those yet. 6eb42f2
I was mainly thinking of the pushdown into row groups (the ORC stripe equivalent, even before the newly added code), but it looks like it's missing in the Hive connector too. cc @JamesRTaylor
i am not sure i understand the conclusion. should we have a follow-up issue for this?
exclusion of nested predicates was added initially when only ORC support was added for dereference pushdown. However, seems like it didn't change in #3396 while adding parquet support. I'm not sure if that was intentional. but you're right that this deserves a separate issue. #9928
For iceberg connector, I don't see corresponding changes like #3396 here for propagating projected layout, so assumed that we're tackling it separately.
phd3 left a comment
I think we can merge this one PR with ORC improvements once #8129 (comment) is resolved, and handle Parquet stuff separately. Can you please squash commits?
CI #8611
Refactor needed for Iceberg projection pushdown. Iceberg uses field ids to specify columns rather than column names.
Annotate nullable fields with the @Nullable annotation and allow for any ColumnHandle implementation.
Force-pushed: 3aa8126 to 01c38ed
Created a follow-up issue, #9931, and squashed
phd3 left a comment
Thanks for your work on this @alexjo2144 ! I'll merge this unless someone else has further comments.
Merged, thanks!
This implementation only applies to tables using the ORC file format.
This is very much a WIP PR to get initial feedback.
Fixes #5179
TODO: