-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Implement Dereference pushdown for the Delta Lake connector #17085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ef81ebc to
b0b04f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Possible to make it common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Please add a product test reading from Delta Lake tables (created either by Databricks/Delta OSS) with column mapping mode |
.../trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnProjectionInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeConfig.java
Outdated
Show resolved
Hide resolved
...n/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/MetadataEntry.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
b0b04f0 to
9fe1849
Compare
|
Added some more tests. |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeMinioConnectorTest.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/TestingConnectorBehavior.java
Outdated
Show resolved
Hide resolved
b1c173b to
3e3167b
Compare
3e3167b to
46dc82e
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could put these tests in a base class in io.trino.testing and then run it for the relevant connectors and file formats by adding derived classes (e.g. BaseOrcWithBloomFiltersTest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am considering moving tests in follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great finding Raunaq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeMinioConnectorTest.java
Outdated
Show resolved
Hide resolved
41659e2 to
7e7acaa
Compare
4eee057 to
dd60e5c
Compare
|
rebased with master and resolved conflicts |
.../trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnProjectionInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeColumnHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
dd60e5c to
b6be925
Compare
|
Addressed rest of the comments: will raise follow-up PR for below comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use projected column name here if it dereference field?
I see this as part of #17164
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FYI This seems to be done as well in deserializeColumnValue one line below.
No change needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deserializeColumnValue throw a verification exception to verify the column is base column.
verify(column.isBaseColumn(), "Unexpected dereference: %s", column);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have tests for tables that underwent insert/update/delete/merge with onDelta + dereference pushdown (for all column mapping modes)
@findepi what is the rationale for this request in the context of this read oriented feature ? Is this more about being future-proof?
|
@findepi can you pls run this PR with secrets? |
|
/test-with-secrets sha=b6be925a0b09044923a17b37133fd58809e00540 |
findepi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please squash (after the test with secrets run)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to remember to solve this in #17365
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSessionProperties.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
|
test-with-secrets run is 🟢 https://github.com/trinodb/trino/actions/runs/4912972173 |
b6be925 to
1b9b988
Compare
|
Addressed comments |
|
@krvikash thank you so much! |
|
Thank you so much, @findepi | @findinpath | @raunaqmorarka | @alexjo2144 for the reviews and constant support. |
Description
This PR implements dereference pushdown for Delta Lake connector(similar to #8129).
This adds significant performance improvements for queries accessing nested fields inside struct/row columns. They have been optimized through the pushdown of dereference expressions. With this feature, the query execution prunes structural data eagerly, extracting the necessary fields.
For Example:
I have a table having a nested field
col. When perform selectingcol.a, we can see the difference inInputandPhysical Inputvalues in the query plan when running with and without dereference pushdown.Table Schema as below:
Query Plan without Dereference pushdown:
Query Plan with Dereference pushdown:
More Details about dereference pushdown: https://trino.io/blog/2020/08/14/dereference-pushdown.html
Additional context and related issues
The feature is enabled by default.
The feature can be disabled by setting
delta.projection-pushdown-enabledconfiguration property ordelta.projection_pushdown_enabledsession property tofalse.Release notes
(X) Release notes are required, with the following suggested text: