Add support for writing partitionValues_parsed for the add entries in the Delta Lake checkpoint#19662

Merged
ebyhr merged 4 commits into trinodb:master from findinpath:findinpath/delta-write-part-values-parsed
Nov 30, 2023
@findinpath
Contributor

@findinpath findinpath commented Nov 7, 2023

Description

Write partitionValues_parsed field for the add entries in the Delta Lake checkpoint file.

This information can be used to greatly reduce the amount of data read from the checkpoint file when a SELECT query on the Delta Lake table has a partition filter. See #19588 for details.
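
To illustrate the idea, here is a minimal sketch of how typed partition values can be derived from the string-valued `partitionValues` map of an `add` entry and then compared against a partition filter. This is not the code from this PR: the class `PartitionValuesParsedSketch`, the `AddEntry` type, the `parse` and `pathsMatching` helpers, and the type names `"integer"` and `"date"` are all hypothetical, and only a small subset of types is handled.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PartitionValuesParsedSketch {
    // Hypothetical, simplified stand-in for a checkpoint "add" entry.
    static final class AddEntry {
        final String path;
        final Map<String, String> partitionValues;       // raw string values
        final Map<String, Object> partitionValuesParsed; // typed values

        AddEntry(String path, Map<String, String> partitionValues, Map<String, String> columnTypes) {
            this.path = path;
            this.partitionValues = partitionValues;
            this.partitionValuesParsed = parse(partitionValues, columnTypes);
        }
    }

    // Convert each string partition value to a typed value; null stays null.
    // Columns without a known type are kept as strings (an assumption of this sketch).
    static Map<String, Object> parse(Map<String, String> raw, Map<String, String> columnTypes) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String value = e.getValue();
            String type = columnTypes.getOrDefault(e.getKey(), "string");
            if (value == null) {
                parsed.put(e.getKey(), null);
            } else if (type.equals("integer")) {
                parsed.put(e.getKey(), Integer.valueOf(value));
            } else if (type.equals("date")) {
                parsed.put(e.getKey(), LocalDate.parse(value)); // expects yyyy-MM-dd
            } else {
                parsed.put(e.getKey(), value);
            }
        }
        return parsed;
    }

    // Return the paths of entries whose typed partition value equals the filter value.
    static List<String> pathsMatching(List<AddEntry> entries, String column, Object expected) {
        List<String> paths = new ArrayList<>();
        for (AddEntry entry : entries) {
            if (Objects.equals(entry.partitionValuesParsed.get(column), expected)) {
                paths.add(entry.path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("part", "integer");

        Map<String, String> first = new LinkedHashMap<>();
        first.put("part", "10");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("part", "20");

        List<AddEntry> entries = new ArrayList<>();
        entries.add(new AddEntry("part=10/file-a.parquet", first, types));
        entries.add(new AddEntry("part=20/file-b.parquet", second, types));

        // Equivalent in spirit to: SELECT ... WHERE part = 10
        System.out.println(pathsMatching(entries, "part", 10));
    }
}
```

In a real Parquet checkpoint the benefit would come from the typed values being stored as regular columns, so a reader can skip data that cannot match the filter. The in-memory loop above only shows the comparison on typed values.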

Fixes #19586

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Nov 7, 2023
@findinpath findinpath added release-notes delta-lake Delta Lake connector labels Nov 7, 2023
@github-actions github-actions bot added the docs label Nov 7, 2023
@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch from 00ab843 to d17b1f7 Compare November 8, 2023 13:17
@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch 2 times, most recently from 94fb6e1 to d49d727 Compare November 9, 2023 22:35
@findinpath findinpath removed the docs label Nov 9, 2023
@findinpath findinpath self-assigned this Nov 9, 2023
@findinpath findinpath marked this pull request as ready for review November 9, 2023 22:38
@findinpath
Contributor Author

@ebyhr please run the PR with secrets.

@findinpath findinpath changed the title from "Add support for writing partitionValues_parsed in the Delta Lake checkpoint" to "Add support for writing partitionValues_parsed for the add entries in the Delta Lake checkpoint" Nov 9, 2023
@ebyhr
Member

ebyhr commented Nov 9, 2023

/test-with-secrets sha=d49d727bc2d0a1aac8968aaa6a9bbb7af724781d

@github-actions

github-actions bot commented Nov 9, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/6818424653

@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch 3 times, most recently from e54e63a to 406aaf3 Compare November 13, 2023 10:20
@ebyhr
Member

ebyhr commented Nov 20, 2023

Could you rebase on master to resolve conflicts?

@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch from 406aaf3 to 7df2c68 Compare November 20, 2023 12:43
@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch from 7df2c68 to f5d9db2 Compare November 27, 2023 09:52
@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch from f5d9db2 to 14e9de8 Compare November 28, 2023 11:44
@findinpath
Contributor Author

CI hit #16315

Member

Is TIMESTAMP_NTZ type disallowed as a partition column? It would be nice to create the table in Spark and cover the type if it's supported.

Contributor Author

Added corresponding test in TestDeltaLakeBasic.

Member

Thanks, but TestDeltaLakeBasic doesn't ensure that the field in checkpoint is readable by Spark.

@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch 2 times, most recently from 6885610 to 60b723b Compare November 29, 2023 11:43
@findinpath findinpath force-pushed the findinpath/delta-write-part-values-parsed branch from 60b723b to d585417 Compare November 29, 2023 11:44
@findinpath
Contributor Author

@ebyhr AC

@ebyhr ebyhr merged commit 312980d into trinodb:master Nov 30, 2023
@github-actions github-actions bot added this to the 435 milestone Nov 30, 2023
Labels

cla-signed delta-lake Delta Lake connector

Development

Successfully merging this pull request may close these issues.

Add support for writing partitionValues_parsed field in Delta Lake checkpoint

3 participants