Skip to content

Partition Column transform support for Year, Month, Day & Hour#21303

Merged
tdcmeehan merged 1 commit intoprestodb:masterfrom
gupteaj:test_transform
Dec 19, 2023
Merged

Partition Column transform support for Year, Month, Day & Hour#21303
tdcmeehan merged 1 commit intoprestodb:masterfrom
gupteaj:test_transform

Conversation

@gupteaj
Copy link
Contributor

@gupteaj gupteaj commented Nov 3, 2023

Description

Presto issue #20570

  • Fix getColumnTransform() for date and timestamp columns to support transform based on iceberg partition spec
  • Add a new function transformBlock() to write partition column values using INTEGER type

Motivation and Context

Beyond selecting a particular column to partition by, you can select a “transform” and partition the table by the transformed value of the column.

Available in Iceberg include: https://iceberg.apache.org/spec/#partition-specs
Day, Month, Year, Hour, Bucket, Truncate

Presto Iceberg Connector currently already supports Bucket & Truncate transform with partition column in Iceberg Table. This feature request is to add transform support for Day, Month, Year, Hour

Impact

  • Presto Iceberg connector will support these partition transform functions.

Test Plan

  • Add new cases for Year, Month, Day, Hour in IcebergDistributedSmokeTestBase.java
  • Manual testing

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==


Iceberg Changes
* Day, Month & Year partition column transform functions are supported for date type in Presto Iceberg connector.
* Day, Month, Year & Hour partition column transform functions are supported for timestamp type in Presto Iceberg connector.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! I made suggestions for consistency with examples earlier in the page, and a very small formatting error.

If my rephrasing suggestions change your intended meaning in a way you disagree with, let me know and I'm happy to discuss it.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies! I overlooked this single edit but I focused on the code block edit. I should have included this suggested change in yesterday's review. This is everything, no other suggestions.

@gupteaj gupteaj requested a review from steveburnett November 7, 2023 18:31
@tdcmeehan tdcmeehan self-assigned this Nov 7, 2023
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Thanks!

@gupteaj gupteaj force-pushed the test_transform branch 3 times, most recently from 6da5a77 to 0e3fde1 Compare November 15, 2023 06:49
@gupteaj gupteaj marked this pull request as ready for review November 16, 2023 18:23
@gupteaj gupteaj requested a review from a team as a code owner November 16, 2023 18:23
@github-actions
Copy link

github-actions bot commented Nov 16, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 038ff56...b7c6bff.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/connector/iceberg.rst

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(redoing review because CODENOTIFY pinged me)

LGTM! (docs)

New local build of docs, everything looks great.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Two very small nits to consider.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some problems when calculating partition values.

@gupteaj gupteaj marked this pull request as draft November 19, 2023 06:34
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Thanks!

@gupteaj gupteaj marked this pull request as ready for review December 19, 2023 06:08
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix.

assertUpdate("INSERT INTO " + tableName + " VALUES (3, date '1970-10-01'), (4, date '1971-11-05')", 2);

assertQuery("SELECT c2_day, row_count, file_count FROM " + "\"" + tableName + "$partitions\" ORDER BY c2_day",
"VALUES ('1970-10-01', 1, 1), ('1971-11-05', 1, 1), ('2022-10-01', 1, 1), ('2023-11-02', 1, 1)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused. According to the spec, partition transform by day should return an integer. Shouldn't c2_day then return the number of days from 1970-01-01, not the string date?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that Iceberg implementation has special treatment for partition function day. Referring to the codes in function org.apache.iceberg.transforms.Dates.getResultType().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. I raised apache/iceberg#9345 to fix this.

@tdcmeehan tdcmeehan merged commit 8b37cec into prestodb:master Dec 19, 2023
@gupteaj gupteaj deleted the test_transform branch December 20, 2023 01:57
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants