@chenjian2664
Contributor

@chenjian2664 chenjian2664 commented May 18, 2025

Description

Closes #25642

This PR assumes that the underlying partition paths already follow the user-defined format, specifically for date projection.

If storage.location.template is not set, or is set and is compatible with the Hive path format, the partition paths follow the default Hive format: partName=${partValue}. In this case, partValue is escaped for special characters (e.g., / becomes %2F), the same as the existing Hive write logic.

If storage.location.template is set and is NOT compatible with the Hive format, the partition value is inserted into the path that follows the template (or the Hive format) WITHOUT escaping special characters:
• With a custom date projection format and a partition column dt formatted as yyyy/MM/dd, this PR allows reading the partition path: dt=2015%2F10%2F10
• If storage.location.template is set to something like /aaa/bbb/${dt}-xxx, this PR enables reading the partition path: /aaa/bbb/2015/10/10-xxx
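
The two path shapes described above can be illustrated with a small sketch. This is not the code from this PR: the class `PartitionPathSketch` and its methods are hypothetical names, and the escaping covers only a handful of characters rather than the full set Hive escapes. It only shows the difference between the escaped Hive-style path and the unescaped templated path.

```java
import java.util.Map;

// Hypothetical illustration only; not the Trino implementation.
final class PartitionPathSketch {
    private PartitionPathSketch() {}

    // Simplified Hive-style escaping: assumes only these characters need escaping.
    static String escape(String value) {
        StringBuilder escaped = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '/' || c == '%' || c == '=' || c == ':' || c == '#') {
                escaped.append('%').append(String.format("%02X", (int) c));
            }
            else {
                escaped.append(c);
            }
        }
        return escaped.toString();
    }

    // Default Hive layout: partName=<escaped partValue>
    static String hivePath(String column, String value) {
        return column + "=" + escape(value);
    }

    // Templated layout: each ${column} placeholder is replaced by the raw, unescaped value.
    static String templatePath(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hivePath("dt", "2015/10/10"));
        System.out.println(templatePath("/aaa/bbb/${dt}-xxx", Map.of("dt", "2015/10/10")));
    }
}
```

With the values from the bullets above, `hivePath` yields `dt=2015%2F10%2F10` and `templatePath` yields `/aaa/bbb/2015/10/10-xxx`.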

This PR does not support writing with a date projection whose user-defined format is not compatible with the Hive date/timestamp format.

Additional context and related issues

Alternative for #25657

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Hive
* Fix reading of partitions when a custom format is specified for date partition projection in the Hive connector. ({issue}`25642`)

@cla-bot cla-bot bot added the cla-signed label May 18, 2025
@github-actions github-actions bot added the hive Hive connector label May 18, 2025
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch from af8efa8 to 6ef1fa6 Compare May 18, 2025 13:52
@chenjian2664 chenjian2664 requested a review from pettyjamesm May 19, 2025 01:05
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch from 6ef1fa6 to bc09276 Compare May 26, 2025 06:52
@github-actions github-actions bot added the hudi Hudi connector label May 26, 2025
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch 5 times, most recently from e60ed0f to 6c9b3ea Compare May 30, 2025 02:03
@metadaddy
Member

Hi folks - any progress on this one? I'm writing an article explaining how to use Trino to query access logs, and I've had to include a caveat explaining that partition projection currently doesn't work, so time-based queries scan all rows, rather than just those included by the query conditions.

Replace legacy formatting with `DateTimeFormatter` for better compatibility
with the `java.time` package, providing more precise and predictable behavior.
This refactor follows the date-type partition projection format described in
the AWS Athena documentation
(https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html).
Additionally, this commit replaces the error-prone Supplier<Instant>
with a direct Instant type for `leftBound` and `rightBound`.
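
As a rough illustration of what the commit message describes, the sketch below parses and formats a partition value with `DateTimeFormatter` and checks it against two plain `Instant` bounds. The class `DateProjectionSketch`, its method names, and the choice of UTC and `Locale.ROOT` are assumptions made for this example; only `leftBound` and `rightBound` are names taken from the commit message, and the actual Trino code may differ.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration only; not the Trino implementation.
final class DateProjectionSketch {
    private DateProjectionSketch() {}

    // Parse a date-only partition value with a user-defined pattern; UTC is assumed here.
    static Instant parseDate(String value, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        return LocalDate.parse(value, formatter).atStartOfDay(ZoneOffset.UTC).toInstant();
    }

    // Format an instant back into the partition value for the same pattern.
    static String formatDate(Instant instant, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT)
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }

    // Bounds are plain Instant values, fixed once, rather than Supplier<Instant>.
    static boolean inRange(Instant value, Instant leftBound, Instant rightBound) {
        return !value.isBefore(leftBound) && !value.isAfter(rightBound);
    }

    public static void main(String[] args) {
        Instant leftBound = parseDate("2015/01/01", "yyyy/MM/dd");
        Instant rightBound = parseDate("2015/12/31", "yyyy/MM/dd");
        Instant value = parseDate("2015/10/10", "yyyy/MM/dd");
        System.out.println(formatDate(value, "yyyy/MM/dd") + " in range: " + inRange(value, leftBound, rightBound));
    }
}
```

Holding the bounds as `Instant` values means every comparison in one evaluation sees the same bounds, which is one plausible reading of why the commit calls `Supplier<Instant>` error-prone.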
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch from 6c9b3ea to ab6238b Compare June 20, 2025 01:48
@chenjian2664 chenjian2664 requested a review from pettyjamesm June 20, 2025 01:56
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch 2 times, most recently from 3d352bd to e8232e7 Compare June 20, 2025 10:10
@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch from e8232e7 to 19099b7 Compare June 27, 2025 05:24
@chenjian2664
Contributor Author

@pettyjamesm comments addressed, PTAL

@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch 2 times, most recently from 145fd3e to 2e4699f Compare July 13, 2025 04:03
Member

@pettyjamesm pettyjamesm left a comment

Mostly looks good to me, test coverage looks good. Some code style comments / nits.

@chenjian2664 chenjian2664 force-pushed the ref_date_projection branch from 2e4699f to 3be16bc Compare July 15, 2025 01:54
Member

@pettyjamesm pettyjamesm left a comment

The changes here look good to me, thanks for taking this over the finish line @chenjian2664!

@pettyjamesm pettyjamesm merged commit e933971 into trinodb:master Jul 16, 2025
115 of 116 checks passed
@github-actions github-actions bot added this to the 477 milestone Jul 16, 2025
@chenjian2664
Contributor Author

@pettyjamesm I really appreciate your guidance and the time you took to assist me.

@metadaddy
Member

Thanks for your work on this, @chenjian2664. I'll test it against the Backblaze B2 Bucket Access Log and let you know how it works out.

Labels

cla-signed hive Hive connector hudi Hudi connector

Development

Successfully merging this pull request may close these issues.

Hive partition projection date partition values are not parsed with partition_projection_format

3 participants