Skip to content

Comments

Fix textfile ambiguous timestamps and different storage timezones#23593

Merged
rschlussel merged 1 commit intoprestodb:masterfrom
rschlussel:fix-text-timestamp
Sep 9, 2024
Merged

Fix textfile ambiguous timestamps and different storage timezones#23593
rschlussel merged 1 commit intoprestodb:masterfrom
rschlussel:fix-text-timestamp

Conversation

@rschlussel
Copy link
Contributor

@rschlussel rschlussel commented Sep 5, 2024

Description

For TEXTFILE tables, when structural types (array, map, row) contain timestamp columns, we weren't converting to the hive storage time zone as we were doing for primitive types. Instead the timestamps would be interpreted in the JVM timezone

The conversion to hive storage time zone had a secondary benefit (even when the timezones were the same) of fixing the handling of ambiguous timestamps. Ambiguous timestamps are local times that can have more than one unixtime representation. It happens commonly during the fall DST conversion where the hour from 1-2am repeats. Generally in Presto we use the earlier of thetwo possible times when the unixtime is ambiguous. However, the hive library we use for parsing textfiles uses the later time. The code for adjusting the time based on the hive storage time zone has a secondary benefit of correcting ambiguous timestamps to the earlier unixtime representation.

This change fixes those two issues for the structural type code path.

Motivation and Context

The motivation for this change is to provide consistent results for ambiguous timestamps regardless of where it is used. We also want to fix incorrectly not using the storage time zone for timestamps in structural types.

Impact

Ambiguous timestamps inside structural types in textfiles will now be interpreted in the earliest possible unixtime.
The hive.time-zone property will now be respected for timestamps inside structural types in textfiles

Test Plan

new tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Hive Connector Changes
* Fix timestamps inside array, map, or row types for tables using TEXTFILE format to respect the ``hive.time-zone property``.
* Fix interpretation of ambiguous timestamps inside array, map, or row types for tables using TEXTFILE format to interpret the timestamps as the earliest possible unixtime for consistency with the rest of Presto.


For TEXTFILE tables, when structural types (array, map, row) contain
timestamp columns, we weren't converting to the hive storage time zone
as we were doing for primitive types. Instead the timestamps would be
interpreted in the JVM timezone

The conversion to hive storage time zone had a secondary benefit (even
when the timezones were the same) of fixing the handling of ambiguous
timestamps. Ambiguous timestamps are local times that can have more than
one unixtime representation. It happens commonly during the fall DST
conversion where the hour from 1-2am repeats. Generally in Presto we use
the earlier of thetwo possible times when the unixtime is ambiguous. However,
the hive library we use for parsing textfiles uses the later time. The code
for adjusting the time based on the hive storage time zone has a secondary
benefit of correcting ambiguous timestamps to the earlier unixtime
representation.

This change fixes those two issues for the structural type code path.
@rschlussel rschlussel marked this pull request as ready for review September 5, 2024 20:39
@rschlussel rschlussel requested a review from a team as a code owner September 5, 2024 20:39
Copy link
Contributor

@spershin spershin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this and fixing it @rschlussel !

Looks good, need a committer's approval too.

{
Timestamp timestamp = getTimestamp(object, inspector);
return timestamp.getTime();
long parsedJvmMillis = timestamp.getTime();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@zacw7 zacw7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!!

@rschlussel rschlussel merged commit 68bac89 into prestodb:master Sep 9, 2024
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@tdcmeehan tdcmeehan added the from:Meta PR from Meta label Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants