Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow comparison of Timestamps with different Timezones #11653

Closed
jeffreyssmith2nd opened this issue Jul 25, 2024 · 4 comments · Fixed by #11711
Closed

Allow comparison of Timestamps with different Timezones #11653

jeffreyssmith2nd opened this issue Jul 25, 2024 · 4 comments · Fixed by #11711
Assignees
Labels
enhancement New feature or request

Comments

@jeffreyssmith2nd
Copy link
Contributor

jeffreyssmith2nd commented Jul 25, 2024

Is your feature request related to a problem or challenge?

Currently, comparing two times with different timezones gives an error (see reproducer below). The error occurs when attempting type coercion on the two inputs to the less greater than comparison.

You can see the reproducer below, this feels like a natural query that someone may want to execute (they want the data back in UTC, but know the time based on their local timezone for filtering).

Describe the solution you'd like

According to the arrow documentation, all timestamps with timezones are stored relative to UTC epoch, with the timezone used to offset. In theory this should allow us to compare two times from different timezones.

The question becomes what type should the comparison return? Always the type on the left, the right, UTC if exists, something else? In this specific example, we know that UTC would be an appropriate Signature type based on the rest of the SELECT statement, maybe there is enough information (or enough can be provided) to make that determination during planning.

Alternatively, Signature::comparison could be updated to take separate types for the left and right hand sides.

Describe alternatives you've considered

No response

Additional context

Simple Reproducer

copy (
 values
  (arrow_cast('2024-07-25', 'Timestamp(Nanosecond, Some("UTC"))')),
  (arrow_cast('2024-07-24', 'Timestamp(Nanosecond, Some("UTC"))'))
)
to 'example.parquet';
select column1 from 'example.parquet' where column1 > '2020-06-01T00:00:00' at time zone 'America/New_York';

Provides the error:

Error during planning: Cannot infer common argument type for comparison operation Timestamp(Nanosecond, Some("UTC")) > Timestamp(Nanosecond, Some("America/New_York"))
@jeffreyssmith2nd jeffreyssmith2nd added the enhancement New feature or request label Jul 25, 2024
@alamb
Copy link
Contributor

alamb commented Jul 25, 2024

I suggest coercing both timestamps to UTC

My rationale is that:

  1. To the best of my knowledge, there should be a 1->1 mapping from a timestamp in any timezone to UTC
  2. It is not clear to me that there is a 1->1 mapping from every timestamp in any timezone to a timestamp in any other timezone (like during DST conversion I think sometimes there are multiple times with the same UTC time in a ceratin timestamp)
  3. Since the coercion rules are for comparison (and will result in the output type boolean) I don't think the choice of UTC (over the given timezone) would affect the output of the query.

@findepi
Copy link
Member

findepi commented Jul 25, 2024

  1. To the best of my knowledge, there should be a 1->1 mapping from a timestamp in any timezone to UTC

at least in SQL spec, the timestamp with time zone has point-in-time semantics, and every point in time has its representation in UTC

Since the coercion rules are for comparison (and will result in the output type boolean) I don't think the choice of UTC (over the given timezone) would affect the output of the query.

can the comparison operator be a function that accepts timestamps in different zones?

@alamb
Copy link
Contributor

alamb commented Jul 26, 2024

can the comparison operator be a function that accepts timestamps in different zones?

It could, but I think that would just postpone the conversation to execution (and it would likely be implemented the same, by calling the arrow cast). Doing it in coerce_types just does the conversion earlier in the plan.

@jeffreyssmith2nd
Copy link
Contributor Author

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants