Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using date_bin with a time zone in a time range that contains daylight savings does not work #10308

Open
Abdullahsab3 opened this issue Apr 30, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Abdullahsab3
Copy link
Contributor

Abdullahsab3 commented Apr 30, 2024

Describe the bug

When using the date_bin function, while providing a time zone to the column used in date_bin , and using a range that contains daylight savings (i.e. part of the range would be for example with an offset of +01:00, and another part with an offset of +02:00), the querying fails with the following error:

rpc error: code = InvalidArgument desc = External error: Arrow error: Cast error: Cannot cast timezone to different timezone

The query that I used is:

select 
    date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 
from 
    raw_data 
where 
    time >= '2021-03-27T22:00:00.000Z' 
and 
    time <= '2021-03-29T00:00:00.000Z'
group by date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 

Notice that the interval used [2021-03-27T22:00:00.000Z, 2021-03-29T00:00:00.000Z] contains daylight savings within the range, meaning that in this case part of the range will be with an offset of +01:00 (before the daylight savings), and another part will be with an offset of +02:00 (after the daylight savings)

I suspect the issue is that Arrow is attempting to case the times with a 2 hours offset to the times of an 1 hour offset

Update:
The issue is the daylight savings hour. This query works fine:

select 
    date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 
from 
    raw_data 
where 
    time >= '2021-03-27T22:00:00.000Z' 
and 
    time <= '2021-03-29T00:00:00.000Z'
-- exclude the daylight savings hour
and time not between '2021-03-28T02:00:00Z' and '2021-03-28T03:00:00Z'
group by date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 

In the query above, we exclude the daylight savings hour of that year for that timezone.

To Reproduce

Execute the following query

select 
    date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 
from 
    raw_data 
where 
    time >= '2021-03-27T22:00:00.000Z' 
and 
    time <= '2021-03-29T00:00:00.000Z'
group by date_bin(interval '1 hour', time at time zone 'Europe/Brussels') 

Expected behavior

I should get the data aggregated by time. Part of the interval will be with a 2 hours offset, another part will be with a 1 hour offset

Additional context

raw_data is a table that contains values for each 15 minutes. The table with its data is stored in influxDB 3.0 that uses the FDAP stack

No response

@Abdullahsab3 Abdullahsab3 added the bug Something isn't working label Apr 30, 2024
@Abdullahsab3
Copy link
Contributor Author

I think the wider image of this issue is the fact that the hour at which the daylight savings occur will not be able to get transformed to the desired timezone. For example:

select '2019-03-31T02:00:00Z'::timestamp at time zone 'Europe/Brussels';

This would also fail for the same reason

@jayzhan211
Copy link
Contributor

I remember the daylight savings time is quite tricky because it has ambiguous possible time.

See #8899 (comment)

@Abdullahsab3
Copy link
Contributor Author

Abdullahsab3 commented May 2, 2024

I am testing some things out with the at time zone operator to understand it better. I think the UTC timestamp casting should be explictly specified first. Some context:

Daylight savings in 2019 is in March 31 at 2AM in local time.
This is 2019-03-31T00:00:00 in UTC I think.

The query select '2019-03-31T02:00:00Z'::timestamp at time zone 'Europe/Brussels'; seems to be ignoring the Z at the end, and it considers the timestamp local time.

Some experiments:

select '2019-03-31T02:00:00Z'::timestamp at time zone 'Europe/Brussels';

result> Arrow error: Cast error: Cannot cast timezone to different timezone
select '2019-03-31T00:00:00'::timestamp at time zone 'UTC' at time zone 'Europe/Brussels'
result> 2019-03-31T01:00:00+01:00
select '2019-03-31T01:00:00'::timestamp at time zone 'UTC' at time zone 'Europe/Brussels'
result> 2019-03-31T03:00:00+02:00

It seems to me that, regardless of the Z provided in the query, the timestamp will considered as a naive date time, and the additional timezone information that is provided in the query will be considered the source of truth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants