Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

realtime_add_local_datetime converts date time incorrect #157

Closed
nscott-grnland opened this issue Jun 24, 2021 · 3 comments
Closed

realtime_add_local_datetime converts date time incorrect #157

nscott-grnland opened this issue Jun 24, 2021 · 3 comments

Comments

@nscott-grnland
Copy link

realtime_add_local_datetime converts times for timezones incorrectly. When station_tz and tz_used are the same, Date and local_datetime are different.

Steps to reproduce the behavior:

  1. Run:
    rt <- realtime_dd(prov_terr_state_loc = "ON")
    realtime_add_local_datetime(rt, set_tz = NULL)
  2. Note where station_tz and tz_used both equal "America/Toronto", Date = 2021-05-24 05:00:00 and local_datetime = 2021-05-24 01:00:00

I expect Date and local_datetime to be the same if "America/Toronto" is being converted to "America/Toronto"

image

R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
tidyhydat_0.5.2

@boshek
Copy link
Collaborator

boshek commented Jun 25, 2021

👋 @nscott-grnland,

Thanks for the report.

So this probably could be documented better but I think this behaviour makes sense. First when one grabs data that spans multiple time zones I think the only sensible approach is to use "UTC". That is the default for tidyhydat. Eg:

library(tidyhydat)
rt <- realtime_dd(prov_terr_state_loc = "ON")
attributes(rt$Date)$tzone
[1] "UTC"

In your example there are multiple timezones across Ontario. AFAIK it isn't possible (or advisable) to have different time zones in a single column in R. So what tidyhydat does is process all to one timezone with a warning (in this case "America/Toronto"). You can see that here all the case where station_tz differs from tz_used:

rt_added[rt_added$station_tz != rt_added$tz_used,]
  Queried on: 2021-06-25 16:00:16 (UTC)
  Date range: 2021-05-25 to 2021-06-24 
# A tibble: 962,268 x 11
   STATION_NUMBER PROV_TERR_STATE_LOC Date                station_tz         
   <chr>          <chr>               <dttm>              <chr>              
 1 02AB008        ON                  2021-05-25 05:03:00 America/Thunder_Bay
 2 02AB008        ON                  2021-05-25 05:08:00 America/Thunder_Bay
 3 02AB008        ON                  2021-05-25 05:13:00 America/Thunder_Bay
 4 02AB008        ON                  2021-05-25 05:18:00 America/Thunder_Bay
 5 02AB008        ON                  2021-05-25 05:23:00 America/Thunder_Bay
 6 02AB008        ON                  2021-05-25 05:28:00 America/Thunder_Bay
 7 02AB008        ON                  2021-05-25 05:33:00 America/Thunder_Bay
 8 02AB008        ON                  2021-05-25 05:38:00 America/Thunder_Bay
 9 02AB008        ON                  2021-05-25 05:43:00 America/Thunder_Bay
10 02AB008        ON                  2021-05-25 05:48:00 America/Thunder_Bay
# ... with 962,258 more rows, and 7 more variables: local_datetime <dttm>,
#   tz_used <chr>, Parameter <chr>, Value <dbl>, Grade <chr>, Symbol <chr>,
#   Code <chr>

So have deliberately left Date untouched making it easier to work across timezones. A couple thoughts:

  • Maybe the warning should be converted to an error. Basically if you try to convert to local date time over data that span timezones the function with error.
  • Alternatively, rather than just choosing the first station_tz tidyhydat checks for the most common one and converts to that.

Any other thoughts on how the documentation could be better written?

@nscott-grnland
Copy link
Author

Thanks for responding so quickly.

If I understand you correctly, Date is being shown in "UTC" and station_tz is the timezone of the station's location, not the timezone being displayed in the Date column.

Now that you have clarified this for me, I re-read the documentation and it does make sense. I think what threw me off was in the realtime_add_local_datetime section, under 'Details', it states "Date from realtime_dd is supplied in UTC". I either skimmed the documentation too quickly or interpreted this to mean the underlying date was being supplied from UTC and then converted. It might prevent others from making the same mistake as me by adding a note in this section that mentions station_tz is the station's timezone and not the Date timezone. I know station_tz is listed in the realtime_dd section of the documentation but if you are only looking at the realtime_add_local_datetime section, it can be easily overlooked.

@boshek
Copy link
Collaborator

boshek commented Jun 25, 2021

@nscott-grnland added some text here which hopefully makes things a bit clear: d231b2f

@boshek boshek closed this as completed in d231b2f Jun 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants