Skip to content

Conversation

@dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Feb 2, 2022

No description provided.

@github-actions
Copy link

github-actions bot commented Feb 2, 2022

@github-actions
Copy link

github-actions bot commented Feb 2, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@dragosmg
Copy link
Contributor Author

dragosmg commented Feb 2, 2022

TL/DR:
In order to mimic the base R behaviour, if the timezone argument (tz) isn't specified, arrow::format() needs to extract it from the object (where possible). Only if that isn't possible, it should use the system timezone.

=================

I think the failures I am seeing so far are due to the difference between base::format() and base::strftime(). While in {arrow} they behave similarly (due to the nature of my implementation), in base R they do not:

library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
times <- tibble(
  datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), NA),
  date = c(as.Date("2021-01-01"), NA)
)
formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %% %G %V %u"
a <- times$datetime[1]
a
#> [1] "2018-10-07 19:04:05 -06"
format(a, format = formats)
#> [1] "Sun Sunday 0 07 Oct October 10 18 2018 19 07 PM 04 -0600 -06 280 40 40 10/07/2018 19:04:05 % 2018 40 7"
strftime(a, format = formats)
#> [1] "Mon Monday 1 08 Oct October 10 18 2018 02 02 AM 04 +0100 BST 281 40 41 10/08/2018 02:04:05 % 2018 41 1"

Created on 2022-02-02 by the reprex package (v2.0.1)

The difference boils down to:

# format()
format(as.POSIXlt(a), format = formats)
#> [1] "Sun Sunday 0 07 Oct October 10 18 2018 19 07 PM 04 -0600 -06 280 40 40 10/07/2018 19:04:05 % 2018 40 7"
# strftime()
format(as.POSIXlt(a, tz = ""), format = formats)
#> [1] "Mon Monday 1 08 Oct October 10 18 2018 02 02 AM 04 +0100 BST 281 40 41 10/08/2018 02:04:05 % 2018 41 1"

Created on 2022-02-02 by the reprex package (v2.0.1)

@dragosmg dragosmg marked this pull request as ready for review February 3, 2022 12:07
@dragosmg dragosmg marked this pull request as draft February 3, 2022 13:23
@dragosmg dragosmg marked this pull request as ready for review February 3, 2022 15:35
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! a few comments / suggestions at a high level

@dragosmg dragosmg requested a review from jonkeane February 4, 2022 20:10
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datetime bits are looking good! I added a few comments that may help to handle other types of objects that might get passed to format().

@dragosmg
Copy link
Contributor Author

Follow-up ticket to remove / revisit the manual casting step - ARROW-15762.

@dragosmg dragosmg requested a review from jonkeane February 23, 2022 10:01
@dragosmg dragosmg requested a review from jonkeane March 3, 2022 21:19
@dragosmg
Copy link
Contributor Author

dragosmg commented Mar 3, 2022

@jonkeane I think this is ready for another pass if/when you have a bit of time

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good — one super minor test addition and a question about build_expr() + clauses catching non-expressions

@dragosmg dragosmg requested a review from jonkeane March 5, 2022 13:43
@dragosmg dragosmg requested a review from jonkeane March 8, 2022 16:18
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! One comment about a comment + a question about the tests

@dragosmg dragosmg requested a review from jonkeane March 9, 2022 21:43
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for sticking through this.

@jonkeane jonkeane closed this in dc2e0b2 Mar 11, 2022
@ursabot
Copy link

ursabot commented Mar 11, 2022

Benchmark runs are scheduled for baseline = 33fd1c2 and contender = dc2e0b2. dc2e0b2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.13% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.13% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@dragosmg dragosmg deleted the format_bindings branch April 26, 2022 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants