-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9528: [Python] Honor tzinfo when converting from datetime #7816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@emkornfield I've cherry picked ARROW-9223, ARROW-9528, addressed Antoine's reviews comments and simplified the conversion Timestamp conversion path a bit. I'm going to add timezone support for the Time types as well. |
|
Thanks @kszucs per discussion I'm ok with this not making it into 1.0 release. I am working on making moving tzinfo_to_string to c++ so we can accurately capture tz instead of going to utc. I also still think a flag is a good idea to revert to buggy behavior. Want me to post here? |
|
@emkornfield sure, please feel free to push to this PR. I'm going to work on the Time type support then. |
|
@emkornfield I managed to implement TzinfoToString on the C++ side. Currently adding timezone inference support. |
python/pyarrow/tests/test_pandas.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might not be true anymore with StringToTz?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it seems like pa.array(result) should recover struct and we want to verify that?
python/pyarrow/tests/test_types.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the thorough tests.
|
@kszucs I pushed a few commits to:
I'd appreciate any thoughts you have on testing the backwards compatibility (maybe it is sufficient to use this with spark integration tests?) |
|
Thanks Micah! Testing it via the spark integration test SGTM. If we think that ignoring timezones during conversion could be a useful feature we could even expose it properly on the python side. |
|
I'll let others chime in but I think this PR captures correct behavior, so I'd like to deprecate backwards compatibility after a release or two |
|
I mean ignore_tomezone could be a useful conversion option for mixed timezone aware/naive input values (should be rare or rather discouraged though). I agree with the deprecation plan. I'm on vacation next week, but hopefully @pitrou and @jorisvandenbossche will be able to take a look at it. |
|
@github-actions crossbow submit test-conda-python-3.8-spark-master |
|
Revision: e27e86cdc83dede4bb70c49ac5af2fb3a0494c51 Submitted crossbow builds: ursa-labs/crossbow @ actions-455
|
|
@github-actions crossbow submit test-conda-python-3.8-spark-master |
|
Revision: b4e3b9d4dcf3da6c7fa3e652d5b77fc4e7abad13 Submitted crossbow builds: ursa-labs/crossbow @ actions-458
|
|
@github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0 |
|
Revision: b4e3b9d4dcf3da6c7fa3e652d5b77fc4e7abad13 Submitted crossbow builds: ursa-labs/crossbow @ actions-462
|
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few scattered comments. I need to pull this down and kick the tires a bit to make sure that the new behavior matches my expectations
cpp/src/arrow/python/datetime.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the other utility functions are exported as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit ugly, could we pass an options struct instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought here was this parameter will only be need until the next release, so it would be better to add it here and then remove it without the need to introduce options. We can introduce options though if you feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be consistent about const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, probably just late, but not sure what you mean here.
python/pyarrow/tests/test_pandas.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it seems like pa.array(result) should recover struct and we want to verify that?
- Ports string_to_timezone to C++ - Causes nested timestamp columns within structs that have timezones to go through object conversion - Copy timezone on to_object path. Open to other suggestions on how to solve this. It still seems like this is inconsistent with nested timestamps in lists which isn't great. Closes apache#7604 from emkornfield/datetime Lead-authored-by: Micah Kornfield <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]> Signed-off-by: Wes McKinney <[email protected]>
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that @wesm already commented extensively on this, I've only looked at a small part of this PR.
|
Sorry for the delay, didn't have a lot of time for development this week. I am going to pull this down and kick the tires a bit this weekend and will merge later today or tomorrow |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me. Merging
Follow up of:
TODOs: