Skip to content

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Jul 21, 2020

Follow up of:

TODOs:

  • Store all Timestamp values normalized to UTC
  • Infer timezone from the array values if no explicit type was given
  • Testing (especially pandas object roundtrip)
  • Testing of timezone-naive roundtrips
  • Testing mixed pandas and datetime objects

@kszucs
Copy link
Member Author

kszucs commented Jul 21, 2020

@emkornfield I've cherry picked ARROW-9223, ARROW-9528, addressed Antoine's reviews comments and simplified the conversion Timestamp conversion path a bit. I'm going to add timezone support for the Time types as well.

@emkornfield
Copy link
Contributor

Thanks @kszucs per discussion I'm ok with this not making it into 1.0 release. I am working on making moving tzinfo_to_string to c++ so we can accurately capture tz instead of going to utc. I also still think a flag is a good idea to revert to buggy behavior. Want me to post here?

@kszucs
Copy link
Member Author

kszucs commented Jul 21, 2020

@emkornfield sure, please feel free to push to this PR. I'm going to work on the Time type support then.

@kszucs
Copy link
Member Author

kszucs commented Jul 22, 2020

@emkornfield I managed to implement TzinfoToString on the C++ side. Currently adding timezone inference support.

@kszucs kszucs changed the title [Python] Better timezone support ARROW-9528: [Python] Honor tzinfo when converting from datetime Jul 22, 2020
@github-actions
Copy link

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might not be true anymore with StringToTz?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it seems like pa.array(result) should recover struct and we want to verify that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the thorough tests.

@emkornfield
Copy link
Contributor

@kszucs I pushed a few commits to:

  1. Fix typos I found on rereview.
  2. Plumb through turning off the use of TZ informaiton in C++ via options.
  3. Population in python via ARROW_NO_TZ environment variable

I'd appreciate any thoughts you have on testing the backwards compatibility (maybe it is sufficient to use this with spark integration tests?)

@kszucs
Copy link
Member Author

kszucs commented Aug 2, 2020

Thanks Micah!

Testing it via the spark integration test SGTM. If we think that ignoring timezones during conversion could be a useful feature we could even expose it properly on the python side.

@emkornfield
Copy link
Contributor

I'll let others chime in but I think this PR captures correct behavior, so I'd like to deprecate backwards compatibility after a release or two

@kszucs
Copy link
Member Author

kszucs commented Aug 2, 2020

I mean ignore_tomezone could be a useful conversion option for mixed timezone aware/naive input values (should be rare or rather discouraged though).

I agree with the deprecation plan. I'm on vacation next week, but hopefully @pitrou and @jorisvandenbossche will be able to take a look at it.

@emkornfield
Copy link
Contributor

@github-actions crossbow submit test-conda-python-3.8-spark-master

@github-actions
Copy link

github-actions bot commented Aug 3, 2020

Revision: e27e86cdc83dede4bb70c49ac5af2fb3a0494c51

Submitted crossbow builds: ursa-labs/crossbow @ actions-455

Task Status
test-conda-python-3.8-spark-master Github Actions

@emkornfield
Copy link
Contributor

@github-actions crossbow submit test-conda-python-3.8-spark-master

@github-actions
Copy link

github-actions bot commented Aug 3, 2020

Revision: b4e3b9d4dcf3da6c7fa3e652d5b77fc4e7abad13

Submitted crossbow builds: ursa-labs/crossbow @ actions-458

Task Status
test-conda-python-3.8-spark-master Github Actions

@emkornfield
Copy link
Contributor

@github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0

@github-actions
Copy link

github-actions bot commented Aug 3, 2020

Revision: b4e3b9d4dcf3da6c7fa3e652d5b77fc4e7abad13

Submitted crossbow builds: ursa-labs/crossbow @ actions-462

Task Status
test-conda-python-3.7-spark-branch-3.0 Github Actions

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few scattered comments. I need to pull this down and kick the tires a bit to make sure that the new behavior matches my expectations

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the other utility functions are exported as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit ugly, could we pass an options struct instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought here was this parameter will only be need until the next release, so it would be better to add it here and then remove it without the need to introduce options. We can introduce options though if you feel strongly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be consistent about const?

Copy link
Contributor

@emkornfield emkornfield Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, probably just late, but not sure what you mean here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it seems like pa.array(result) should recover struct and we want to verify that?

- Ports string_to_timezone to C++
- Causes nested timestamp columns within structs that have timezones to go through
  object conversion
- Copy timezone on to_object path.

Open to other suggestions on how to solve this.  It still seems like this is inconsistent with nested timestamps in lists which isn't great.

Closes apache#7604 from emkornfield/datetime

Lead-authored-by: Micah Kornfield <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Wes McKinney <[email protected]>
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that @wesm already commented extensively on this, I've only looked at a small part of this PR.

@emkornfield
Copy link
Contributor

@wesm @kszucs thoughts on any remaining items on this?

@wesm
Copy link
Member

wesm commented Aug 15, 2020

Sorry for the delay, didn't have a lot of time for development this week. I am going to pull this down and kick the tires a bit this weekend and will merge later today or tomorrow

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me. Merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants