Make `iris.pandas.as_data_frame()` n-dimensional behaviour opt-in #5059

trexfeathers · 2022-11-10T14:58:21Z

🚀 Pull Request

Description

I think the diff might render unhelpfully. Here's a summary of what I have done:

TL;DR: I have written barely any new code - I've moved some stuff around and added some info for the user.

* Restored iris.pandas.as_data_frame()'s default behaviour from Iris main
* Contained the new improved behaviour within a check for iris.FUTURE.pandas_ndim
* Restored the testing for iris.pandas.as_series() from Iris main
* Restored the testing for iris.pandas.as_data_frame()'s default behaviour from Iris main
* Moved the tests for the new improved behaviour into their own class (TestAsDataFrameNDim)
* Added various warnings about the FUTURE switch into the code and the docstring
* Added tests for the FUTURE warning, and some more for deprecation warnings that I felt were missing
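
The gating described in the summary above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Iris implementation: `FutureSketch`, `FUTURE` and `as_data_frame_sketch` are hypothetical stand-ins for `iris.Future`, `iris.FUTURE` and `iris.pandas.as_data_frame()`, the return values are placeholder strings rather than real DataFrames, and the use of `FutureWarning` and of a `ValueError` for non-2D input are assumptions made for the example.

```python
import warnings


class FutureSketch:
    """Hypothetical stand-in for iris.Future, holding a single opt-in flag."""

    def __init__(self, pandas_ndim=False):
        self.pandas_ndim = pandas_ndim


# Module-level switch, mirroring the role of iris.FUTURE (assumption).
FUTURE = FutureSketch()


def as_data_frame_sketch(cube_shape):
    """Pick legacy or n-dimensional behaviour depending on the FUTURE flag."""
    if FUTURE.pandas_ndim:
        # Opt-in path: any number of dimensions is accepted.
        return f"n-D conversion of shape {cube_shape}"

    # Default path: legacy behaviour is kept, plus a pointer to the switch.
    warnings.warn(
        "Legacy 2D behaviour in use; set FUTURE.pandas_ndim = True to opt in "
        "to the n-dimensional behaviour.",
        FutureWarning,
    )
    if len(cube_shape) != 2:
        raise ValueError("Legacy behaviour only supports 2D input.")
    return f"legacy 2D conversion of shape {cube_shape}"
```

With the flag left at its default, existing callers keep the old result and only see a warning; nothing changes for them until they set the flag themselves.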

Outstanding question

Currently iris.FUTURE.pandas_ndim only controls the behaviour of iris.pandas.as_data_frame() - this is the only function that definitely needs controlling as it's the only one with old and new behaviour. But for consistent UX, should I actually make all of iris.pandas sensitive to the iris.FUTURE.pandas_ndim switch?

Here is what that would look like:

| Function | Dimensional Intelligence | `pandas_ndim == False` | `pandas_ndim == True` |
| --- | --- | --- | --- |
| `as_cube()` | Creates 1D/2D `Cube`s (deprecated) | ✔ Enable | ❌ Disable |
| `as_series()` | Converts 1D `Cube`s (deprecated) | ✔ Enable | ❌ Disable |
| `as_cubes()` | Creates n-D `Cube`s | ❌ Disable | ✔ Enable |
| `as_data_frame()` | Converts 2D OR n-D `Cube`s (opt-in) | 💀 Legacy 2D behaviour | ✨ Opt-in n-D behaviour |
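
To make the proposal in the table concrete, here is a small sketch that encodes it as a lookup. It is purely illustrative: `PROPOSED_AVAILABILITY` and `is_enabled` are hypothetical names invented for this example, and nothing like them is claimed to exist in Iris.

```python
# Each entry: function name -> (enabled when pandas_ndim is False,
#                               enabled when pandas_ndim is True).
PROPOSED_AVAILABILITY = {
    "as_cube": (True, False),       # deprecated 1D/2D Cube creation
    "as_series": (True, False),     # deprecated 1D Cube conversion
    "as_cubes": (False, True),      # n-D Cube creation
    "as_data_frame": (True, True),  # always callable; only its behaviour changes
}


def is_enabled(function_name, pandas_ndim):
    """Return whether a function would be usable under the proposed scheme."""
    when_false, when_true = PROPOSED_AVAILABILITY[function_name]
    return when_true if pandas_ndim else when_false
```

Under this scheme no single flag value enables both `as_series()` and the n-D `as_data_frame()` behaviour, so a user wanting both would have to toggle the switch between calls.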

Consult Iris pull request check list

lbdreyer

Just a couple small tweaks and then this should be good to go.

Regarding your outstanding question

> should I actually make all of iris.pandas sensitive to the iris.FUTURE.pandas_ndim switch?

I don't think it matters too much which we go for, as both are quite reasonable. I think it's more a question of how you document the future switch, as that will determine a user's expectations of its behaviour.
I am however leaning towards keeping the switch only applicable to as_data_frame. The only real benefit I can see for making it applicable to all is that you are saying to the user "by enabling this, you are opting into the new world of pandas integration" but I don't think a user would feel much benefit from this? By doing that we are effectively forcing them to upgrade. They may have the (admittedly unusual) scenario of wanting to use as_series but also the new as_data_frame. They could handle all this with context managers but maybe that's just more annoying to have to keep including in your code?
So overall, I don't really mind but I lean towards keeping it applicable to as_data_frame only as I don't think the other option benefits the user more.
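
The context-manager workaround mentioned above can be sketched as follows. This is a self-contained illustration under stated assumptions: `FutureSketch` and `describe_mode` are hypothetical, and the `context()` method is only modelled on the general idea of temporarily overriding a flag, not copied from Iris.

```python
from contextlib import contextmanager


class FutureSketch:
    """Hypothetical flag holder with a temporary-override context manager."""

    def __init__(self):
        self.pandas_ndim = False

    @contextmanager
    def context(self, **flags):
        # Remember the current values so they can be restored afterwards.
        previous = {name: getattr(self, name) for name in flags}
        for name, value in flags.items():
            setattr(self, name, value)
        try:
            yield
        finally:
            # Restore the old values even if the body raised.
            for name, value in previous.items():
                setattr(self, name, value)


FUTURE = FutureSketch()


def describe_mode():
    """Report which behaviour a flag-sensitive function would use right now."""
    return "n-D" if FUTURE.pandas_ndim else "legacy"


modes = [describe_mode()]
with FUTURE.context(pandas_ndim=True):
    modes.append(describe_mode())
modes.append(describe_mode())
```

If every function in the module were sensitive to the flag, mixing old and new behaviour would need a block like this around each call, which is the extra friction the comment above is pointing at.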

lbdreyer · 2022-11-16T11:11:53Z

lib/iris/tests/test_pandas.py

+        assert cube.data[0] == 99
+
+    def test_copy_int64_false(self):
+        cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int32), long_name="foo")


Suggested change

-        cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int32), long_name="foo")
+        cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int64), long_name="foo")

Although you are only restoring these tests, this one does look wrong (it is named for int64 but builds an int32 array), and it looks like an easy fix, so worth doing?

lbdreyer · 2022-11-16T11:35:59Z

docs/src/whatsnew/latest.rst

+   :func:`iris.pandas.as_data_frame`\'s conversion of :class:`~iris.cube.Cube`\s to
+   :class:`~pandas.DataFrame`\s. This includes better handling of multiple
+   :class:`~iris.cube.Cube` dimensions, auxiliary coordinates and attribute
+   information. **Note:** the improvements are opt-in, via :class:`iris.Future`.


Perhaps mention the full future flag name, iris.FUTURE.pandas_ndim, so a user doesn't have to go looking in the docs for it (like how we do here in the 3.3 release).

trexfeathers · 2022-11-16T15:13:01Z

> So overall, I don't really mind but I lean towards keeping it applicable to as_data_frame only as I don't think the other option benefits the user more.

Sounds good, especially as that's the path of no further changes!

trexfeathers · 2022-11-16T15:44:39Z

I can't fix the link-check failure without bringing in #5064. Should I do a cherry-pick, or should we just wait until the feature branch is merged into main?

lbdreyer · 2022-11-16T16:50:05Z

> I can't fix the link-check failure without bringing in #5064. Should I do a cherry-pick, or should we just wait until the feature branch is merged into main?

I guess it depends on how much more work you expect to do on this branch. If not much more (i.e. if you plan to merge the feature branch onto main right after this PR goes in), we can probably just leave it. If you plan to make more changes, a cherry-pick would be best.


trexfeathers · 2022-11-16T17:14:06Z

Thanks @lbdreyer!

trexfeathers added 2 commits November 10, 2022 11:57

Restore original as_data_frame behaviour, with new behaviour as opt-in.

65bfb92

Document and test opt-in nature of multi-d as_data_frame behaviour.

f860024

trexfeathers added the Release: Minor, Status: Decision Required, Experience: Medium and Type: Feature Branch labels Nov 10, 2022

trexfeathers mentioned this pull request Nov 10, 2022

Improve iris.pandas cube -> data.frame #4669

Merged

9 tasks

trexfeathers added 2 commits November 10, 2022 15:33

Test fixes.

ef85735

Update for Pandas 1.5 deprecation.

eb21448

lbdreyer requested changes Nov 16, 2022


trexfeathers added 2 commits November 16, 2022 15:07

More explicit What's New.

3b9f21a

Fix int64 test.

4a934cf

added link to the docs archive. (SciTools#5064)

c5f6824

* added link to the docs archive. * added whatsnew

lbdreyer merged commit 6443ac5 into SciTools:pandas_ndim Nov 16, 2022

trexfeathers deleted the pandas_ndim_optin branch November 29, 2022 12:21
