Skip to content

Conversation

@trexfeathers
Copy link
Contributor

@trexfeathers trexfeathers commented Nov 10, 2022

🚀 Pull Request

Description

I think the diff might render unhelpfully. Here's a summary of what I have done:

TL;DR: I have written barely any new code - I've moved some stuff around and added some info for the user.

  • Restored iris.pandas.as_data_frame()'s default behaviour from Iris main
  • Contained the new improved behaviour within a check for iris.FUTURE.pandas_ndim
  • Restored the testing for iris.pandas.as_series() from Iris main
  • Restored the testing for iris.pandas.as_data_frame()'s default behaviour from Iris main
  • Moved the tests for the new improved behaviour into their own class (TestAsDataFrameNDim)
  • Added various warnings about the FUTURE switch into the code and the docstring
  • Added tests for the FUTURE warning, and some more for deprecation warnings that I felt were missing

Outstanding question

Currently iris.FUTURE.pandas_ndim only controls the behaviour of iris.pandas.as_data_frame() - this is the only function that definitely needs controlling as it's the only one with old and new behaviour. But for consistent UX, should I actually make all of iris.pandas sensitive to the iris.FUTURE.pandas_ndim switch?

Here is what that would look like:

Function Dimensional Intelligence pandas_ndim == False pandas_ndim == True
as_cube() Creates 1D/2D Cubes (deprecated) ✔ Enable ❌ Disable
as_series() Converts 1D Cubes (deprecated) ✔ Enable ❌ Disable
as_cubes() Creates n-D Cubes ❌ Disable ✔ Enable
as_data_frame() Converts 2D OR n-D Cubes (opt-in) 💀 Legacy 2D behaviour ✨ Opt-in n-D behaviour

Consult Iris pull request check list

Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small tweaks and then this should be good to go.

Regarding your outstanding question

should I actually make all of iris.pandas sensitive to the iris.FUTURE.pandas_ndim switch?

I think it doesn't too much which we go for as both are quite reasonable. I think it's more of a question of how you document the future switch as that will determine a user's expectations of its behaviour.
I am however leaning towards keeping the switch only applicable to as_data_frame. The only real benefit I can see for making it applicable to all is that you are saying to the user "by enabling this, you are opting into the new world of pandas integration" but I don't think a user would feel much benefit from this? By doing that we are effectively forcing them to upgrade. They may have the (admittedly unusual) scenario of wanting to use as_series but also the new as_data_frame. They could handle all this with context managers but maybe that's just more annoying to have to keep including in your code?
So overall, I don't really mind but I lean towards keeping it applicable to as_data_frame only as I don't think the other option benefits the user more.

assert cube.data[0] == 99

def test_copy_int64_false(self):
cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int32), long_name="foo")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int32), long_name="foo")
cube = Cube(np.array([0, 1, 2, 3, 4], dtype=np.int64), long_name="foo")

Although you are restoring these tests, this test does look wrong and this looks like an easy fix so worth doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:func:`iris.pandas.as_data_frame`\'s conversion of :class:`~iris.cube.Cube`\s to
:class:`~pandas.DataFrame`\s. This includes better handling of multiple
:class:`~iris.cube.Cube` dimensions, auxiliary coordinates and attribute
information. **Note:** the improvements are opt-in, via :class:`iris.Future`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mentions the full future flag name iris.FUTURE.pandas_ndim so a user doesn't have to go looking in the docs for it (like how we do here in the 3.3 release)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trexfeathers
Copy link
Contributor Author

So overall, I don't really mind but I lean towards keeping it applicable to as_data_frame only as I don't think the other option benefits the user more.

Sounds good, especially as that's the path of no further changes!

@trexfeathers
Copy link
Contributor Author

I can't fix the link-check failure without bringing in #5064. Should I do a cherry-pick, or should we just wait until the feature branch is merged into main?

@lbdreyer
Copy link
Member

I can't fix the link-check failure without bringing in #5064. Should I do a cherry-pick, or should we just wait until the feature branch is merged into main?

I guess it depends how much more work you expect to do on this branch. If not much more (i.e. if you plan to merge the feature branch onto main right after this PR goes in) we can probably just leave it. If you plan to make more changes a cherry pick would be best

* added link to the docs archive.

* added whatsnew
@lbdreyer lbdreyer merged commit 6443ac5 into SciTools:pandas_ndim Nov 16, 2022
@trexfeathers
Copy link
Contributor Author

Thanks @lbdreyer!

@trexfeathers trexfeathers deleted the pandas_ndim_optin branch November 29, 2022 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants