Improve iris.pandas cube -> data.frame #4669

hsteptoe · 2022-03-29T14:17:06Z

🚀 Pull Request

Description

Improve the conversion of a Cube to a pandas DataFrame. Aims to address #4526.

Aims to deal with:

An arbitrary number of dimensions -> d8c5b40
Missing cube dimensions -> a46c08a
Add auxiliary coordinate information to dataframe -> 1df3072
Add global attribute data to dataframe -> 2c2ce69
Deal with copy issue
Update doc strings
Update tests (but not unit & integration testing)
Update Whats New
Check alignment with @trexfeathers' Improvements to Pandas-to-Iris bridge #4890
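The goals above (arbitrary dimensions, dimensions without a coordinate, auxiliary coordinates and global attributes as columns) can be sketched with plain NumPy and pandas. This is a minimal illustration, not the `iris.pandas.as_data_frame` implementation: the helper `to_long_frame`, its arguments and the `dimN` naming for coordinate-less dimensions are all hypothetical. It relies on `pd.MultiIndex.from_product` varying the last level fastest, which matches a C-order `np.ravel` of the data.

```python
import numpy as np
import pandas as pd


def to_long_frame(data, dim_coords, name, aux_coords=None, attributes=None):
    """Hypothetical helper: flatten an N-d array into a long-format DataFrame.

    dim_coords: one (coord_name, points) pair per data dimension, or None for a
        dimension without a coordinate (an integer range is used instead).
    aux_coords: optional {coord_name: (points, dims)}, where dims lists the data
        dimensions the coordinate spans, in ascending order (assumption).
    attributes: optional {key: value}, each added as a constant column.
    """
    data = np.asarray(data)
    names, levels = [], []
    for i, entry in enumerate(dim_coords):
        if entry is None:
            # Missing dimension coordinate: fall back to positional indices.
            names.append(f"dim{i}")
            levels.append(np.arange(data.shape[i]))
        else:
            names.append(entry[0])
            levels.append(np.asarray(entry[1]))

    # from_product varies the last level fastest, matching a C-order ravel.
    index = pd.MultiIndex.from_product(levels, names=names)
    frame = pd.DataFrame({name: np.ravel(data, order="C")}, index=index)

    for aux_name, (points, dims) in (aux_coords or {}).items():
        # Insert length-1 axes for the dimensions the coordinate does not
        # span, then broadcast it to the full data shape before flattening.
        shape = [data.shape[d] if d in dims else 1 for d in range(data.ndim)]
        full = np.broadcast_to(np.reshape(points, shape), data.shape)
        frame[aux_name] = np.ravel(full, order="C")

    for key, value in (attributes or {}).items():
        frame[key] = value  # the scalar is broadcast down the whole column
    return frame


data = np.arange(24.0).reshape(2, 3, 4)
frame = to_long_frame(
    data,
    dim_coords=[("time", [0, 1]), None, ("longitude", [0.0, 90.0, 180.0, 270.0])],
    name="air_temperature",
    aux_coords={"forecast_period": (np.array([6, 12]), (0,))},
    attributes={"source": "example"},
)
print(frame.head())
```

Each row is one data point, indexed by every dimension, with auxiliary coordinates and attributes repeated alongside it as extra columns.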

Major proposed (possibly breaking) changes

Having functions to convert to a Series and a DataFrame seems superfluous. A Series is arguably a special case of a DataFrame, so I propose having only a function to do the DataFrame conversion, and then let folk further convert to a Series if they want to via standard pandas functions. This is done in 61b801a.
(Withdrawn: copy ability retained.) I don't think the copy argument is valid any more. When working with 'long' format DataFrames, there is no continuity in memory between the cube.data array and the DataFrame. copy=True is the default, but there is no option for copy=False any more. This is done in 071d07f.
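Under the Series proposal, getting a Series back needs only standard pandas. A minimal sketch, assuming a long-format DataFrame with a single data column (the column name `air_temperature` and the index values are illustrative only):

```python
import pandas as pd

# A long-format frame: one row per data point, one column per variable.
index = pd.MultiIndex.from_product(
    [[0, 1], [10.0, 20.0]], names=["latitude", "longitude"]
)
frame = pd.DataFrame({"air_temperature": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Selecting the data column gives a Series that keeps the MultiIndex.
series = frame["air_temperature"]

# Equivalent for a single-column frame: squeeze the column axis away.
squeezed = frame.squeeze(axis="columns")

print(series)
```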

WORK IN PROGRESS

hsteptoe · 2022-08-03T10:29:39Z

Good discussion with @trexfeathers about our joint work on Pandas-Iris bridging. We agreed that he would focus on DataFrame -> Cube conversion (#4890), so this PR will focus only on Cube -> DataFrame.

trexfeathers

Having learnt more about NumPy views and Pandas' copy behaviour, I think this module's original copy option is both valuable and something we might be able to keep working. Its value is in cases where a large in-memory Cube is being converted, since copying doubles the memory demand and could even exhaust the machine's memory. So if there is a prospect of avoiding copying, I reckon it's worth exploring.
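To illustrate the view-versus-copy point, here is a minimal sketch using plain NumPy and pandas rather than iris (the helper `long_series` is hypothetical). For C-contiguous data, `np.ravel` returns a view, and passing `copy=False` asks pandas to wrap that view instead of duplicating it. Exact copy behaviour can vary between pandas versions (for example with copy-on-write enabled), so treat this as a sketch rather than a guarantee.

```python
import numpy as np
import pandas as pd


def long_series(data, copy=True):
    """Hypothetical helper: flatten N-d data into a 1-D long-format Series."""
    flat = np.ravel(data)  # a view when data is C-contiguous, else a copy
    return pd.Series(flat, copy=copy)


data = np.arange(12, dtype=np.float64).reshape(3, 4)
shared = long_series(data, copy=False)
copied = long_series(data, copy=True)

# The copied Series needs an extra data.nbytes of memory; the shared one does not.
print(data.nbytes)
print(np.shares_memory(shared.to_numpy(), data))
print(np.shares_memory(copied.to_numpy(), data))
```

The trade-off is that the shared Series stays linked to the original array, so changes to one can be visible through the other.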

See my comments below.

lib/iris/pandas.py

lib/iris/pandas.py

trexfeathers

Thanks @hsteptoe! Looking good.

A few more for you...

lib/iris/pandas.py

trexfeathers

Thanks @hsteptoe! We're basically there, just two final comments:

lib/iris/pandas.py

docs/src/whatsnew/latest.rst

trexfeathers · 2022-11-04T17:19:36Z

Well I'm perplexed. They all passed for me locally!

hsteptoe · 2022-11-04T17:21:40Z

And for me... but it looked like the number of columns being printed differed between the GitHub tests and my local tests. I've now checked the output and matched it to the GitHub test version.

trexfeathers · 2022-11-04T17:25:33Z

I'm guessing Pandas is sensitive to the available horizontal space when running. Rather frustrating, as it means local runs can behave differently from CI. I won't let this get in the way of merging, and I'll discuss it with the team in future.

hsteptoe · 2022-11-04T17:30:01Z

Possibly some pandas options that we could explore: https://stackoverflow.com/a/11711637
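One option along those lines (an assumption, not what this PR ended up doing) is to pin pandas' display settings wherever the doctests run, so the printed repr no longer depends on the terminal width. `pd.option_context` and the `display.width` / `display.max_columns` options are standard pandas; the particular values below are arbitrary.

```python
import numpy as np
import pandas as pd

# A frame wide enough that its repr depends on the column limit in force.
frame = pd.DataFrame(np.arange(40).reshape(2, 20))

# Fixed, explicit limits: the output no longer depends on the terminal size.
with pd.option_context("display.width", 80, "display.max_columns", 5):
    narrow = repr(frame)

# Show every column on one line by allowing a very wide display.
with pd.option_context("display.width", 1000, "display.max_columns", None):
    full = repr(frame)

print(narrow)
print(full)
```

The same options could equally be set once with `pd.set_option` in whatever setup code the doctests share.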

trexfeathers · 2022-11-04T17:34:18Z

Super! Thanks @hsteptoe. And we made it in under 100 commits 😂

hsteptoe · 2022-11-07T07:38:38Z

🎉 @trexfeathers Thanks for all your help!

kaedonkers · 2022-11-17T13:45:35Z

Awesome, thanks gents!

Update as_data_frame()

914d47b

SciTools-assistant added the "Blocked: CLA needed" label Mar 29, 2022 (see https://scitools.org.uk; submit the form at https://scitools.org.uk/cla/v4/form)

[pre-commit.ci] auto fixes from pre-commit.com hooks

26be4a5

for more information, see https://pre-commit.ci

hsteptoe mentioned this pull request Mar 29, 2022

Converting a cube to a Pandas dataframe #4526

Closed

2 tasks

rcomer added the Type: Enhancement label Mar 31, 2022

SciTools-assistant removed the "Blocked: CLA needed" label Mar 31, 2022

hsteptoe and others added 15 commits July 27, 2022 16:02

Minor typo fixes

5c2fe98

Matching data raveling with dimension meshgrid

3bf85b2

Revise test_simple to check long-style dataframe

a0e280c

Revise NaN and 1D dataframe tests

d41cbb6

Better pandas.MultiIndex solution

d8c5b40

Add 3D test case

03ded10

Fixes for cube with partially defined dims

a46c08a

Update tests for partially defined dims

2df0af3

Update time tests

1e0a5c8

Reuse _as_pandas_coord()

26cbcdf

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7ce983

for more information, see https://pre-commit.ci

Remove Series conversion

61b801a

Remove option for copy

071d07f

Merge branch 'main' into better-pandas-conversion-issue-4526

2c20e87

[pre-commit.ci] auto fixes from pre-commit.com hooks

9bbada5

for more information, see https://pre-commit.ci

trexfeathers mentioned this pull request Aug 4, 2022

Improvements to Pandas-to-Iris bridge #4890

Merged

7 tasks

Merge branch 'SciTools:main' into better-pandas-conversion-issue-4526

e9d5d53

trexfeathers reviewed Aug 5, 2022

lib/iris/pandas.py

lib/iris/pandas.py (outdated)

lib/iris/pandas.py (outdated)

lib/iris/pandas.py (outdated)

hsteptoe and others added 4 commits August 9, 2022 16:45

First go at adding aux coords

1df3072

First go at adding global attributes

2c2ce69

Update doc string

1c9fbff

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef55874

for more information, see https://pre-commit.ci

hsteptoe commented Aug 9, 2022

lib/iris/pandas.py (outdated)

hsteptoe added 2 commits November 2, 2022 14:50

Split out metadata for consistency with as_cubes

4e65380

Add cell_measure tests

ea4502c

trexfeathers requested changes Nov 2, 2022

hsteptoe added 6 commits November 4, 2022 09:07

Docstring fixes

dcf8f63

Test kwarg fixes

f337c2d

_make_aux_coord_list optimisation

897973f

Refactor metadata merging

b25648e

Roundabout :issue: fix (remove)

213afb6

Roundabout :issue: fix (re-add)

d39b052

trexfeathers reviewed Nov 4, 2022

lib/iris/pandas.py (outdated)

Docstring typo fixes

be4fee8

Co-authored-by: Martin Yeo <[email protected]>

trexfeathers requested changes Nov 4, 2022

lib/iris/pandas.py

trexfeathers reviewed Nov 4, 2022

docs/src/whatsnew/latest.rst (outdated)

hsteptoe and others added 4 commits November 4, 2022 16:58

Fix doctests

eca08e9

origin vs remote fixes

9432890

Update docs/src/whatsnew/latest.rst

6b1625d

Co-authored-by: Martin Yeo <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d5acc4

for more information, see https://pre-commit.ci

hsteptoe and others added 2 commits November 4, 2022 17:19

Further doctest fixes

64447f8

[pre-commit.ci] auto fixes from pre-commit.com hooks

9eff2a5

for more information, see https://pre-commit.ci

trexfeathers approved these changes Nov 4, 2022

trexfeathers merged commit 8744f67 into SciTools:pandas_ndim Nov 4, 2022

hsteptoe deleted the better-pandas-conversion-issue-4526 branch November 7, 2022 07:36

trexfeathers mentioned this pull request Nov 16, 2022

iris.pandas.as_data_frame() n-dimensional Cube conversion #5074

Merged

7 participants
