Skip to content

Conversation

@hsteptoe
Copy link
Contributor

@hsteptoe hsteptoe commented Mar 29, 2022

🚀 Pull Request

Description

Improve the conversion of a Cube to a Pandas data.frame. Aims to address #4526

Aims to deal with:

Major proposed (possibly breaking) changes

  • Having functions to convert to a Series and a DataFrame seems superfluous. A Series is arguably a special case of a DataFrame, so I propose having only a function to do the DataFrame conversion, and then let folk further convert to a Series if they want to via standard pandas functions. This is done in 61b801a.
    -I don't think the copy argument is valid any more. Working with 'long' format DataFrames there is no continuity between the cube.data array and the DataFrame in memory. copy=True is the default, but there is not option for copy=False any more. This is done in 071d07f. Copy ability retained.

WORK IN PROGRESS

@SciTools-assistant SciTools-assistant added the Blocked: CLA needed See https://scitools.org.uk. Submit the form at: https://scitools.org.uk/cla/v4/form label Mar 29, 2022
@SciTools-assistant SciTools-assistant removed the Blocked: CLA needed See https://scitools.org.uk. Submit the form at: https://scitools.org.uk/cla/v4/form label Mar 31, 2022
@hsteptoe
Copy link
Contributor Author

hsteptoe commented Aug 3, 2022

Good discussion with @trexfeathers about our joint work on Pandas-Iris bridging. We agreed that he would focus (#4890) on the DataFrame -> Cube, so this PR will only focus on Cube -> DataFrame

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having learnt more about NumPy views and Pandas' copy behaviour, I think this module's original copy option is both valuable and also something we might be able to keep working. It's value is in cases where a large in-memory Cube is being converted, since copying will double the memory demand and could even blow the machine's memory. So if there is a prospect of avoiding copying I reckon it's worth exploring.

See my comments below.

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hsteptoe! Looking good.

A few more for you...

Co-authored-by: Martin Yeo <[email protected]>
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hsteptoe! We're basically there, just two final comments:

@trexfeathers
Copy link
Contributor

Well I'm perplexed. They all passed for me locally!

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Nov 4, 2022

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

@trexfeathers
Copy link
Contributor

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

I'm guessing Pandas is sensitive to available horizontal space when running. Rather frustrating as this gives us a situation where local behaves differently to CI. I won't let this get in the way of merging and I'll discuss with the team in future.

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Nov 4, 2022

Well I'm perplexed. They all passed for me locally!

And for me... but it looked like there was some difference in the number of cols being printed in the github tests vs my local tests... now checked the output and matched to github test version...

I'm guessing Pandas is sensitive to available horizontal space when running. Rather frustrating as this gives us a situation where local behaves differently to CI. I won't let this get in the way of merging and I'll discuss with the team in future.

Possibly some pandas options that we could explore: https://stackoverflow.com/a/11711637

@trexfeathers
Copy link
Contributor

Super! Thanks @hsteptoe. And we made it in under 100 commits 😂

@trexfeathers trexfeathers merged commit 8744f67 into SciTools:pandas_ndim Nov 4, 2022
@hsteptoe hsteptoe deleted the better-pandas-conversion-issue-4526 branch November 7, 2022 07:36
@hsteptoe
Copy link
Contributor Author

hsteptoe commented Nov 7, 2022

Super! Thanks @hsteptoe. And we made it in under 100 commits 😂

🎉 @trexfeathers Thanks for all your help!

@kaedonkers
Copy link
Member

Awesome, thanks gents!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants