Improve iris.pandas cube -> data.frame #4669
Good discussion with @trexfeathers about our joint work on Pandas-Iris bridging. We agreed that he would focus on DataFrame -> Cube (#4890), so this PR will only focus on Cube -> DataFrame.
Having learnt more about NumPy views and Pandas' copy behaviour, I think this module's original copy option is both valuable and also something we might be able to keep working. Its value is in cases where a large in-memory Cube is being converted, since copying will double the memory demand and could even exhaust the machine's memory. So if there is a prospect of avoiding copying I reckon it's worth exploring.
See my comments below.
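To illustrate the memory concern, here is a minimal sketch using only NumPy and pandas (not the Iris code itself). The helper name `frame_shares_memory` and the stand-in array are assumptions for illustration. Whether the `copy=False` case actually shares the buffer depends on the pandas version and the dtype, so the sketch only reports the result rather than promising one.

```python
import numpy as np
import pandas as pd


def frame_shares_memory(array, copy):
    """Report whether a DataFrame built from `array` still uses its buffer.

    Hypothetical helper for illustration; not part of Iris or pandas.
    """
    frame = pd.DataFrame(array, copy=copy)
    # to_numpy() returns a view of the frame's values where pandas allows it,
    # so shares_memory tells us whether the original buffer was reused.
    return bool(np.shares_memory(array, frame.to_numpy()))


# Stand-in for a realised cube.data array (assumption: 2D float64).
data = np.zeros((1000, 100))

print("bytes held by the array:", data.nbytes)
print("shares memory with copy=True: ", frame_shares_memory(data, copy=True))
print("shares memory with copy=False:", frame_shares_memory(data, copy=False))
```

With `copy=True` the frame holds a second buffer of `data.nbytes` bytes, which is the doubling described above.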
trexfeathers
left a comment
Thanks @hsteptoe! Looking good.
A few more for you...
Co-authored-by: Martin Yeo <[email protected]>
Thanks @hsteptoe! We're basically there, just two final comments:
Well I'm perplexed. They all passed for me locally!
And for me... but it looked like there was a difference in the number of columns being printed in the GitHub tests vs my local tests. I have now checked the output and matched it to the GitHub test version.
I'm guessing Pandas is sensitive to available horizontal space when running. Rather frustrating as this gives us a situation where local behaves differently to CI. I won't let this get in the way of merging and I'll discuss with the team in future.
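One possible way to make printed DataFrames independent of the terminal size is to pin the pandas display options around the call to `repr`. This is a sketch only: the helper name `stable_repr` and the chosen option values are assumptions, and it is not necessarily what the Iris tests ended up doing.

```python
import numpy as np
import pandas as pd


def stable_repr(frame, width=200, max_columns=None, max_rows=60):
    """Render `frame` with fixed display options (hypothetical helper)."""
    with pd.option_context(
        "display.width", width,
        "display.max_columns", max_columns,  # None means never elide columns
        "display.max_rows", max_rows,
    ):
        return repr(frame)


# A frame wide enough that auto-detected terminal width could truncate it.
wide = pd.DataFrame(np.zeros((2, 30)), columns=[f"c{i}" for i in range(30)])
text = stable_repr(wide)
print(text)
```

Because the options are set explicitly inside the context manager, the output should not vary with the width of the terminal the tests happen to run in.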
Possibly some
Super! Thanks @hsteptoe. And we made it in under 100 commits 😂
🎉 @trexfeathers Thanks for all your help!
Awesome, thanks gents!
🚀 Pull Request
Description
Improve the conversion of a Cube to a Pandas data.frame. Aims to address #4526
Aims to deal with:
- `copy` issue
Major proposed (possibly breaking) changes
- … `pandas` functions. This is done in 61b801a.
- ~~I don't think the `copy` argument is valid any more. Working with 'long' format DataFrames there is no continuity between the cube.data array and the DataFrame in memory. `copy=True` is the default, but there is no option for `copy=False` any more. This is done in 071d07f.~~ Copy ability retained.
WORK IN PROGRESS
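For readers unfamiliar with the 'long' format mentioned above, the following sketch flattens a small 2D array into one value per row, indexed by every combination of dimension values. It uses only NumPy and pandas; the function name `to_long_series`, the dimension names, and the values are all assumptions for illustration and do not reproduce the Iris implementation.

```python
import numpy as np
import pandas as pd


def to_long_series(array, dim_names, dim_values, copy=True):
    """Flatten an N-D array into a long-format Series (hypothetical helper)."""
    # One index row per combination of dimension values, in C order.
    index = pd.MultiIndex.from_product(dim_values, names=dim_names)
    # ravel() returns a view for C-contiguous input and a copy otherwise.
    flat = array.ravel()
    return pd.Series(flat, index=index, name="data", copy=copy)


# Stand-in for cube.data with two dimensions (assumption).
data = np.arange(6.0).reshape(2, 3)
series = to_long_series(
    data,
    dim_names=["latitude", "longitude"],
    dim_values=[[10, 20], [0, 1, 2]],
)
print(series)
print("shares memory with source:", np.shares_memory(data, series.to_numpy()))
```

With the default `copy=True` the Series owns its own buffer. Passing `copy=False` may let pandas reuse the flattened view, but that depends on the pandas version and on the array being contiguous.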