Skip to content

Revise the version of Pandas being targeted #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TinyMarsh opened this issue May 19, 2023 · 7 comments · Fixed by #289
Closed

Revise the version of Pandas being targeted #120

TinyMarsh opened this issue May 19, 2023 · 7 comments · Fixed by #289
Assignees
Labels
bug Something isn't working infrastructure

Comments

@TinyMarsh
Copy link
Contributor

While experimenting with extending the CI workflow (#119) to test on a matrix of python versions (3.8 - 3.11) and platforms (Linux, Windows and MacOS), tests failed for Python versions 3.10 and 3.11 on all platforms.

The cause of failure in each of these combinations was the targeting of pandas<=1.3. As per the documentation, Pandas 1.3 does not support Python versions 3.10+. The minimum version of Pandas which supports up to Python 3.11 is Pandas 1.5.

Suggest investigating if MUSE's dependency of pandas<=1.3 is still relevant and, is possible, switch to targeting pandas>=1.5.

@TinyMarsh TinyMarsh changed the title Revise the version of pandas being targeted Revise the version of Pandas being targeted May 19, 2023
@dalonsoa dalonsoa added this to MUSE May 22, 2023
@dalonsoa dalonsoa moved this to 📋 Backlog in MUSE May 22, 2023
@dalonsoa dalonsoa added upstream Issue related to a dependency that needs updating of bug fixing infrastructure labels May 22, 2023
@dalonsoa dalonsoa mentioned this issue May 23, 2023
8 tasks
@dalonsoa dalonsoa moved this from 📋 Backlog to 🔖 Ready in MUSE Jun 1, 2023
@dalonsoa dalonsoa self-assigned this Jun 1, 2023
@dalonsoa dalonsoa moved this from 🔖 Ready to 🏗 In progress in MUSE Jun 1, 2023
@dalonsoa
Copy link
Collaborator

dalonsoa commented Jun 2, 2023

I've been investigating the removal of the pinning of some of the dependencies, in particular, pandas, to try to support Python 3.10 and 3.11. These are my findings:

  • It seems pandas version can be increased to <1.5 without any issues - all tests pass in Python 3.8 and 3.9.
  • In more modern versions of pandas (>=1.5), using iteritems is deprecated, which causes an error in xarray==2022.3.0. There might be other issues, but that's the one that pops first and makes things fail.
  • In Python 3.10, while keeping pandas==1.4.4, xarray==2022.3.0 fails on its own, too, due to a change in importlib. The fix was implemented in xarray==2022.6.0
  • If we increase xarray version to >=2022.6.0 everything breaks down in MUSE due to a massive refactoring of how xarray works. As a result, pretty much all in-place modifications of DataArrays and Datasets affecting the indexes are no longer possible. MUSE relies heavily in these in-place modifications. As an example of a change that would be needed, the following type of assignments:
finest.coords["finest_timeslice"] = index

will need to become:

finest = finest.drop_vars({"month", "finest_timeslice", "day", "hour"}).assign_coords(finest_timeslice=index)

as finest cannot have its indexing coordinate finest_timeslice modified in-place.

In summary, even though we can increase pandas version to 1.4.4, it does't help MUSE supporting more modern Python versions because then xarray refactoring becomes the limiting factor.

Adapting MUSE to use the newest versions of xarray and pandas is essential to ensure the sustainability of the tool, however, it will also be a major undertaking lasting several weeks of dedicated (and painful) effort and that will result in the modification of large portions of the codebase.

@sgiarols @ahawkes , up to you how you want to proceed.

@sgiarols
Copy link
Collaborator

sgiarols commented Jun 3, 2023

Thanks. I often regret that we moved to xarray, rather than just sticking to pandas, as xarray constantly undergoes massive updates. I think we need the tool should be sustained in the long-term and that an upgrade to the newest packages seems crucial at this stage. I would leave to @ahawkes final considerations, as the refactoring time, if we decide to go for it, may require a conversation on the engagement plans.

@dalonsoa
Copy link
Collaborator

dalonsoa commented Jun 3, 2023

I think xarray is the right tool for the job, but it is true that we picked it when it was in a very early state of development and it was not stable. That was possibly an error, indeed. The update in 2022.6.0 has annoyed a lot of people, but I think it has made the tool more mature and things should be more stable from now on.

Thinking in the future, I think it is important not to kick the can down the road and when incompatibilities arise with any new version of any package, we fix them rather than pinning to an older version because that only makes the tool to accumulate technical debt until it explodes.

Anyway, let's thing about it and maybe we can have a chat about how to best proceed.

@dalonsoa
Copy link
Collaborator

I've already started to work on this. Fixed a couple of issues, but I'm stuck in another. I've posted a question in StackOverflow in case someone can point me in the right direction: https://stackoverflow.com/questions/76471238/why-i-cannot-add-a-dataarray-to-an-existing-dataset-with-a-multiindex

@dalonsoa
Copy link
Collaborator

@sgiarols , it seems I might have hit a bug in 'xarray'. See pydata/xarray#7921 .

I'm not sure if I will be able to help fixing it, but I'll keep an eye.

If this is really a bug and is fixed, then I've the feeling our work refactoring 'muse' to work with the latest versions of 'xarray' and 'pandas' will be much easier.

In the meantime, I'll work on other stuff.

@dalonsoa
Copy link
Collaborator

Following on this, the minimum working example meant to reproduce our problem that I described in pydata/xarray#7921 is working fine. It is unclear why it is not working for us. I suspect the culprit is in the timeslices, which are a really complex structure and might have been created in a way not fully compatible with the current way of doing things. I'm trying to get to a minimum working example that works with our structure.

@dalonsoa
Copy link
Collaborator

@sgiarols I'm going to pack this as it has been identified as a proper bug and added to the list of bugs to be sorted out related to indexes (see https://github.com/pydata/xarray/projects/1#card-89778835). Until that is sorted out, there's really not much point on us trying to make MUSE to work with modern versions of pandas and xarray. I will move to other stuff.

@dalonsoa dalonsoa added bug Something isn't working on-hold Waiting for something else to happen before this can be tackled labels Aug 7, 2023
@dalonsoa dalonsoa moved this from 🏗 In progress to On hold in MUSE Sep 6, 2023
@dalonsoa dalonsoa moved this from On hold to 🔖 Ready in MUSE Apr 22, 2024
@dalonsoa dalonsoa moved this from 🔖 Ready to 🏗 In progress in MUSE May 3, 2024
@dalonsoa dalonsoa removed upstream Issue related to a dependency that needs updating of bug fixing on-hold Waiting for something else to happen before this can be tackled labels May 20, 2024
@dalonsoa dalonsoa moved this from 🏗 In progress to 👀 In review in MUSE Jun 5, 2024
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in MUSE Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working infrastructure
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants