Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor update coordinates to better handle multi-coordinate indexes #8094

Merged
merged 14 commits into from
Aug 29, 2023

Conversation

benbovy
Copy link
Member

@benbovy benbovy commented Aug 21, 2023

This refactor should better handle multi-coordinate indexes when updating (or assigning) new coordinates.

It also fixes, better isolates and better warns a bunch of deprecated pandas multi-index special cases (i.e., directly passing pd.MultiIndex objects or updating a multi-index dimension coordinate). I very much look forward to seeing support for those cases dropped :).

Fix more cases with multi-coordinate indexes:

- do not try to align existing indexed coordinates with the new
coordinates that will fully replace them

- raise early if the new coordinates would corrupt the existing indexed
coordinates

- isolate PandasMultiIndex special cases so that it will be easier to
drop support for it later (and warn now about deprecation)
when DataArray objects are passed as new coordinate objects
Need to update (replace) coordinates and data variables separately to
ensure it goes through all (indexed) coordinate update checks.
@benbovy benbovy added the run-benchmark Run the ASV benchmark workflow label Aug 21, 2023
@dcherian
Copy link
Contributor

Can you check if this benchmark regression is real

indexing.AssignmentOptimized.time_assign_no_reindex

@benbovy
Copy link
Member Author

benbovy commented Aug 22, 2023

@dcherian the benchmark regression is fixed.

I think this is ready for review.

Regarding the multi-index deprecation warnings: there are still a lot of places where pandas.MultiIndex objects are passed around directly as coordinates in xarray's internals (both code and tests). They may also be promoted indirectly as in swap_dims(), which doesn't show any warning when this occurs. I will fix that in a follow-up PR.

@dcherian
Copy link
Contributor

dcherian commented Aug 22, 2023

I can't review the code really but have suggestions on the warning message.

Can you silence these two:

xarray/tests/test_variable.py: 1 warning
xarray/tests/test_dataset.py::TestDataset::test_constructor_with_coords

I think the warning should show the syntax for "passing it as coordinates"

 If you want to keep this behavior, you need to first wrap it explicitly using `mindex = xarray.Coordinates.from_pandas_multiindex()` and pass it as coordinates with
`Dataset.coords['x'] = mindex` or ``DataArray.coords['x'] = mindex

@benbovy
Copy link
Member Author

benbovy commented Aug 23, 2023

Can you silence these two?

Do you mind if I address this in a follow-up PR? I started silencing warnings (there are a few dozen more in the tests!). Also there are still quite many cases where the warning should be emitted but is not (e.g., passing pd.MultiIndex as DataArray coords). This will need more review that is out of the scope of this PR.

@benbovy benbovy removed the run-benchmark Run the ASV benchmark workflow label Aug 23, 2023
@benbovy benbovy added the plan to merge Final call for comments label Aug 25, 2023
@benbovy benbovy merged commit 1fedfd8 into pydata:main Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment