Dataset.from_dataframe: deprecate expanding the multi-index #8166

benbovy · 2023-09-10T15:54:31Z

What is your issue?

Let's continue here the discussion about changing the behavior of Dataset.from_dataframe (see #8140 (comment)).

The current behaviour of Dataset.from_dataframe where it always unstacks feels wrong to me.
To me, it seems sensible that Dataset.from_dataframe(df) automatically creates a Dataset with PandasMultiIndex if df has a MultiIndex. The user can then use that or quite easily unstack to a dense or sparse array.
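To make the two shapes being discussed concrete, here is a minimal sketch using only existing xarray calls. It does not use the proposed behaviour: the stacked form is approximated with `Dataset.stack`, and the dimension name `"z"` and the sample data are assumptions chosen for illustration.

```python
import pandas as pd
import xarray as xr

# Sample DataFrame with a two-level MultiIndex (illustrative data).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"])
df = pd.DataFrame({"foo": [1, 2, 3, 4]}, index=index)

# Current behaviour: each index level becomes its own dimension.
expanded = xr.Dataset.from_dataframe(df)
print(expanded["foo"].dims)

# Stacking collapses the levels into one multi-indexed dimension.
# The name "z" is arbitrary.
stacked = expanded.stack(z=("x", "y"))
print(stacked["foo"].dims)

# Unstacking recovers the dense, one-dimension-per-level layout.
dense = stacked.unstack("z")
print(dense["foo"].dims)
```

The stacked form is roughly what a non-unstacking from_dataframe would hand back directly, leaving the `unstack` call to the user.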

If we no longer unstack the multi-index in Dataset.from_dataframe, are we OK with the "Dataset -> DataFrame -> Dataset" round-trip no longer yielding the original Dataset unless we unstack explicitly?

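import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), [[1, 2], [3, 4]])},
    coords={"x": ["a", "b"], "y": [1, 2]},
)

# to_dataframe() returns a DataFrame indexed by an ("x", "y") MultiIndex
df = ds.to_dataframe()
# dim= is the proposed argument (see #8170): keep the multi-index as a single dimension "z"
ds2 = xr.Dataset.from_dataframe(df, dim="z")

ds2.identical(ds)  # False

ds2.unstack("z").identical(ds)  # True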
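For comparison, the same round trip without the proposed `dim` argument can be sketched as below. This relies only on the existing unstacking behaviour, and it checks dimensions and values rather than `identical`, since that is all this sketch is meant to show.

```python
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), [[1, 2], [3, 4]])},
    coords={"x": ["a", "b"], "y": [1, 2]},
)

# The DataFrame carries an ("x", "y") MultiIndex.
df = ds.to_dataframe()
print(list(df.index.names))

# No dim= argument: the multi-index is expanded back into "x" and "y".
roundtrip = xr.Dataset.from_dataframe(df)
print(roundtrip["foo"].dims)
print(roundtrip["foo"].values.tolist())
```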

cc @max-sixty @dcherian

max-sixty · 2023-09-10T18:50:55Z

That's a good point, and these invariants are indeed nice to uphold.

Is there a branch with the dim= code on? Or it's just a mental model atm? (I wrote a message but not sure it's correct so removed it, will rewrite with either the code or more thought!)

dcherian · 2023-09-11T03:31:50Z

Sorry I wasn't very clear in that thread.

I think we should avoid the dim argument for this reason.

We could just use "dim_X" if Index.name is None, and have the user manually rename to a name they like.
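A rough sketch of that naming fallback, followed by the manual rename, might look like this. The helper `fallback_dim_name` is hypothetical and not part of xarray or pandas; only `Dataset.rename` is an existing call, and the `"dim_0"` and `"z"` names are assumptions for illustration.

```python
import pandas as pd
import xarray as xr


def fallback_dim_name(index: pd.Index, position: int = 0) -> str:
    """Hypothetical helper: use the index name, else a "dim_N" placeholder."""
    if index.name is not None:
        return str(index.name)
    return f"dim_{position}"


# A MultiIndex has level names but, by default, no name of its own.
unnamed = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"])
named = pd.Index([10, 20], name="z")

print(fallback_dim_name(unnamed))
print(fallback_dim_name(named))

# The user can then rename the placeholder dimension to something meaningful.
ds = xr.Dataset({"foo": (("dim_0",), [1, 2, 3, 4])})
renamed = ds.rename({"dim_0": "z"})
print(renamed["foo"].dims)
```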

benbovy · 2023-09-11T06:20:50Z

> Is there a branch with the dim= code on?

See #8170

max-sixty · 2024-10-19T18:58:35Z

Without any magical ideas for maintaining the from_dataframe / to_dataframe round-trip, I would be +1 on deprecating unstacking / expanding the multi-index; to the extent it helps us with finishing off the index refactor and fixing bugs such as #8646.

(personally I don't even use from_dataframe, I just do xr.Dataset(df), which doesn't unstack... So this would also have the advantage of unifying that behavior...)

benbovy added the "needs triage" and "design question" labels and removed the "needs triage" label on Sep 10, 2023

benbovy linked a pull request Sep 11, 2023 that will close this issue

Dataset.from_dataframe: optionally keep multi-index unexpanded #8170

Draft

5 tasks
