-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass variable name to encode_zarr_variable
#8809
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't know as much as others about the code, this looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @slevang! One comment on backward compat but otherwise looks good.
Co-authored-by: Joe Hamman <[email protected]>
I think the real error is that resetting the MultiIndex doesn't clear out the index for the levels: Using
makes it all work. I don't think this change is needed. See #8887. I don't understand why only Zarr errors out though. |
Here's an approximate stack of function calls that leads to this being an issue, and why it only applies to zarr: # ZarrStore inherits from:
class AbstractWritableDataStore:
def encode():
variables = {k: self.encode_variable(v) for k, v in variables.items()}
encode_zarr_variable(variable) # <- NO NAME PASSED HERE
conventions.encode_cf_variable(var, name=name) # <- name is now always None
ensure_not_multiindex(var, name=name) # <- fails
# NetCDF4DataStore/H5NetCDFArrayWrapper inherit from:
class WritableCFDataStore(AbstractWritableDataStore):
# which overrides encode:
def encode():
conventions.cf_encoder(variables, attributes)
new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} # <- NAME IS PASSED HERE
ensure_not_multiindex(var, name=name) # <- all good I don't know the background on why this structure exists, but it does lead to a subtle but fundamental difference in the multiindex check specifically. #8888 interestingly fixes the test I added here, but still doesn't fix the zarr serialization issue with DataTree in my original post. Not sure why. |
The bug likely happens when creating a new dataset by passing a DataArray object wrapping a multi-index (i.e., I would advise against doing that if the goal is to pass the whole thing, i.e., the multi-index and both its dimension and level coordinate variables (I'd like to deprecate it in #8140). Instead I'd strongly suggest passing all multi-index coordinates explicitly with |
Ah that does it, thanks! Happy to either leave this up or close it then - understand this was a pretty niche issue. |
I'll close this since it's probably unnecessary but seems like we should try to push #8140 through to prevent this sort of confusion in the future. |
xarray>=2024.1.0
xarray-contrib/xeofs#148whats-new.rst
The change from #8672 mostly fixed the issue of serializing a reset multiindex in the backends, but there was an additional niche issue that turned up in xeofs that was causing serialization to still fail on the zarr backend.
The issue is that zarr is the only backend that uses a custom version of
encode_cf_variable
calledencode_zarr_variable
, and the way this gets called we don't pass through thename
of the variable before runningensure_not_multiindex
.As a minimal fix, this PR just passes
name
through as an additional arg to the generalencode_variable
function. See @benbovy's comment that maybe we should actually unwrap the level coordinate inreset_index
and clean up the checks inensure_not_multiindex
, but I wasn't able to get that working easily.The exact workflow this turned up in involves DataTree and looks like this:
But we can reproduce in xarray with the test added here.