diff --git a/doc/internals/zarr-encoding-spec.rst b/doc/internals/zarr-encoding-spec.rst
index c34c2f21ddd..7bbf8ab3bd4 100644
--- a/doc/internals/zarr-encoding-spec.rst
+++ b/doc/internals/zarr-encoding-spec.rst
@@ -43,9 +43,9 @@ When accessing arrays with zarr-python, this information is available in the arr
 metadata but not in the attributes dictionary.
 
 When reading a Zarr group, Xarray looks for dimension information in the appropriate
-location based on the format version, raising an error if it can't be found. The
+location based on the inferred format version, raising an error if it can't be found. The
 dimension information is used to define the variable dimension names and then
-(for Zarr V2) removed from the attributes dictionary returned to the user.
+(for Zarr V2) is removed from the attributes dictionary returned to the user.
 
 CF Conventions
 --------------
@@ -59,17 +59,14 @@ used to describe metadata in NetCDF and Zarr.
 Compatibility and Reading
 -------------------------
 
-Because of these encoding choices, Xarray cannot read arbitrary Zarr arrays, but only
-Zarr data with valid dimension metadata. Xarray supports:
+Because of these encoding choices, Xarray cannot read arbitrary Zarr groups, but only
+Zarr groups containing arrays with valid dimension metadata. Xarray supports:
 
-- Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
-- Zarr V3 arrays with ``dimension_names`` metadata
-- `NCZarr `_ format
-  (dimension names are defined in the ``.zarray`` file)
+1. Zarr V3 arrays with ``dimension_names`` metadata
+2. Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
+3. `NCZarr `_ format (dimension names are defined in the ``dimrefs`` field in the custom ``.zarray`` file)
 
-After decoding the dimension information and assigning the variable dimensions,
-Xarray proceeds to [optionally] decode each variable using its standard CF decoding
-machinery used for NetCDF data.
+Xarray checks each of these three conventions, in the order given above, when looking for dimension name metadata. Note that while Xarray can read NCZarr groups, it currently does not write NCZarr groups. After decoding the dimension information and assigning the variable dimensions, Xarray proceeds to [optionally] decode each variable using its standard CF decoding machinery used for NetCDF data.
 
 Finally, it's worth noting that Xarray writes (and attempts to read) "consolidated
 metadata" by default (the ``.zmetadata`` file), which is another
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index 7e3badc7143..aa8391b8d4f 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -29,6 +29,9 @@ Bug Fixes
 - Ensure that ``keep_attrs='drop'`` and ``keep_attrs=False`` remove attrs from result, even when there is only one xarray object given to ``apply_ufunc`` (:issue:`10982` :pull:`10997`).
   By `Julia Signell `_.
+- Slightly amend `Xarray's Zarr Encoding Specification doc `_ for clarity, and provide a code comment in ``xarray.backends.zarr._get_zarr_dims_and_attrs`` referencing the doc (:issue:`8749` :pull:`11013`).
+  By `Ewan Short `_.
+
 
 Documentation
 ~~~~~~~~~~~~~
diff --git a/xarray/backends/zarr.py b/xarray/backends/zarr.py
index fe004c212b6..b37989e6bbd 100644
--- a/xarray/backends/zarr.py
+++ b/xarray/backends/zarr.py
@@ -355,6 +355,9 @@ def _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name):
 
 
 def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
+    # Check for attributes and dimension name metadata as discussed in the Zarr encoding
+    # specification https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html
+
     # Zarr V3 explicitly stores the dimension names in the metadata
     try:
         # if this exists, we are looking at a Zarr V3 array