
@eshort0401

Addresses issue #8749 by implementing default dimensions when reading zarr stores with missing metadata. With this PR, if dimension names are missing, xarray will try to build a Dataset from a zarr store using default dimension names: dim_0, dim_1, etc. Note we can only use default dimensions if every variable in the store has a consistent shape, as discussed by @TomNicholas and @etienneschalk in #8749.
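The "consistent shape" requirement can be sketched roughly as follows. This is an illustrative standalone sketch, not the actual code in the PR; `assign_default_dims` is a hypothetical helper that assigns default names positionally and rejects stores where two variables disagree on the size of the same axis.

```python
def assign_default_dims(shapes):
    """Assign default dimension names (dim_0, dim_1, ...) positionally.

    `shapes` maps variable name -> shape tuple. Raises ValueError if two
    variables disagree on the size of the same axis index, since then a
    single shared set of default dimensions cannot be built.
    """
    sizes = {}  # default dim name -> inferred size
    for name, shape in shapes.items():
        for axis, size in enumerate(shape):
            dim = f"dim_{axis}"
            if sizes.setdefault(dim, size) != size:
                raise ValueError(
                    f"variable {name!r} has size {size} along axis {axis}, "
                    f"but {dim} was already inferred to have size {sizes[dim]}"
                )
    return {
        name: tuple(f"dim_{i}" for i in range(len(shape)))
        for name, shape in shapes.items()
    }
```

For the example below, `assign_default_dims({"a": (3, 18), "b": (3,)})` succeeds because `b`'s single axis matches `a`'s first axis, so `b` can share dim_0.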

Motivating Example

Extending @etienneschalk's example to both the zarr 2 and zarr 3 specifications, consider

import xarray as xr
import numpy as np
import json
from pathlib import Path
import shutil
import glob

# Create example dataset
da_a = xr.DataArray(np.arange(3 * 18).reshape(3, 18), dims=["label", "z"])
da_b = xr.DataArray([1, 2, 3], dims="label")
ds = xr.Dataset({"a": da_a, "b": da_b})
print(f"Original Dataset\n----------------\n{ds}\n")

# Save to zarr
ds_path = "./ds.zarr"
kwargs = {"consolidated": True, "zarr_format": 3} # Change these to check other cases
ds.to_zarr(ds_path, mode="w", **kwargs)

# Now simulate loading stored zarr without dimension name metadata

# Create functions for stripping dimension metadata from stored zarr
def strip_zarr_3(ds_path, stripped_ds_path):
    """Create a copy of a zarr 3 with dimension_names metadata removed."""    
    shutil.rmtree(stripped_ds_path, ignore_errors=True)
    shutil.copytree(ds_path, stripped_ds_path, dirs_exist_ok=True)
    # Get all the zarr.json metadata files. 
    metadata_files = glob.glob(f"{stripped_ds_path}/**/zarr.json", recursive=True)
    # Iterate through and remove all "dimension_names" entries
    for file in metadata_files:
        with open(file, "r") as f:
            metadata = json.load(f)
        metadata.pop("dimension_names", None)
        con_metadata = metadata.get("consolidated_metadata", None)
        if con_metadata:
            for k in con_metadata["metadata"].keys():
                con_metadata["metadata"][k].pop("dimension_names", None)
                
        with open(file, "w") as f:
            json.dump(metadata, f, indent=2)

def strip_zarr_2(ds_path, stripped_ds_path):
    """Create a copy of a zarr 2 with _ARRAY_DIMENSIONS metadata removed."""    
    # Get all the .zattrs metadata files. 
    # Note .zattrs are optional in zarr 2 
    # https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html#attributes
    shutil.rmtree(stripped_ds_path, ignore_errors=True)
    shutil.copytree(ds_path, stripped_ds_path, dirs_exist_ok=True)
    zattrs_files = glob.glob(f"{stripped_ds_path}/**/.zattrs", recursive=True)
    # Iterate through and remove all "_ARRAY_DIMENSIONS" entries
    for file in zattrs_files:
        with open(file, "r") as f:
            metadata = json.load(f)
        metadata.pop("_ARRAY_DIMENSIONS", None)
        with open(file, "w") as f:
            json.dump(metadata, f, indent=2)
    zmetadata_file = Path(stripped_ds_path) / ".zmetadata"
    if zmetadata_file.exists():
        with open(zmetadata_file, "r") as f:
            metadata = json.load(f)
        for k in metadata["metadata"].keys():
            metadata["metadata"][k].pop("_ARRAY_DIMENSIONS", None)
        with open(zmetadata_file, "w") as f:
            json.dump(metadata, f, indent=2)

# Strip dimension name metadata from the stored zarr
stripped_ds_path = "./stripped_ds.zarr"
if kwargs["zarr_format"] == 3:
    strip_zarr_3(ds_path, stripped_ds_path)
else:
    strip_zarr_2(ds_path, stripped_ds_path)

# Now load the stripped zarr; default dimension names are created. 
loaded_ds = xr.open_zarr(stripped_ds_path, **kwargs).compute()
print(f"Stripped Dataset\n----------------\n{loaded_ds}\n")

With this PR, the code above no longer raises an error, but instead prints

Original Dataset
----------------
<xarray.Dataset> Size: 456B
Dimensions:  (label: 3, z: 18)
Dimensions without coordinates: label, z
Data variables:
    a        (label, z) int64 432B 0 1 2 3 4 5 6 7 8 ... 46 47 48 49 50 51 52 53
    b        (label) int64 24B 1 2 3

Stripped Dataset
----------------
<xarray.Dataset> Size: 456B
Dimensions:  (dim_0: 3, dim_1: 18)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    a        (dim_0, dim_1) int64 432B 0 1 2 3 4 5 6 7 ... 47 48 49 50 51 52 53
    b        (dim_0) int64 24B 1 2 3

General Notes

It appears xarray considers at least three zarr conventions.

  1. xarray-flavoured zarr 2, with the optional .zattrs (https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html#attributes) used to store dimension names under _ARRAY_DIMENSIONS.
  2. zarr 3, with dimension names stored in the optional dimension_names metadata attribute (https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#dimension-names).
  3. NetCDF zarr (https://docs.unidata.ucar.edu/netcdf/NUG/nczarr_head.html), which stores the dimension names in dim_refs.

The _get_zarr_dims_and_attrs function tries to get the dimension names by checking all three of these conventions. Perhaps the convention should be handled more explicitly somehow?
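The lookup order across the three conventions could be sketched like this. This is an illustrative sketch, not the actual `_get_zarr_dims_and_attrs` implementation; the metadata key names for the NCZarr case (`_NCZARR_ARRAY` holding `dimrefs`/`dim_refs`) are assumptions based on the notes above, and `array_metadata` is assumed to be the array's merged metadata/attributes dict.

```python
def infer_dimension_names(array_metadata, ndim):
    """Sketch: look up dimension names under each convention in turn,
    falling back to default names when none is found."""
    # 1. zarr 3: top-level "dimension_names" entry in zarr.json
    names = array_metadata.get("dimension_names")
    if names is not None:
        return list(names)
    # 2. xarray-flavoured zarr 2: "_ARRAY_DIMENSIONS" in .zattrs
    names = array_metadata.get("_ARRAY_DIMENSIONS")
    if names is not None:
        return list(names)
    # 3. NCZarr: dimension references inside the "_NCZARR_ARRAY" attribute
    #    (key names assumed here; NCZarr stores dims as paths like "/label")
    nczarr = array_metadata.get("_NCZARR_ARRAY", {})
    refs = nczarr.get("dimrefs") or nczarr.get("dim_refs")
    if refs is not None:
        return [ref.split("/")[-1] for ref in refs]
    # Nothing found: fall back to default names
    return [f"dim_{i}" for i in range(ndim)]
```

Making the convention explicit (e.g. detecting it once per store rather than per array) might be one way to handle this more cleanly.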

@github-actions bot added the io, topic-backends, topic-zarr (Related to zarr storage library), and topic-NamedArray (Lightweight version of Variable) labels on Dec 12, 2025
@TomNicholas
Member

Thanks for having a look at this! See my comment here #8749 (comment).

The _get_zarr_dims_and_attrs function tries to get the dimension names by checking all three of these conventions. Perhaps the convention should be handled more explicitly somehow?

This behaviour should be described in https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html, and if it's not we should improve that docs page.

@eshort0401
Author

Thanks heaps for the review @TomNicholas!

From #8749 (comment)

I have changed my mind - I don't think that trying to auto-infer some default dimension names makes sense for Zarr.

After thinking about this more I agree with you. I had interpreted @etienneschalk's example (#8749 (comment))

xr.Dataset({"xda_1": xr.DataArray([1]), "xda_2": xr.DataArray([2])})

as suggesting that Dataset infers dimension names, but of course it doesn't!
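(For the record, the default names in that example come from the DataArray constructor, which fills in dim_0, dim_1, ... when `dims` is omitted; Dataset just reuses them:)

```python
import xarray as xr

# When `dims` is omitted, DataArray (not Dataset) supplies default names
da = xr.DataArray([1])
print(da.dims)  # ("dim_0",)
```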

From #8749 (comment)

So actually I think the only correct thing to do here is either

  1. raise an error. We can improve the error message (a PR for that would be welcome), but we shouldn't be trying to auto-infer names for the dimensions.

and from above

This behaviour should be described in https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html, and if it's not we should improve that docs page.

Ok I'll close this PR and have another look at the error messages and the https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html page as you suggest! Thanks again for your review, and your patience as I learn the depths of zarr and xarray!

@eshort0401 closed this Dec 13, 2025
@TomNicholas
Member

Thanks! I'm glad you agree.
