Commit

more edits
dcherian committed Jan 31, 2025
1 parent 148a06e commit 27479dc
Showing 2 changed files with 14 additions and 14 deletions.
26 changes: 13 additions & 13 deletions docs/docs/icechunk-python/xarray.md
@@ -12,15 +12,21 @@ and `icechunk.xarray.to_icechunk` methods.
pip install "xarray>=2025.1.1"
```

-!!! note "`to_icechunk` vs `to_zarr`"
+!!!note "`to_icechunk` vs `to_zarr`"

[`xarray.Dataset.to_zarr`](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html#xarray.Dataset.to_zarr)
-and [`to_icechunk`](./reference.md#icechunk.xarray.to_icechunk) are nearly functionally identical. In a a distributed context, e.g.
+and [`to_icechunk`](./reference.md#icechunk.xarray.to_icechunk) are nearly functionally identical.
+
+In a distributed context, e.g.
writes orchestrated with `multiprocessing` or a `dask.distributed.Client` and `dask.array`, you *must* use `to_icechunk`.
This will ensure that you can execute a commit that successfully records all remote writes.
See [these docs on orchestrating parallel writes](./parallel.md) and [these docs on dask.array with distributed](./dask.md#icechunk-dask-xarray)
for more.

+If using `to_zarr`, remember to set `zarr_format=3, consolidated=False`. Consolidated metadata
+is unnecessary (and unsupported) in Icechunk. Icechunk already organizes the dataset metadata
+in a way that makes it very fast to fetch from storage.
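
A minimal sketch of the two call styles described in this note, assuming `ds` and `session` already exist (they are placeholders, not part of this commit):

```python
from icechunk.xarray import to_icechunk

# Preferred path: safe for both serial and distributed writes.
to_icechunk(ds, session)

# Plain-Xarray path: these flags are required because Icechunk
# neither needs nor supports consolidated metadata.
ds.to_zarr(session.store, zarr_format=3, consolidated=False)
```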


In this example, we'll explain how to create a new Icechunk repo, write some sample data
to it, and append a second block of data using Icechunk's version control features.
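
The repo-creation step is collapsed in this view; a minimal sketch of it, assuming the local-filesystem storage helper (the path is a placeholder):

```python
import icechunk

# Any supported storage backend works here; a local path keeps the example simple.
storage = icechunk.local_filesystem_storage("/tmp/icechunk-xarray-example")
repo = icechunk.Repository.create(storage)
```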
@@ -82,19 +82,13 @@ Create a new writable session on the `main` branch to get the `IcechunkStore`:
session = repo.writable_session("main")
```

-Writing Xarray data to Icechunk is as easy as calling `Dataset.to_zarr`:
+Writing Xarray data to Icechunk is as easy as calling `to_icechunk`:

```python
-ds1.to_zarr(session.store, zarr_format=3, consolidated=False)
-```
+from icechunk.xarray import to_icechunk

-!!! note
-
-1. [Consolidated metadata](https://docs.xarray.dev/en/latest/user-guide/io.html#consolidated-metadata)
-is unnecessary (and unsupported) in Icechunk.
-Icechunk already organizes the dataset metadata in a way that makes it very
-fast to fetch from storage.
-2. `zarr_format=3` is required until the default Zarr format changes in Xarray.
+to_icechunk(ds, session)
+```

After writing, we commit the changes using the session:

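The commit call itself is collapsed in this view; a minimal sketch of that step, assuming the standard `Session.commit` API (the message string is a placeholder):

```python
# Record the write as a new snapshot on the `main` branch.
snapshot_id = session.commit("wrote initial dataset")
print(snapshot_id)
```
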
@@ -111,7 +111,7 @@ this reason. Again, we'll use `Dataset.to_zarr`, this time with `append_dim='time'`
```python
# we have to get a new session after committing
session = repo.writable_session("main")
-ds2.to_zarr(session.store, append_dim='time')
+to_icechunk(ds2, session, append_dim='time')
```

And then we'll commit the changes:
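
The remaining lines of this file are collapsed; a minimal sketch of that commit plus an optional read-back, assuming `Repository.readonly_session` and `xarray.open_zarr` (the message string is a placeholder):

```python
import xarray as xr

session.commit("appended second block along the time dimension")

# Optional sanity check: re-open the committed data read-only.
read_session = repo.readonly_session(branch="main")
combined = xr.open_zarr(read_session.store, consolidated=False)
print(combined.sizes["time"])
```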
2 changes: 1 addition & 1 deletion icechunk-python/python/icechunk/xarray.py
@@ -282,7 +282,7 @@ def to_icechunk(
- If ``region`` is set, _all_ variables in a dataset must have at
least one dimension in common with the region. Other variables
-should be written in a separate single call to ``to_zarr()``.
+should be written in a separate single call to ``to_icechunk()``.
- Dimensions cannot be included in both ``region`` and
``append_dim`` at the same time. To create empty arrays to fill
in with ``region``, use the `XarrayDatasetWriter` directly.
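
A minimal sketch of the region write described in this docstring, assuming ``region`` accepts the same dict-of-slices form as ``Dataset.to_zarr`` (dataset and variable names are placeholders):

```python
from icechunk.xarray import to_icechunk

# `ds_slice` holds recomputed values for the first ten steps of `time`,
# a dimension shared by every variable being written.
session = repo.writable_session("main")
to_icechunk(ds_slice, session, region={"time": slice(0, 10)})
session.commit("overwrite the first ten time steps")
```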
