* remove too-long underline
* draft section on data alignment
* fixes
* draft section on coordinate inheritance
* various improvements
* more improvements
* link from other page
* align call include all 3 datasets
* link back to use cases
* clarification
* small improvements
* remove TODO after #9532
* add todo about #9475
* correct xr.align example call
* add links to netCDF4 documentation
* Consistent voice
Co-authored-by: Maximilian Roos <[email protected]>
* keep indexes in lat lon selection to dodge #9475
* unpack generator properly
Co-authored-by: Stephan Hoyer <[email protected]>
* ideas for next section
* briefly summarize what alignment means
* clarify that it's the data in each node that was previously unrelated
* fix incorrect indentation of code block
* display the tree with redundant coordinates again
* remove content about non-inherited coords for a follow-up PR
* remove todo
* remove todo now that aggregations are re-implemented
* remove link to (unmerged) migration guide
* remove todo about improving error message
* correct statement in data-structures docs
* fix internal link
---------
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Stephan Hoyer <[email protected]>
The data in different datatree nodes are not totally independent. In particular, dimensions (and indexes) in child nodes must be exactly aligned with those in their parent nodes.
Exact alignment means that shared dimensions must be the same length, and indexes along those dimensions must be equal.
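
As a rough illustration of what exact alignment means, the sketch below builds three small made-up datasets (``a``, ``b``, ``c`` and the helper ``exactly_aligned`` are hypothetical names, not part of this guide's example data). It assumes that :py:func:`~xarray.align` with ``join="exact"`` raises a ``ValueError`` (or a subclass of it) when indexes differ:

```python
import numpy as np
import xarray as xr

# Three toy datasets sharing a dimension "x" (hypothetical example data).
a = xr.Dataset({"v": ("x", np.zeros(3))}, coords={"x": [0, 1, 2]})
b = xr.Dataset({"w": ("x", np.ones(3))}, coords={"x": [0, 1, 2]})  # same index as a
c = xr.Dataset({"w": ("x", np.ones(3))}, coords={"x": [0, 1, 3]})  # same length, different labels


def exactly_aligned(*datasets):
    """Return True if the datasets align exactly, False otherwise."""
    try:
        xr.align(*datasets, join="exact")
    except ValueError:
        return False
    return True


print(exactly_aligned(a, b))  # identical indexes along "x"
print(exactly_aligned(a, c))  # equal length, but the index labels differ
```

Note that ``a`` and ``c`` have the same length along ``x``; matching sizes alone are not enough when both datasets carry an index along the shared dimension.
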

.. note::
    If you were a previous user of the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package, this is different from what you're used to!
    In that package the data model was that the data stored in each node actually was completely unrelated. The data model is now slightly stricter.
    This allows us to provide features like :ref:`coordinate-inheritance`.

To demonstrate, let's first generate some example datasets which are not aligned with one another:

.. ipython:: python

    # (drop the attributes just to make the printed representation shorter)
But we :ref:`previously said <why>` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`?
If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error:
This is because DataTree checks that data in child nodes align exactly with their parents.

.. note::
    This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions <https://www.unidata.ucar.edu/software/netcdf/workshops/2007/groups-types/Introduction.html>`_, as in netCDF-4 files dimensions are `visible to all child groups <https://docs.unidata.ucar.edu/netcdf-c/current/groups.html>`_.

This alignment check is performed up through the tree, all the way to the root, and is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds:

.. code:: python

    xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact")

To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendants of one another, e.g. organize them as siblings.
Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group.

This is a useful way to organise our data because we can still operate on all the groups at once.
For example we can extract all three timeseries at a specific lat-lon location:

.. ipython:: python

    dt.sel(lat=75, lon=300)

or compute the standard deviation of each timeseries to find out how it varies with sampling frequency:

.. ipython:: python

    dt.std(dim="time")

.. _coordinate-inheritance:

Coordinate Inheritance
~~~~~~~~~~~~~~~~~~~~~~

Notice that in the trees we constructed above there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups.

.. ipython:: python

    dt

We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups.

.. note::
    This is also a new feature relative to the prototype `xarray-contrib/datatree <https://github.com/xarray-contrib/datatree>`_ package.

Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group:

.. ipython:: python

    dt = xr.DataTree.from_dict(
        {
            "/": ds.drop_dims("time"),
            "daily": ds_daily.drop_vars(["lat", "lon"]),
            "weekly": ds_weekly.drop_vars(["lat", "lon"]),
            "monthly": ds_monthly.drop_vars(["lat", "lon"]),
        }
    )
    dt

This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates.
Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations.

We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups:

.. ipython:: python

    dt.daily.coords
    dt["daily/lat"]

As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group.

If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such:

.. ipython:: python

    print(dt["/daily"])

This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it.

We can also still perform all the same operations on the whole tree: