Enable passing a CFTimedeltaCoder to decode_timedelta #9966

Open
wants to merge 4 commits into
base: main
48 changes: 48 additions & 0 deletions doc/internals/time-coding.rst
Original file line number Diff line number Diff line change
Expand Up @@ -473,3 +473,51 @@ on-disk resolution, if possible.

coder = xr.coders.CFDatetimeCoder(time_unit="s")
xr.open_dataset("test-datetimes2.nc", decode_times=coder)

Similar logic applies for decoding timedelta values. The default resolution is
``"ns"``:

.. ipython:: python

attrs = {"units": "hours"}
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
ds.to_netcdf("test-timedeltas1.nc")

.. ipython:: python

xr.open_dataset("test-timedeltas1.nc")

By default, timedeltas will be decoded to the same resolution as datetimes:

.. ipython:: python

coder = xr.coders.CFDatetimeCoder(time_unit="s")
xr.open_dataset("test-timedeltas1.nc", decode_times=coder)

but if one would like to decode timedeltas to a different resolution, one can
provide a coder specifically for timedeltas to ``decode_timedelta``:

.. ipython:: python

timedelta_coder = xr.coders.CFTimedeltaCoder(time_unit="ms")
xr.open_dataset(
"test-timedeltas1.nc", decode_times=coder, decode_timedelta=timedelta_coder
)

As with datetimes, if a coarser unit is requested the timedeltas are decoded
into their native on-disk resolution, if possible:

.. ipython:: python

attrs = {"units": "milliseconds"}
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
ds.to_netcdf("test-timedeltas2.nc")

.. ipython:: python

xr.open_dataset("test-timedeltas2.nc")

.. ipython:: python

coder = xr.coders.CFDatetimeCoder(time_unit="s")
xr.open_dataset("test-timedeltas2.nc", decode_times=coder)
Contributor


There is still some unresolved issue, where we should think about a way forward. At least we could document it, like this:

Suggested change
xr.open_dataset("test-timedeltas2.nc", decode_times=coder)
xr.open_dataset("test-timedeltas2.nc", decode_times=coder)
To opt out of timedelta decoding (see issue `Undesired decoding to timedelta64 <https://github.com/pydata/xarray/issues/1621>`_), pass ``False`` to ``decode_timedelta``:
.. ipython:: python
xr.open_dataset("test-timedeltas2.nc", decode_timedelta=False)

I'm not sure whether it would be a good idea to change the default to decode_timedelta=False in this cycle of changes (as discussed in #1621), but we could at least add a FutureWarning when decode_timedelta=None. WDYT?

Member Author


Yes, I think that is a good idea—thanks for reminding me about that issue. I'll work on an update when I get a chance.
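
For context on the default discussed in this thread, the way this PR resolves ``decode_timedelta=None`` can be sketched roughly as below. This is a simplified illustration, not the actual xarray code: ``DatetimeCoder``, ``TimedeltaCoder`` and ``resolve_decode_timedelta`` are hypothetical stand-ins that mirror the branching in ``decode_cf_variable`` without importing xarray.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only; the real coders live in xarray.coders.
@dataclass
class DatetimeCoder:
    time_unit: str = "ns"


@dataclass
class TimedeltaCoder:
    time_unit: str = "ns"


def resolve_decode_timedelta(decode_times, decode_timedelta=None):
    """Return the timedelta coder to use, or None if decoding is disabled."""
    if decode_timedelta is None:
        if isinstance(decode_times, DatetimeCoder):
            # Follow the resolution requested for datetimes.
            decode_timedelta = TimedeltaCoder(time_unit=decode_times.time_unit)
        else:
            # Plain booleans: timedelta decoding follows decode_times.
            decode_timedelta = bool(decode_times)
    if not decode_timedelta:
        return None  # explicit or inherited opt-out
    if not isinstance(decode_timedelta, TimedeltaCoder):
        # decode_timedelta=True falls back to the default "ns" coder.
        decode_timedelta = TimedeltaCoder()
    return decode_timedelta
```

Under these assumptions, passing ``False`` for ``decode_timedelta`` opts out regardless of ``decode_times``, while an explicit timedelta coder always takes precedence over the datetime coder's ``time_unit``.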

37 changes: 25 additions & 12 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,23 +19,36 @@ What's New
v2025.01.2 (unreleased)
-----------------------

This release brings non-nanosecond datetime resolution to xarray. In the
last couple of releases xarray has been prepared for that change. The code had
to be changed and adapted in numerous places, affecting especially the test suite.
The documentation has been updated accordingly and a new internal chapter
on :ref:`internals.timecoding` has been added.

To make the transition as smooth as possible this is designed to be fully backwards
compatible, keeping the current default of ``'ns'`` resolution on decoding.
To opt-in decoding into other resolutions (``'us'``, ``'ms'`` or ``'s'``) the
new :py:class:`coders.CFDatetimeCoder` is used as parameter to ``decode_times``
kwarg (see also :ref:`internals.default_timeunit`):
This release brings non-nanosecond datetime and timedelta resolution to xarray.
In the last couple of releases xarray has been prepared for that change. The
code had to be changed and adapted in numerous places, affecting especially the
test suite. The documentation has been updated accordingly and a new internal
chapter on :ref:`internals.timecoding` has been added.

To make the transition as smooth as possible this is designed to be fully
backwards compatible, keeping the current default of ``'ns'`` resolution on
decoding. To opt-into decoding to other resolutions (``'us'``, ``'ms'`` or
``'s'``) an instance of the newly public :py:class:`coders.CFDatetimeCoder`
class can be passed through the ``decode_times`` keyword argument (see also
:ref:`internals.default_timeunit`):

.. code-block:: python

coder = xr.coders.CFDatetimeCoder(time_unit="s")
ds = xr.open_dataset(filename, decode_times=coder)

Similar control of the resolution of decoded timedeltas can be achieved through
passing a :py:class:`coders.CFTimedeltaCoder` instance to the
``decode_timedelta`` keyword argument:

.. code-block:: python

coder = xr.coders.CFTimedeltaCoder(time_unit="s")
ds = xr.open_dataset(filename, decode_timedelta=coder)

though by default timedeltas will be decoded to the same ``time_unit`` as
datetimes.

There might be slight changes when encoding/decoding times as some warning and
error messages have been removed or rewritten. Xarray will now also allow
non-nanosecond datetimes (with ``'us'``, ``'ms'`` or ``'s'`` resolution) when
Expand All @@ -50,7 +63,7 @@ eventually be deprecated.

New Features
~~~~~~~~~~~~
- Relax nanosecond datetime restriction in CF time decoding (:issue:`7493`, :pull:`9618`).
- Relax nanosecond datetime / timedelta restriction in CF time decoding (:issue:`7493`, :pull:`9618`, :pull:`9966`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_ and `Spencer Clark <https://github.com/spencerkclark>`_.
- Improve the error message raised when no key is matching the available variables in a dataset. (:pull:`9943`)
By `Jimmy Westling <https://github.com/illviljan>`_.
Expand Down
43 changes: 31 additions & 12 deletions xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
_normalize_path,
)
from xarray.backends.locks import _get_scheduler
from xarray.coders import CFDatetimeCoder
from xarray.coders import CFDatetimeCoder, CFTimedeltaCoder
from xarray.core import indexing
from xarray.core.combine import (
_infer_concat_order_from_positions,
Expand Down Expand Up @@ -486,7 +486,10 @@ def open_dataset(
| CFDatetimeCoder
| Mapping[str, bool | CFDatetimeCoder]
| None = None,
decode_timedelta: bool | Mapping[str, bool] | None = None,
decode_timedelta: bool
| CFTimedeltaCoder
| Mapping[str, bool | CFTimedeltaCoder]
| None = None,
use_cftime: bool | Mapping[str, bool] | None = None,
concat_characters: bool | Mapping[str, bool] | None = None,
decode_coords: Literal["coordinates", "all"] | bool | None = None,
Expand Down Expand Up @@ -554,11 +557,14 @@ def open_dataset(
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
decode_timedelta : bool or dict-like, optional
decode_timedelta : bool, CFTimedeltaCoder, or dict-like, optional
If True, decode variables and coordinates with time units in
{"days", "hours", "minutes", "seconds", "milliseconds", "microseconds"}
into timedelta objects. If False, leave them encoded as numbers.
If None (default), assume the same value of decode_time.
If None (default), assume the same value of ``decode_times``; if
``decode_times`` is a :py:class:`coders.CFDatetimeCoder` instance, this
takes the form of a :py:class:`coders.CFTimedeltaCoder` instance with a
matching ``time_unit``.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
Expand Down Expand Up @@ -711,7 +717,7 @@ def open_dataarray(
| CFDatetimeCoder
| Mapping[str, bool | CFDatetimeCoder]
| None = None,
decode_timedelta: bool | None = None,
decode_timedelta: bool | CFTimedeltaCoder | None = None,
use_cftime: bool | None = None,
concat_characters: bool | None = None,
decode_coords: Literal["coordinates", "all"] | bool | None = None,
Expand Down Expand Up @@ -784,7 +790,10 @@ def open_dataarray(
If True, decode variables and coordinates with time units in
{"days", "hours", "minutes", "seconds", "milliseconds", "microseconds"}
into timedelta objects. If False, leave them encoded as numbers.
If None (default), assume the same value of decode_time.
If None (default), assume the same value of ``decode_times``; if
``decode_times`` is a :py:class:`coders.CFDatetimeCoder` instance, this
takes the form of a :py:class:`coders.CFTimedeltaCoder` instance with a
matching ``time_unit``.
This keyword may not be supported by all the backends.
use_cftime: bool, optional
Only relevant if encoded dates come from a standard calendar
Expand Down Expand Up @@ -926,7 +935,10 @@ def open_datatree(
| CFDatetimeCoder
| Mapping[str, bool | CFDatetimeCoder]
| None = None,
decode_timedelta: bool | Mapping[str, bool] | None = None,
decode_timedelta: bool
| CFTimedeltaCoder
| Mapping[str, bool | CFTimedeltaCoder]
| None = None,
use_cftime: bool | Mapping[str, bool] | None = None,
concat_characters: bool | Mapping[str, bool] | None = None,
decode_coords: Literal["coordinates", "all"] | bool | None = None,
Expand Down Expand Up @@ -994,7 +1006,10 @@ def open_datatree(
If True, decode variables and coordinates with time units in
{"days", "hours", "minutes", "seconds", "milliseconds", "microseconds"}
into timedelta objects. If False, leave them encoded as numbers.
If None (default), assume the same value of decode_time.
If None (default), assume the same value of ``decode_times``; if
``decode_times`` is a :py:class:`coders.CFDatetimeCoder` instance, this
takes the form of a :py:class:`coders.CFTimedeltaCoder` instance with a
matching ``time_unit``.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
Expand Down Expand Up @@ -1149,7 +1164,10 @@ def open_groups(
| CFDatetimeCoder
| Mapping[str, bool | CFDatetimeCoder]
| None = None,
decode_timedelta: bool | Mapping[str, bool] | None = None,
decode_timedelta: bool
| CFTimedeltaCoder
| Mapping[str, bool | CFTimedeltaCoder]
| None = None,
use_cftime: bool | Mapping[str, bool] | None = None,
concat_characters: bool | Mapping[str, bool] | None = None,
decode_coords: Literal["coordinates", "all"] | bool | None = None,
Expand Down Expand Up @@ -1221,9 +1239,10 @@ def open_groups(
If True, decode variables and coordinates with time units in
{"days", "hours", "minutes", "seconds", "milliseconds", "microseconds"}
into timedelta objects. If False, leave them encoded as numbers.
If None (default), assume the same value of decode_time.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
If None (default), assume the same value of ``decode_times``; if
``decode_times`` is a :py:class:`coders.CFDatetimeCoder` instance, this
takes the form of a :py:class:`coders.CFTimedeltaCoder` instance with a
matching ``time_unit``.
This keyword may not be supported by all the backends.
use_cftime: bool or dict-like, optional
Only relevant if encoded dates come from a standard calendar
Expand Down
6 changes: 2 additions & 4 deletions xarray/coders.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,6 @@
"encoding/decoding" process.
"""

from xarray.coding.times import CFDatetimeCoder
from xarray.coding.times import CFDatetimeCoder, CFTimedeltaCoder

__all__ = [
"CFDatetimeCoder",
]
__all__ = ["CFDatetimeCoder", "CFTimedeltaCoder"]
21 changes: 18 additions & 3 deletions xarray/coding/times.py
Original file line number Diff line number Diff line change
Expand Up @@ -1343,6 +1343,20 @@ def decode(self, variable: Variable, name: T_Name = None) -> Variable:


class CFTimedeltaCoder(VariableCoder):
"""Coder for CF Timedelta coding.

Parameters
----------
time_unit : PDDatetimeUnitOptions
Target resolution when decoding timedeltas. Defaults to "ns".
"""

def __init__(
self,
time_unit: PDDatetimeUnitOptions = "ns",
) -> None:
self.time_unit = time_unit

def encode(self, variable: Variable, name: T_Name = None) -> Variable:
if np.issubdtype(variable.data.dtype, np.timedelta64):
dims, data, attrs, encoding = unpack_for_encoding(variable)
Expand All @@ -1362,9 +1376,10 @@ def decode(self, variable: Variable, name: T_Name = None) -> Variable:
dims, data, attrs, encoding = unpack_for_decoding(variable)

units = pop_to(attrs, encoding, "units")
transform = partial(decode_cf_timedelta, units=units)
# todo: check, if we can relax this one here, too
dtype = np.dtype("timedelta64[ns]")
dtype = np.dtype(f"timedelta64[{self.time_unit}]")
transform = partial(
decode_cf_timedelta, units=units, time_unit=self.time_unit
)
data = lazy_elemwise_func(data, transform, dtype=dtype)

return Variable(dims, data, attrs, encoding, fastpath=True)
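
To illustrate what the ``time_unit`` argument changes in the decoded result, here is a minimal NumPy-only sketch. ``decode_hours`` is a hypothetical helper written for this note; it is not xarray's ``decode_cf_timedelta`` and skips the overflow and precision checks a real decoder needs.

```python
import numpy as np


def decode_hours(values, time_unit="ns"):
    """Hypothetical helper: convert integer hour counts to timedelta64
    at the requested resolution."""
    hours = np.asarray(values).astype("timedelta64[h]")
    # Casting from hours to a finer unit is exact for these small values.
    return hours.astype(f"timedelta64[{time_unit}]")


decoded_ns = decode_hours([0, 1, 2, 3])       # default "ns" resolution
decoded_s = decode_hours([0, 1, 2, 3], "s")   # coarser resolution, same values
print(decoded_ns.dtype, decoded_s.dtype)
```

Both arrays represent the same durations; only the dtype, and therefore the representable range and precision, differs.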
Expand Down
24 changes: 17 additions & 7 deletions xarray/conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@

import numpy as np

from xarray.coders import CFDatetimeCoder
from xarray.coding import strings, times, variables
from xarray.coders import CFDatetimeCoder, CFTimedeltaCoder
from xarray.coding import strings, variables
from xarray.coding.variables import SerializationWarning, pop_to
from xarray.core import indexing
from xarray.core.common import (
Expand Down Expand Up @@ -90,7 +90,7 @@ def encode_cf_variable(

for coder in [
CFDatetimeCoder(),
times.CFTimedeltaCoder(),
CFTimedeltaCoder(),
variables.CFScaleOffsetCoder(),
variables.CFMaskCoder(),
variables.NativeEnumCoder(),
Expand All @@ -114,7 +114,7 @@ def decode_cf_variable(
decode_endianness: bool = True,
stack_char_dim: bool = True,
use_cftime: bool | None = None,
decode_timedelta: bool | None = None,
decode_timedelta: bool | CFTimedeltaCoder | None = None,
) -> Variable:
"""
Decodes a variable which may hold CF encoded information.
Expand Down Expand Up @@ -158,6 +158,8 @@ def decode_cf_variable(

.. deprecated:: 2025.01.1
Please pass a :py:class:`coders.CFDatetimeCoder` instance initialized with ``use_cftime`` to the ``decode_times`` kwarg instead.
decode_timedelta : None, bool, or CFTimedeltaCoder
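Decode CF timedeltas (e.g. with units of "hours") to np.timedelta64. Pass a
:py:class:`coders.CFTimedeltaCoder` instance to control the decoded resolution.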

Returns
-------
Expand All @@ -171,7 +173,10 @@ def decode_cf_variable(
original_dtype = var.dtype

if decode_timedelta is None:
decode_timedelta = True if decode_times else False
if isinstance(decode_times, CFDatetimeCoder):
Contributor


Here we could add a FutureWarning, that decode_timedelta=None will default to decode_timedelta=False. See #1621.

Member Author


I guess the one thing that is kind of awkward about this is that it will warn when anyone calls open_dataset, even if there are no timedelta-like variables in the dataset. Would it make sense to limit the warning to only when timedelta variables are decoded and decode_timedelta was None?
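
One possible shape for the narrower warning described above, as a rough sketch only: ``maybe_warn`` and its message are hypothetical, and the unit set is copied from the ``decode_timedelta`` docstring. The idea is to warn only when a variable with timedelta-like units would actually be decoded while ``decode_timedelta`` was left as ``None``.

```python
import warnings

# Units treated as timedelta-like, per the decode_timedelta docstring.
TIMEDELTA_UNITS = {
    "days", "hours", "minutes", "seconds", "milliseconds", "microseconds",
}


def maybe_warn(units, decode_timedelta, decode_times=True):
    """Hypothetical helper: emit a FutureWarning only if a timedelta-like
    variable would be decoded with decode_timedelta left at None.

    Returns True if a warning was issued.
    """
    if decode_timedelta is None and decode_times and units in TIMEDELTA_UNITS:
        warnings.warn(
            "decode_timedelta=None may default to False in the future; "
            "pass decode_timedelta explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

With this shape, opening a dataset that has no timedelta-like variables would stay silent, which addresses the concern about warning on every ``open_dataset`` call.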

decode_timedelta = CFTimedeltaCoder(time_unit=decode_times.time_unit)
else:
decode_timedelta = True if decode_times else False

if concat_characters:
if stack_char_dim:
Expand All @@ -193,7 +198,9 @@ def decode_cf_variable(
var = coder.decode(var, name=name)

if decode_timedelta:
var = times.CFTimedeltaCoder().decode(var, name=name)
if not isinstance(decode_timedelta, CFTimedeltaCoder):
decode_timedelta = CFTimedeltaCoder()
var = decode_timedelta.decode(var, name=name)
if decode_times:
# remove checks after end of deprecation cycle
if not isinstance(decode_times, CFDatetimeCoder):
Expand Down Expand Up @@ -335,7 +342,10 @@ def decode_cf_variables(
decode_coords: bool | Literal["coordinates", "all"] = True,
drop_variables: T_DropVariables = None,
use_cftime: bool | Mapping[str, bool] | None = None,
decode_timedelta: bool | Mapping[str, bool] | None = None,
decode_timedelta: bool
| CFTimedeltaCoder
| Mapping[str, bool | CFTimedeltaCoder]
| None = None,
) -> tuple[T_Variables, T_Attrs, set[Hashable]]:
"""
Decode several CF encoded variables.
Expand Down