Commit 6bea715

kmuehlbauer, pre-commit-ci[bot], shoyer, dcherian, and spencerkclark authored
Relax nanosecond datetime restriction in CF time decoding (#9618)
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix scalar handling for timedelta based indexer
* remove stale error message and "ignore:Converting non-default" in testsuite
* add per review suggestions
* add/remove todo
* rename timeunit -> format
* return "ns" resolution per default for timedeltas, if not specified
* Be specific on types/dtpyes
* add comment
* add suggestions from code review
* fix docs
* fix test which isn't run for numpy2 atm
* add notes on to_datetime section, update examples showing usage of 'as_unit'
* use np.timedelta64 for to_timedelta example, update as_unit example, update note
* remove note
* Apply suggestions from code review
  Co-authored-by: Deepak Cherian <[email protected]>
* refactor timedelta decoding to _numbers_to_timedelta and res-use it within decode_cf_timedelta
* fix conventions test, add todo
* run times through pd.Timestamp to catch possible overflows
* fix tests for cftime_to_nptime
* fix cftime_to_nptime in cftimeindex
* introduce pd.Timestamp instance check
* warn if out-of-bound datetimes are encoded with standard calendar, fall back to cftime encoding, add fix for cftime issue where python datetimes are not encoded correctly with date2num.
* fix time-coding.rst, add reference to time-series.rst.
* try to fix typing, ignore one
* try to fix docs
* revert doc-changes
* Add a non-ns test for polyval, polyfit
* more doc cosmetics
* add whats-new.rst entry
* add/fix coder docstring
* add xr.date_range example as suggested per review
* Apply suggestions from code review
  Co-authored-by: Spencer Clark <[email protected]>
* Implement `time_unit` option for `decode_cf_timedelta` (#3)
  * Fix timedelta encoding overflow issue; always decode to ns resolution
  * Implement time_unit for decode_cf_timedelta
  * Reduce diff
* fix typing
* use nanmin/nanmax, catch numpy RuntimeWarnings
* Apply suggestions from code review
  Co-authored-by: Kai Mühlbauer <[email protected]>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
1 parent 2c8b6e6 commit 6bea715

29 files changed, with 1310 additions and 490 deletions

doc/internals/index.rst

+1
Original file line number | Diff line number | Diff line change
@@ -26,3 +26,4 @@ The pages in this section are intended for:
2626
how-to-add-new-backend
2727
how-to-create-custom-index
2828
zarr-encoding-spec
29+
time-coding

doc/internals/time-coding.rst

+475
Large diffs are not rendered by default.

doc/user-guide/io.rst

+2-2
@@ -540,8 +540,8 @@ The ``units`` and ``calendar`` attributes control how xarray serializes ``dateti
540540
``timedelta64`` arrays to datasets on disk as numeric values. The ``units`` encoding
541541
should be a string like ``'days since 1900-01-01'`` for ``datetime64`` data or a string
542542
like ``'days'`` for ``timedelta64`` data. ``calendar`` should be one of the calendar types
543-
supported by netCDF4-python: 'standard', 'gregorian', 'proleptic_gregorian' 'noleap',
544-
'365_day', '360_day', 'julian', 'all_leap', '366_day'.
543+
supported by netCDF4-python: ``'standard'``, ``'gregorian'``, ``'proleptic_gregorian'``, ``'noleap'``,
544+
``'365_day'``, ``'360_day'``, ``'julian'``, ``'all_leap'``, ``'366_day'``.
545545

546546
By default, xarray uses the ``'proleptic_gregorian'`` calendar and units of the smallest time
547547
difference between values, with a reference time of the first time value.
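The default described above (reference time taken from the first value, units chosen from the spacing of the data) can be illustrated with a small standard-library sketch. This is not xarray's implementation: the helper name `encode_times`, the short list of candidate units, and the rule "pick the coarsest unit that divides every offset evenly" are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Candidate units, coarsest first (an illustrative subset, not xarray's list).
_UNITS = {
    "days": timedelta(days=1),
    "hours": timedelta(hours=1),
    "minutes": timedelta(minutes=1),
    "seconds": timedelta(seconds=1),
}


def encode_times(times):
    """Encode datetimes as integers plus a CF-style ``units`` string.

    Hypothetical helper: the reference time is the first value, and the
    unit is the coarsest one that represents every offset as an integer.
    """
    reference = times[0]
    offsets = [t - reference for t in times]
    for name, step in _UNITS.items():
        if all(off % step == timedelta(0) for off in offsets):
            values = [off // step for off in offsets]
            return values, f"{name} since {reference.isoformat(sep=' ')}"
    raise ValueError("sub-second offsets are not handled in this sketch")


values, units = encode_times(
    [datetime(2000, 1, 1), datetime(2000, 1, 1, 6), datetime(2000, 1, 2)]
)
print(values, units)
```

A six-hour offset rules out ``days``, so the sketch settles on ``hours``; the integers and the ``units`` string together are what would end up on disk.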

doc/user-guide/time-series.rst

+33-14
@@ -21,20 +21,40 @@ core functionality.
2121
Creating datetime64 data
2222
------------------------
2323

24-
Xarray uses the numpy dtypes ``datetime64[ns]`` and ``timedelta64[ns]`` to
25-
represent datetime data, which offer vectorized (if sometimes buggy) operations
26-
with numpy and smooth integration with pandas.
24+
Xarray uses the numpy dtypes ``datetime64[unit]`` and ``timedelta64[unit]``
25+
(where unit is one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
26+
data, which offer vectorized operations with numpy and smooth integration with pandas.
2727

2828
To convert to or create regular arrays of ``datetime64`` data, we recommend
2929
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
3030

3131
.. ipython:: python
3232
3333
pd.to_datetime(["2000-01-01", "2000-02-02"])
34+
pd.DatetimeIndex(
35+
["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
36+
)
3437
pd.date_range("2000-01-01", periods=365)
38+
pd.date_range("2000-01-01", periods=365, unit="s")
39+
40+
It is also possible to use corresponding :py:func:`xarray.date_range`:
41+
42+
.. ipython:: python
43+
44+
xr.date_range("2000-01-01", periods=365)
45+
xr.date_range("2000-01-01", periods=365, unit="s")
46+
47+
48+
.. note::
49+
Care has to be taken to create the output with the wanted resolution.
50+
For :py:func:`pandas.date_range` the ``unit``-kwarg has to be specified
51+
and for :py:func:`pandas.to_datetime` the selection of the resolution
52+
isn't possible at all. For that :py:class:`pandas.DatetimeIndex` can be used
53+
directly. There is more in-depth information in section
54+
:ref:`internals.timecoding`.
3555

3656
Alternatively, you can supply arrays of Python ``datetime`` objects. These get
37-
converted automatically when used as arguments in xarray objects:
57+
converted automatically when used as arguments in xarray objects (with us-resolution):
3858

3959
.. ipython:: python
4060
@@ -51,12 +71,13 @@ attribute like ``'days since 2000-01-01'``).
5171
.. note::
5272

5373
When decoding/encoding datetimes for non-standard calendars or for dates
54-
before year 1678 or after year 2262, xarray uses the `cftime`_ library.
74+
before `1582-10-15`_, xarray uses the `cftime`_ library by default.
5575
It was previously packaged with the ``netcdf4-python`` package under the
5676
name ``netcdftime`` but is now distributed separately. ``cftime`` is an
5777
:ref:`optional dependency<installing>` of xarray.
5878

5979
.. _cftime: https://unidata.github.io/cftime
80+
.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar
6081

6182

6283
You can manually decode arrays in this form by passing a dataset to
@@ -66,17 +87,15 @@ You can manually decode arrays in this form by passing a dataset to
6687
6788
attrs = {"units": "hours since 2000-01-01"}
6889
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
90+
# Default decoding to 'ns'-resolution
6991
xr.decode_cf(ds)
92+
# Decoding to 's'-resolution
93+
coder = xr.coders.CFDatetimeCoder(time_unit="s")
94+
xr.decode_cf(ds, decode_times=coder)
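To make the meaning of a ``units`` attribute such as ``"hours since 2000-01-01"`` concrete, here is a minimal standard-library sketch of the arithmetic behind decoding. It is not how ``xr.decode_cf`` is implemented: the helper `decode_cf_times` is hypothetical, handles only a few units, and assumes an ISO-formatted reference time in the proleptic Gregorian calendar.

```python
from datetime import datetime, timedelta

# Supported units for this sketch only.
_STEP = {
    "days": timedelta(days=1),
    "hours": timedelta(hours=1),
    "minutes": timedelta(minutes=1),
    "seconds": timedelta(seconds=1),
}


def decode_cf_times(values, units):
    """Turn numeric offsets plus '<unit> since <reference>' into datetimes."""
    unit, _, reference = units.partition(" since ")
    origin = datetime.fromisoformat(reference.strip())
    step = _STEP[unit.strip()]
    return [origin + v * step for v in values]


decoded = decode_cf_times([0, 1, 2, 3], "hours since 2000-01-01")
print(decoded)
```

The resolution chosen via ``time_unit`` does not change these values, only the integer tick size used to store them.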
7095
71-
One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
72-
native representation of dates to those that fall between the years 1678 and
73-
2262. When a netCDF file contains dates outside of these bounds, dates will be
74-
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
75-
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
76-
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
77-
fully compatible with the standalone version of ``cftime`` (not the version
78-
packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
79-
information.
96+
From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262, which gets increased significantly with lower resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
97+
:py:class:`~xarray.CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
98+
See :ref:`CFTimeIndex` for more information.
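The year limits quoted above follow from storing time as a signed 64-bit count of ticks since 1970-01-01. The sketch below estimates the representable span for each resolution with plain arithmetic; the helper name `approx_year_range` and the use of a mean Gregorian year are assumptions for illustration, so the results are approximate.

```python
# Mean Gregorian year in seconds, used only for a rough estimate.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600

TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def approx_year_range(unit):
    """Approximate (first, last) representable year for an int64 tick count."""
    half_span_years = (2**63 / TICKS_PER_SECOND[unit]) / SECONDS_PER_YEAR
    return 1970 - half_span_years, 1970 + half_span_years


for unit in ("ns", "us", "ms", "s"):
    low, high = approx_year_range(unit)
    print(f"{unit}: {low:.0f} .. {high:.0f}")
```

For ``"ns"`` the estimate spans roughly 292 years on either side of 1970, which matches the 1678 to 2262 window; each coarser resolution widens the span by a factor of 1000.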
8099

81100
Datetime indexing
82101
-----------------

doc/user-guide/weather-climate.rst

+13-16
@@ -10,7 +10,7 @@ Weather and climate data
1010
1111
import xarray as xr
1212
13-
Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module(Explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.
13+
Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module (explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.
1414

1515
.. _Climate and Forecast (CF) conventions: https://cfconventions.org
1616

@@ -57,15 +57,14 @@ CF-compliant coordinate variables
5757

5858
.. _CFTimeIndex:
5959

60-
Non-standard calendars and dates outside the nanosecond-precision range
61-
-----------------------------------------------------------------------
60+
Non-standard calendars and dates outside the precision range
61+
------------------------------------------------------------
6262

6363
Through the standalone ``cftime`` library and a custom subclass of
6464
:py:class:`pandas.Index`, xarray supports a subset of the indexing
6565
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
6666
dates from non-standard calendars commonly used in climate science or dates
67-
using a standard calendar, but outside the `nanosecond-precision range`_
68-
(approximately between years 1678 and 2262).
67+
using a standard calendar, but outside the `precision range`_ and dates prior to `1582-10-15`_.
6968

7069
.. note::
7170

@@ -75,18 +74,14 @@ using a standard calendar, but outside the `nanosecond-precision range`_
7574
any of the following are true:
7675

7776
- The dates are from a non-standard calendar
78-
- Any dates are outside the nanosecond-precision range.
77+
- Any dates are outside the nanosecond-precision range (prior to xarray version 2025.01.2)
78+
- Any dates are outside the time span limited by the resolution (from xarray version 2025.01.2)
7979

8080
Otherwise pandas-compatible dates from a standard calendar will be
81-
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
82-
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
83-
and their full set of associated features.
81+
represented with the ``np.datetime64[unit]`` data type (where unit can be one of ``"s"``, ``"ms"``, ``"us"``, ``"ns"``), enabling the use of a :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[unit]`` and their full set of associated features.
8482

8583
As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
86-
values. For the time being, xarray still automatically casts datetime values
87-
to nanosecond-precision for backwards compatibility with older pandas
88-
versions; however, this is something we would like to relax going forward.
89-
See :issue:`7493` for more discussion.
84+
values. From xarray version 2025.01.2 on, non-nanosecond precision datetime values are also supported in xarray (this can be parameterized via :py:class:`~xarray.coders.CFDatetimeCoder` and ``decode_times`` kwarg). See also :ref:`internals.timecoding`.
9085

9186
For example, you can create a DataArray indexed by a time
9287
coordinate with dates from a no-leap calendar and a
@@ -115,7 +110,7 @@ instance, we can create the same dates and DataArray we created above using:
115110
Mirroring pandas' method with the same name, :py:meth:`~xarray.infer_freq` allows one to
116111
infer the sampling frequency of a :py:class:`~xarray.CFTimeIndex` or a 1-D
117112
:py:class:`~xarray.DataArray` containing cftime objects. It also works transparently with
118-
``np.datetime64[ns]`` and ``np.timedelta64[ns]`` data.
113+
``np.datetime64`` and ``np.timedelta64`` data (with "s", "ms", "us" or "ns" resolution).
119114

120115
.. ipython:: python
121116
@@ -137,7 +132,9 @@ Conversion between non-standard calendar and to/from pandas DatetimeIndexes is
137132
facilitated with the :py:meth:`xarray.Dataset.convert_calendar` method (also available as
138133
:py:meth:`xarray.DataArray.convert_calendar`). Here, like elsewhere in xarray, the ``use_cftime``
139134
argument controls which datetime backend is used in the output. The default (``None``) is to
140-
use ``pandas`` when possible, i.e. when the calendar is standard and dates are within 1678 and 2262.
135+
use ``pandas`` when possible, i.e. when the calendar is ``standard``/``gregorian`` and all dates fall on or after `1582-10-15`_. There is no such restriction when converting to a ``proleptic_gregorian`` calendar.
136+
137+
.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar
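The rule stated above for ``use_cftime=None`` can be restated as a small decision function. This is only a paraphrase of the sentence in the documentation, not xarray's code: the name `pick_backend` is hypothetical, and the sketch ignores the additional limit imposed by the chosen time resolution.

```python
from datetime import date

# First day of the Gregorian calendar, as referenced in the text.
GREGORIAN_REFORM = date(1582, 10, 15)


def pick_backend(calendar, earliest):
    """Return 'pandas' or 'cftime' for a target calendar and earliest date.

    Hypothetical helper mirroring the documented default only.
    """
    if calendar == "proleptic_gregorian":
        # No lower date restriction for this calendar.
        return "pandas"
    if calendar in ("standard", "gregorian"):
        return "pandas" if earliest >= GREGORIAN_REFORM else "cftime"
    # Non-standard calendars such as 'noleap' or '360_day'.
    return "cftime"


print(pick_backend("standard", date(1500, 1, 1)))
print(pick_backend("proleptic_gregorian", date(1500, 1, 1)))
```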
141138

142139
.. ipython:: python
143140
@@ -241,6 +238,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
241238
242239
da.resample(time="81min", closed="right", label="right", offset="3min").mean()
243240
244-
.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
241+
.. _precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
245242
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
246243
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing

doc/whats-new.rst

+33-2
@@ -19,9 +19,39 @@ What's New
1919
v2025.01.2 (unreleased)
2020
-----------------------
2121

22+
This release brings non-nanosecond datetime resolution to xarray. In the
23+
last couple of releases xarray has been prepared for that change. The code had
24+
to be changed and adapted in numerous places, affecting especially the test suite.
25+
The documentation has been updated accordingly and a new internal chapter
26+
on :ref:`internals.timecoding` has been added.
27+
28+
To make the transition as smooth as possible this is designed to be fully backwards
29+
compatible, keeping the current default of ``'ns'`` resolution on decoding.
30+
To opt-in decoding into other resolutions (``'us'``, ``'ms'`` or ``'s'``) the
31+
new :py:class:`coders.CFDatetimeCoder` is used as parameter to ``decode_times``
32+
kwarg (see also :ref:`internals.default_timeunit`):
33+
34+
.. code-block:: python
35+
36+
coder = xr.coders.CFDatetimeCoder(time_unit="s")
37+
ds = xr.open_dataset(filename, decode_times=coder)
38+
39+
There might be slight changes when encoding/decoding times as some warning and
40+
error messages have been removed or rewritten. Xarray will now also allow
41+
non-nanosecond datetimes (with ``'us'``, ``'ms'`` or ``'s'`` resolution) when
42+
creating DataArrays from scratch, picking the lowest possible resolution:
43+
44+
.. ipython:: python
45+
46+
xr.DataArray(data=[np.datetime64("2000-01-01", "D")], dims=("time",))
47+
48+
In a future release the current default of ``'ns'`` resolution on decoding will
49+
eventually be deprecated.
50+
2251
New Features
2352
~~~~~~~~~~~~
24-
53+
- Relax nanosecond datetime restriction in CF time decoding (:issue:`7493`, :pull:`9618`).
54+
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_ and `Spencer Clark <https://github.com/spencerkclark>`_.
2555

2656
Breaking changes
2757
~~~~~~~~~~~~~~~~
@@ -37,7 +67,8 @@ Bug fixes
3767

3868
Documentation
3969
~~~~~~~~~~~~~
40-
70+
- A chapter on :ref:`internals.timecoding` is added to the internal section (:pull:`9618`).
71+
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
4172

4273
Internal Changes
4374
~~~~~~~~~~~~~~~~

xarray/backends/api.py

+6-3
@@ -775,7 +775,8 @@ def open_dataarray(
775775
be replaced by NA. This keyword may not be supported by all the backends.
776776
decode_times : bool, CFDatetimeCoder or dict-like, optional
777777
If True, decode times encoded in the standard NetCDF datetime format
778-
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
778+
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
779+
leave them encoded as numbers.
779780
Pass a mapping, e.g. ``{"my_variable": False}``,
780781
to toggle this feature per-variable individually.
781782
This keyword may not be supported by all the backends.
@@ -984,7 +985,8 @@ def open_datatree(
984985
This keyword may not be supported by all the backends.
985986
decode_times : bool, CFDatetimeCoder or dict-like, optional
986987
If True, decode times encoded in the standard NetCDF datetime format
987-
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
988+
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
989+
leave them encoded as numbers.
988990
Pass a mapping, e.g. ``{"my_variable": False}``,
989991
to toggle this feature per-variable individually.
990992
This keyword may not be supported by all the backends.
@@ -1210,7 +1212,8 @@ def open_groups(
12101212
This keyword may not be supported by all the backends.
12111213
decode_times : bool, CFDatetimeCoder or dict-like, optional
12121214
If True, decode times encoded in the standard NetCDF datetime format
1213-
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
1215+
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
1216+
leave them encoded as numbers.
12141217
Pass a mapping, e.g. ``{"my_variable": False}``,
12151218
to toggle this feature per-variable individually.
12161219
This keyword may not be supported by all the backends.

xarray/coding/cftime_offsets.py

+4-14
@@ -64,7 +64,7 @@
6464
from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like
6565
from xarray.core.pdcompat import (
6666
count_not_none,
67-
nanosecond_precision_timestamp,
67+
default_precision_timestamp,
6868
)
6969
from xarray.core.utils import attempt_import, emit_user_level_warning
7070

@@ -81,14 +81,6 @@
8181
T_FreqStr = TypeVar("T_FreqStr", str, None)
8282

8383

84-
def _nanosecond_precision_timestamp(*args, **kwargs):
85-
# As of pandas version 3.0, pd.to_datetime(Timestamp(...)) will try to
86-
# infer the appropriate datetime precision. Until xarray supports
87-
# non-nanosecond precision times, we will use this constructor wrapper to
88-
# explicitly create nanosecond-precision Timestamp objects.
89-
return pd.Timestamp(*args, **kwargs).as_unit("ns")
90-
91-
9284
def get_date_type(calendar, use_cftime=True):
9385
"""Return the cftime date type for a given calendar name."""
9486
if TYPE_CHECKING:
@@ -97,7 +89,7 @@ def get_date_type(calendar, use_cftime=True):
9789
cftime = attempt_import("cftime")
9890

9991
if _is_standard_calendar(calendar) and not use_cftime:
100-
return _nanosecond_precision_timestamp
92+
return default_precision_timestamp
10193

10294
calendars = {
10395
"noleap": cftime.DatetimeNoLeap,
@@ -1427,10 +1419,8 @@ def date_range_like(source, calendar, use_cftime=None):
14271419
if is_np_datetime_like(source.dtype):
14281420
# We want to use datetime fields (datetime64 object don't have them)
14291421
source_calendar = "standard"
1430-
# TODO: the strict enforcement of nanosecond precision Timestamps can be
1431-
# relaxed when addressing GitHub issue #7493.
1432-
source_start = nanosecond_precision_timestamp(source_start)
1433-
source_end = nanosecond_precision_timestamp(source_end)
1422+
source_start = default_precision_timestamp(source_start)
1423+
source_end = default_precision_timestamp(source_end)
14341424
else:
14351425
if isinstance(source, CFTimeIndex):
14361426
source_calendar = source.calendar

xarray/coding/cftimeindex.py

+3-2
@@ -581,13 +581,14 @@ def to_datetimeindex(self, unsafe=False):
581581
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
582582
dtype='object', length=2, calendar='standard', freq=None)
583583
>>> times.to_datetimeindex()
584-
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
584+
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[us]', freq=None)
585585
"""
586586

587587
if not self._data.size:
588588
return pd.DatetimeIndex([])
589589

590-
nptimes = cftime_to_nptime(self)
590+
# transform to us-resolution is needed for DatetimeIndex
591+
nptimes = cftime_to_nptime(self, time_unit="us")
591592
calendar = infer_calendar_name(self)
592593
if calendar not in _STANDARD_CALENDARS and not unsafe:
593594
warnings.warn(
