Using netcdf3 with datetime64[ns] quickly overflows int32 #8641

Closed
5 tasks done
eivindjahren opened this issue Jan 22, 2024 · 5 comments

Comments

@eivindjahren
Contributor

eivindjahren commented Jan 22, 2024

What happened?

While trying to store datetimes in a netCDF file, I ran into an int32 overflow when the datetimes include sub-second precision (datetime64[ns]).

What did you expect to happen?

At first I was surprised that my data was not stored successfully, but after investigating I came to understand that the netCDF3 format is quite limited. It would probably make sense to emit a warning when datetime64 data is written to netCDF3.
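To make the limitation concrete, here is a minimal sketch of the arithmetic using only the standard library. It takes the two timestamps from the example below and checks whether their offset fits in a signed 32-bit integer. Which time unit xarray actually picks when encoding is not stated in this report, so the sketch simply checks microseconds and nanoseconds; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

# The two timestamps from the example: one hour and one microsecond apart.
start = datetime(2000, 1, 1, 0, 0)
end = datetime(2000, 1, 1, 1, 0, 0, 1)
delta = end - start

# Offset as an exact integer count of microseconds and of nanoseconds.
offset_us = delta // timedelta(microseconds=1)
offset_ns = offset_us * 1000

print(f"int32 max:    {INT32_MAX}")
print(f"offset in us: {offset_us} (fits in int32: {offset_us <= INT32_MAX})")
print(f"offset in ns: {offset_ns} (fits in int32: {offset_ns <= INT32_MAX})")

# Longest span, in seconds, that an int32 counter covers at each resolution.
span_seconds = {
    "ns": INT32_MAX / 1e9,
    "us": INT32_MAX / 1e6,
    "ms": INT32_MAX / 1e3,
}
for unit, seconds in span_seconds.items():
    print(f"int32 counter of {unit} covers about {seconds:.3f} s")
```

An int32 count of nanoseconds covers only about 2.1 seconds and a count of microseconds about 36 minutes, so the one-hour offset in the example cannot be stored as an int32 at either resolution.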

Minimal Complete Verifiable Example

import datetime

import numpy as np
import xarray as xr

# Two single-point datasets whose "step" coordinates are one hour and one
# microsecond apart, both stored as datetime64[ns].
dataset = xr.combine_by_coords(
    [
        xr.Dataset(
            {"value": (["step"], [0.0])},
            coords={
                "step": np.array(
                    [datetime.datetime(2000, 1, 1, 0, 0)], dtype="datetime64[ns]"
                ),
            },
        ),
        xr.Dataset(
            {"value": (["step"], [0.0])},
            coords={
                "step": np.array(
                    [datetime.datetime(2000, 1, 1, 1, 0, 0, 1)],
                    dtype="datetime64[ns]",
                ),
            },
        ),
    ]
)
# The scipy engine writes netCDF3; this call raises the ValueError in the log.
dataset.to_netcdf("./out.nc", engine="scipy")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2300     encoding = {}
   2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2304     self,
   2305     path,
   2306     mode=mode,
   2307     format=format,
   2308     group=group,
   2309     engine=engine,
   2310     encoding=encoding,
   2311     unlimited_dims=unlimited_dims,
   2312     compute=compute,
   2313     multifile=False,
   2314     invalid_netcdf=invalid_netcdf,
   2315 )

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1311 # to avoid this mess of conditionals
   1312 try:
   1313     # TODO: allow this work (setting up the file for writing array data)
   1314     # to be parallelized with dask
-> 1315     dump_to_store(
   1316         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1317     )
   1318     if autoclose:
   1319         store.close()

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1359 if encoder:
   1360     variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    349 if writer is None:
    350     writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:442, in WritableCFDataStore.encode(self, variables, attributes)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
    441     variables, attributes = cf_encoder(variables, attributes)
--> 442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    444     return variables, attributes

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/common.py:442, in <dictcomp>(.0)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
    441     variables, attributes = cf_encoder(variables, attributes)
--> 442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    444     return variables, attributes

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/scipy_.py:213, in ScipyDataStore.encode_variable(self, variable)
    212 def encode_variable(self, variable):
--> 213     variable = encode_nc3_variable(variable)
    214     return variable

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/netcdf3.py:114, in encode_nc3_variable(var)
    112     var = coder.encode(var)
    113 data = _maybe_prepare_times(var)
--> 114 data = coerce_nc3_dtype(data)
    115 attrs = encode_nc3_attrs(var.attrs)
    116 return Variable(var.dims, data, attrs, var.encoding)

File ~/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/xarray/backends/netcdf3.py:68, in coerce_nc3_dtype(arr)
     66     cast_arr = arr.astype(new_dtype)
     67     if not (cast_arr == arr).all():
---> 68         raise ValueError(
     69             f"could not safely cast array from dtype {dtype} to {new_dtype}"
     70         )
     71     arr = cast_arr
     72 return arr

ValueError: could not safely cast array from dtype int64 to int32
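
The final frame of the traceback casts the encoded values to int32 and compares the result with the original array. The check can be reproduced in isolation with NumPy. This is a rough sketch based only on the lines visible in the traceback, not the full xarray function; the name `coerce_to_int32_checked` is hypothetical.

```python
import numpy as np


def coerce_to_int32_checked(arr: np.ndarray) -> np.ndarray:
    """Cast an integer array to int32, raising if any value changes.

    Hypothetical helper mirroring the check shown in the traceback.
    """
    # astype wraps out-of-range values silently, so compare afterwards.
    cast_arr = arr.astype(np.int32)
    if not (cast_arr == arr).all():
        raise ValueError(
            f"could not safely cast array from dtype {arr.dtype} to {cast_arr.dtype}"
        )
    return cast_arr


# Offsets in whole seconds fit comfortably in int32.
small = coerce_to_int32_checked(np.array([0, 3600], dtype=np.int64))
print(small.dtype)

# One hour plus one microsecond, counted in microseconds, does not fit.
try:
    coerce_to_int32_checked(np.array([0, 3_600_000_001], dtype=np.int64))
except ValueError as err:
    print(err)
```

The cast itself does not fail; the value wraps around, and it is the equality comparison afterwards that detects the loss and raises the error.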

Anything else we need to know?

No response

Environment

/home/eivind/.local/share/virtualenvs/ert-0_7in3Ct/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.2.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 23.3.1
conda: None
pytest: 7.4.4
mypy: 1.6.1
IPython: 8.17.2
sphinx: 7.1.2

eivindjahren added the bug and needs triage labels on Jan 22, 2024

welcome bot commented Jan 22, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Collaborator

I think we'd be open to a contribution here...

max-sixty added the topic-backends label and removed the bug and needs triage labels on Jan 22, 2024
@kmuehlbauer
Contributor

@eivindjahren Will the changes over in #8575 (comment), which introduced a more descriptive warning, resolve your issue?

@eivindjahren
Contributor Author

> @eivindjahren Will the changes over in #8575 (comment), which introduced a more descriptive warning, resolve your issue?

Yes, thank you!

@kmuehlbauer
Contributor

Great, thanks @spencerkclark!
