
Dataset.to_array() throws IndexError for empty datasets #7872

Open
4 tasks done
sehoffmann opened this issue May 24, 2023 · 2 comments

@sehoffmann

What happened?

>>> xr.__version__
'2023.4.2'
>>> xr.Dataset().to_array()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/qb/work2/goswami0/gkd021/conda/envs/wb/lib/python3.10/site-packages/xarray/core/dataset.py", line 6114, in to_array
    data = duck_array_ops.stack([b.data for b in broadcast_vars], axis=0)
  File "/mnt/qb/work2/goswami0/gkd021/conda/envs/wb/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 326, in stack
    xp = get_array_namespace(arrays[0])
IndexError: list index out of range
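The traceback comes down to an empty list. An empty Dataset has no data variables, so the list comprehension passed to `duck_array_ops.stack` is empty and the `arrays[0]` lookup fails. The stdlib-only sketch below replays that access pattern and shows one possible guard that turns it into a descriptive `ValueError`. `reproduce_index_error` and `first_array_or_error` are hypothetical names for illustration, not xarray functions, and the guard is only one way a fix could look.

```python
from typing import Any, Sequence


def reproduce_index_error() -> str:
    """Replay the failing access pattern from the traceback with plain lists."""
    broadcast_vars: list = []  # an empty Dataset has no data variables to broadcast
    arrays = [b.data for b in broadcast_vars]  # evaluates to []
    try:
        arrays[0]  # same pattern as `get_array_namespace(arrays[0])`
    except IndexError as err:
        return str(err)
    return "no error"


def first_array_or_error(arrays: Sequence[Any]) -> Any:
    """Hypothetical guard: return arrays[0], or raise a descriptive ValueError."""
    if len(arrays) == 0:
        raise ValueError(
            "cannot stack an empty sequence of arrays "
            "(e.g. a Dataset with no data variables)"
        )
    return arrays[0]


print(reproduce_index_error())  # list index out of range
```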

What did you expect to happen?

In my opinion, the most reasonable way to handle this would be to return an empty, i.e. default-constructed, xr.DataArray:

>>> xr.DataArray()
<xarray.DataArray ()>
array(nan)
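Until `to_array()` handles this case itself, a caller can check for it up front. The sketch below assumes the behaviour proposed above (return a default-constructed `xr.DataArray`) and uses only `Dataset.data_vars`, `Dataset.to_array(dim=...)` and the `xr.DataArray()` constructor; `to_array_or_empty` is a hypothetical helper, not an xarray API.

```python
import xarray as xr


def to_array_or_empty(ds: xr.Dataset, dim: str = "variable") -> xr.DataArray:
    """Hypothetical helper: like ds.to_array(dim), but returns a default
    DataArray instead of raising when ds has no data variables."""
    if len(ds.data_vars) == 0:
        # Behaviour proposed in this issue: a 0-d DataArray holding NaN.
        return xr.DataArray()
    return ds.to_array(dim=dim)


empty = to_array_or_empty(xr.Dataset())
print(empty.ndim)  # 0
```

Note that the 0-d result has no `variable` dimension at all, so downstream code that indexes along `dim` would still need its own empty-case handling.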

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.76.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.4.2
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.0
distributed: 2022.7.0
matplotlib: 3.6.2
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: 7.1.2
mypy: None
IPython: 8.8.0
sphinx: 5.0.2

@sehoffmann sehoffmann added bug needs triage Issue that has not been reviewed by xarray team member labels May 24, 2023
@welcome

welcome bot commented May 24, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@dcherian dcherian removed the needs triage Issue that has not been reviewed by xarray team member label May 24, 2023
@benbovy
Member

benbovy commented Jun 20, 2023

Hmm, I'm not sure this is related to the explicit indexes refactor.

v2022.10.0 (after the refactor) raised a slightly more meaningful error message:

>>> xr.Dataset().to_array()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 xarray.Dataset().to_array()

File ~/Git/github/benbovy/xarray/xarray/core/dataset.py:6079, in Dataset.to_array(self, dim, name)
   6077 data_vars = [self.variables[k] for k in self.data_vars]
   6078 broadcast_vars = broadcast_variables(*data_vars)
-> 6079 data = duck_array_ops.stack([b.data for b in broadcast_vars], axis=0)
   6081 dims = (dim,) + broadcast_vars[0].dims
   6082 variable = Variable(dims, data, self.attrs, fastpath=True)

File ~/Git/github/benbovy/xarray/xarray/core/duck_array_ops.py:287, in stack(arrays, axis)
    285 def stack(arrays, axis=0):
    286     """stack() with better dtype promotion rules."""
--> 287     return _stack(as_shared_dtype(arrays), axis=axis)

File ~/Git/github/benbovy/xarray/xarray/core/duck_array_ops.py:187, in as_shared_dtype(scalars_or_arrays)
    182     arrays = [asarray(x) for x in scalars_or_arrays]
    183 # Pass arrays directly instead of dtypes to result_type so scalars
    184 # get handled properly.
    185 # Note that result_type() safely gets the dtype from dask arrays without
    186 # evaluating them.
--> 187 out_type = dtypes.result_type(*arrays)
    188 return [x.astype(out_type, copy=False) for x in arrays]

File ~/Git/github/benbovy/xarray/xarray/core/dtypes.py:183, in result_type(*arrays_and_dtypes)
    178     if any(issubclass(t, left) for t in types) and any(
    179         issubclass(t, right) for t in types
    180     ):
    181         return np.dtype(object)
--> 183 return np.result_type(*arrays_and_dtypes)

File <__array_function__ internals>:200, in result_type(*args, **kwargs)

ValueError: at least one array or dtype is required
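The older `ValueError` has the same root cause one layer further down: with no arrays, `dtypes.result_type(*arrays)` ends up calling `np.result_type()` with no arguments. A minimal NumPy sketch (the helper name `empty_input_errors` is hypothetical) shows that NumPy itself rejects empty input for both `np.result_type` and `np.stack`, which suggests the empty-Dataset case has to be handled before these calls are reached.

```python
import numpy as np


def empty_input_errors() -> dict:
    """Collect the error messages NumPy raises when given nothing to combine."""
    errors = {}
    try:
        # What dtypes.result_type(*arrays) reduces to when arrays == []
        np.result_type()
    except ValueError as err:
        errors["result_type"] = str(err)
    try:
        # Even with a dtype resolved, stacking zero arrays is rejected
        np.stack([])
    except ValueError as err:
        errors["stack"] = str(err)
    return errors


for name, message in empty_input_errors().items():
    print(f"{name}: {message}")
```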
