Add support for chunked arrays and dask operations #57

@tomsail

The current indexing implementation breaks Dask operations on chunked arrays.

Chunking the dataset and building the lazy computation does not raise any error by itself:

import xarray as xr

file = "tests/data/r3d_bump.slf"
ds = xr.open_dataset(file)
print(ds)
ds = ds.chunk({"time": -1, "node": 40})

def analyze_block(ds_block: xr.Dataset) -> xr.Dataset:
    """Operate on a single Dask chunk."""
    result = ds_block.mean(dim="node")
    return result

result = xr.map_blocks(analyze_block, ds)
result.compute()

However, anything that actually loads the underlying data, whether the result.compute() call above or a direct access such as:

print(result.Z.values)

raises:

Traceback (most recent call last):
  File "./test_dask.py", line 19, in <module>
    result.compute()
  File "./.venv/lib/python3.12/site-packages/xarray/core/dataset.py", line 791, in compute
    return new.load(**kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/dataset.py", line 557, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
                                                       ^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 659, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 664, in get_duck_array
    return self.array.get_duck_array()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 943, in get_duck_array
    duck_array = self.array.get_duck_array()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 897, in get_duck_array
    return self.array.get_duck_array()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 737, in get_duck_array
    array = self.array[self.key]
            ~~~~~~~~~~^^^^^^^^^^
  File "./xarray_selafin/xarray_backend.py", line 187, in __getitem__
    return indexing.explicit_indexing_adapter(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/xarray/core/indexing.py", line 1129, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./xarray_selafin/xarray_backend.py", line 246, in _raw_indexing_method
    temp = np.reshape(temp, (self.shape[1], self.shape[2]))  # (nplan, nnode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 324, in reshape
    return _wrapfunc(a, 'reshape', shape, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./.venv/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 3 into shape (5,1452)

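This is because Dask may pass partial keys such as (0,), (slice(0, 1), slice(0, 50)), or even (slice(None), slice(0, 50)).
In those cases the logic in _raw_indexing_method does not match the key: as the traceback shows, it still reshapes to the full (nplan, nnode) shape taken from self.shape, while the array it holds has only 3 elements.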
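
One possible direction, shown here only as a minimal sketch and not as the actual xarray_selafin code, is to normalize every incoming key before reading: pad it to the full rank, turn integers into length-1 slices, read each requested time step at its full shape, apply the remaining slices, and finally drop the integer-indexed axes. The names expand_key, read_block and read_frame are hypothetical, the (2, 5, 6) array stands in for a (time, nplan, nnode) variable, and the sketch assumes keys contain only integers and slices.

```python
import numpy as np


def expand_key(key, shape):
    """Pad a basic-indexing key to full rank; turn integers into length-1 slices.

    Returns the per-axis slices and the axes that were indexed by an integer
    (those axes are dropped from the result, as NumPy would do).
    """
    key = tuple(key) + (slice(None),) * (len(shape) - len(key))
    slices, int_axes = [], []
    for axis, (k, n) in enumerate(zip(key, shape)):
        if isinstance(k, slice):
            slices.append(k)
        else:
            i = int(k)
            if i < 0:
                i += n  # normalize negative indices so slice(i, i + 1) is valid
            slices.append(slice(i, i + 1))
            int_axes.append(axis)
    return slices, tuple(int_axes)


def read_block(read_frame, key, shape, dtype=np.float64):
    """Read the block selected by `key` from a variable of full shape `shape`.

    `read_frame(t)` is a hypothetical reader returning all values of one
    time step as a flat array of size prod(shape[1:]).
    """
    slices, int_axes = expand_key(key, shape)
    time_steps = range(*slices[0].indices(shape[0]))
    out_shape = [len(range(*s.indices(n))) for s, n in zip(slices, shape)]
    out = np.empty(out_shape, dtype=dtype)
    for i, t in enumerate(time_steps):
        # Reshape with the FULL per-frame shape, then slice the result.
        frame = np.reshape(read_frame(t), shape[1:])
        out[i] = frame[tuple(slices[1:])]
    return out.squeeze(axis=int_axes) if int_axes else out


# Stand-in data with shape (time, nplan, nnode) instead of a real Selafin file.
data = np.arange(2 * 5 * 6, dtype=np.float64).reshape(2, 5, 6)


def read_frame(t):
    return data[t].ravel()


# Keys of the kind listed above, adapted to the small stand-in shape.
for key in [(0,), (slice(0, 1), slice(0, 3)), (slice(None), slice(0, 3))]:
    block = read_block(read_frame, key, data.shape)
    assert np.array_equal(block, data[key])
```

This version always reads whole frames and slices them in memory, which is simple but not optimal for small node chunks; reading only the requested nodes would be a separate optimization.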
