Rename `ChunkManager` to `ComputeManager` #9435

TomNicholas · 2024-09-05T16:05:53Z

What is your issue?

In #8733 (comment) and #9286 (comment) it's become fairly clear that the ChunkManager abstraction isn't quite right - it's "too greedy" as @dcherian said. #9286 will fix this by removing .chunks and .rechunk from the ChunkManager's responsibilities, but the result will be a "ChunkManager" that doesn't explicitly handle chunks!

I think a better name to describe the new interface is a "ComputeManager", as it still handles the creation of lazily-computed parallel arrays, distribution of computation over parts of those arrays, and triggering the materialization of the arrays.
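As a rough illustration of those three responsibilities, here is a minimal sketch. Every name in it (`ComputeManagerSketch`, `map_parts`, `LazyParts`, `ToyManager`) is hypothetical and made up for this example; it is not xarray's actual entrypoint API. The point is only that the abstract interface never mentions chunks.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List


class ComputeManagerSketch(ABC):
    """Hypothetical interface; NOT xarray's real entrypoint class."""

    @abstractmethod
    def from_array(self, data: Any) -> Any:
        """Create a lazily computed array from in-memory data."""

    @abstractmethod
    def map_parts(self, func: Callable, array: Any) -> Any:
        """Distribute `func` over the parts of a lazy array."""

    @abstractmethod
    def compute(self, array: Any) -> Any:
        """Materialize a lazy array into an in-memory result."""


class LazyParts:
    """Toy lazy array: one zero-argument callable per part."""

    def __init__(self, thunks: List[Callable[[], list]]):
        self.thunks = thunks


class ToyManager(ComputeManagerSketch):
    """Toy implementation; how data is split is an internal detail."""

    def __init__(self, n_parts: int = 2):
        self.n_parts = n_parts

    def from_array(self, data):
        data = list(data)
        size = max(1, -(-len(data) // self.n_parts))  # ceiling division
        parts = [data[i:i + size] for i in range(0, len(data), size)]
        # Bind each part as a default argument so every thunk keeps its own.
        return LazyParts([lambda p=p: p for p in parts])

    def map_parts(self, func, array):
        # Nothing runs here; we only wrap the existing thunks.
        return LazyParts([lambda t=t: func(t()) for t in array.thunks])

    def compute(self, array):
        out = []
        for thunk in array.thunks:
            out.extend(thunk())
        return out
```

A manager for a backend without chunks (for example, one that splits work over devices) could implement the same three methods with a different internal layout.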

JAX is also an interesting potential use case because there you don't have chunks, but you do still have to manage dividing computation up over multiple devices. See #9286 (comment)

Renaming ChunkManagerEntrypoint to ComputeManagerEntrypoint will be a breaking change but:
a) this is a very advanced feature,
b) the docs for it have a fat "experimental" warning on them,
c) I'm only aware of 2 libraries using this outside of xarray itself: cubed (tagging @tomwhite), and @hmaarrfk's chunked data structure. The dask ChunkManager ships with xarray, so there is no breaking change there. (Users may have to pip install again to re-register entrypoints if upgrading a development version of xarray inside existing environments though.)

I'm separating this out from #9286 because that PR shouldn't be a breaking change, and the follow-up that closes this issue will be the minimal possible breaking change (i.e. just renaming ChunkManagerEntrypoint -> ComputeManagerEntrypoint).


hmaarrfk · 2024-09-05T22:08:08Z

I'm happy to constrain the version or add compatibility shims in my code. Let me know.
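For what it's worth, a downstream compatibility shim could be as small as the sketch below. It assumes the rename lands exactly as proposed (`ChunkManagerEntrypoint` becoming `ComputeManagerEntrypoint`); the helper name `resolve_manager_base` and the stand-in modules are hypothetical, used only so the example runs without xarray installed.

```python
from types import SimpleNamespace


def resolve_manager_base(module):
    """Return the manager base class from `module`, preferring the new name.

    Hypothetical helper: in real code, `module` would be whichever xarray
    module exposes the entrypoint base class.
    """
    for name in ("ComputeManagerEntrypoint", "ChunkManagerEntrypoint"):
        base = getattr(module, name, None)
        if base is not None:
            return base
    raise ImportError("no known manager entrypoint base class found")


# Stand-ins for an xarray release before and after the proposed rename.
old_xarray = SimpleNamespace(
    ChunkManagerEntrypoint=type("ChunkManagerEntrypoint", (), {})
)
new_xarray = SimpleNamespace(
    ComputeManagerEntrypoint=type("ComputeManagerEntrypoint", (), {})
)

OldBase = resolve_manager_base(old_xarray)
NewBase = resolve_manager_base(new_xarray)
```

A downstream manager class would then subclass whatever `resolve_manager_base` returns, so the same code works on either side of the rename.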

shoyer · 2024-09-06T06:33:12Z

This all sounds fine to me.

On a semi-related note, it would be nice to consider separating the concept of triggering computation from returning a computed result. Dask calls these .persist() and .compute(). In TensorStore, the former is obtained via .read() and the latter via .result().
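To make the distinction concrete, here is a small standard-library sketch. The `LazyValue` class is hypothetical and only borrows the method names `persist` and `compute` for illustration; it is not the Dask or TensorStore implementation.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable


class LazyValue:
    """Hypothetical lazy value that separates the two steps."""

    def __init__(self, func: Callable[[], Any]):
        self._func = func
        self._future = None  # set once computation has been triggered

    def persist(self, executor: Executor) -> "LazyValue":
        """Trigger the computation without waiting for it to finish."""
        if self._future is None:
            self._future = executor.submit(self._func)
        return self

    def compute(self, executor: Executor) -> Any:
        """Trigger the computation if needed, then block for the result."""
        return self.persist(executor)._future.result()


calls = []


def expensive() -> int:
    calls.append(1)  # record each time the work actually runs
    return 21 * 2


with ThreadPoolExecutor(max_workers=1) as pool:
    value = LazyValue(expensive)
    value.persist(pool)            # starts the work in the background
    result = value.compute(pool)   # waits for the already-started work
```

Because `compute` reuses the future created by `persist`, the work runs once even though both methods are called.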

hmaarrfk · 2024-09-08T02:08:35Z

I would personally appreciate a resolution to #9403 so that we can start to test our codebase with numpy "2" series. I'm just trying to isolate breaking changes in our changes and I feel like #9403 is the last thing holding us up! (apart from tensorflow that is).

TomNicholas added the "API design" and "topic-chunked-arrays" (Managing different chunked backends, e.g. dask) labels on Sep 5, 2024

TomNicholas mentioned this issue Sep 5, 2024

RFC: add materialize to materialize lazy arrays data-apis/array-api#839

Open
