Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename ChunkManager to ComputeManager #9435

Open
TomNicholas opened this issue Sep 5, 2024 · 3 comments
Open

Rename ChunkManager to ComputeManager #9435

TomNicholas opened this issue Sep 5, 2024 · 3 comments
Labels
API design topic-chunked-arrays Managing different chunked backends, e.g. dask

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Sep 5, 2024

What is your issue?

In #8733 (comment) and #9286 (comment) it's become fairly clear that the ChunkManager abstraction isn't quite right - it's "too greedy" as @dcherian said. #9286 will fix this by removing .chunks and .rechunk from the ChunkManager's responsibilities, but the result will be a "ChunkManager" that doesn't explicitly handle chunks!

I think a better name to describe the new interface is a "ComputeManager", as it still handles the creation of lazily-computed parallel arrays, distribution of computation over parts of those arrays, and triggering the materialization of the arrays.

JAX is also an interesting potential use case because there you don't have chunks, but you do still have to manage dividing computation up over multiple devices. See #9286 (comment)

Renaming ChunkManagerEntrypoint to ComputeManagerEntrypoint will be a breaking change but:
a) this is a very advanced feature,
b) the docs for it have a fat "experimental" warning on them,
c) I'm only aware of 2 libraries using this outside of xarray itself: cubed (tagging @tomwhite), and @hmaarrfk's chunked data structure. The dask ChunkManager ships with xarray, so there is no breaking change there. (Users may have to pip install again to re-register entrypoints if upgrading a development version of xarray inside existing environments though.)

I'm separating this out from #9286 because that PR shouldn't be a breaking change, and the follow-up that closes this issue will be the minimal possible breaking change (i.e. just renaming ChunkManagerEntrypoint -> ComputeManagerEntrypoint).

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 5, 2024

I'm happy to constrain version or add compatibility shims in my code. Let me know

@shoyer
Copy link
Member

shoyer commented Sep 6, 2024

This all sounds fine to me.

On a semi-related note, it would be nice to consider separating the concept of triggering computation from returning a computed result. Dask calls these .persist() and .compute(). In Tensorstore, the former is obtained via .read() and the later via .result().

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Sep 8, 2024

I would personally appreciate a resolution to #9403 so that we can start to test our codebase with numpy "2" series. I'm just trying to isolate breaking changes in our changes and I feel like #9403 is the last thing holding us up! (apart from tensorflow that is).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design topic-chunked-arrays Managing different chunked backends, e.g. dask
Projects
None yet
Development

No branches or pull requests

3 participants