Better support for datatree / kerchunk #594

Open
dcherian opened this issue Apr 11, 2023 · 4 comments
Labels
enhancement Issues that are found to be a reasonable candidate feature additions

Comments

@dcherian
Collaborator

I've been using kerchunk to generate aggregated datasets that have a Zarr group for each "stream" (this could be data on different grids and at different frequencies, e.g. full depth grid monthly means, and daily mean surface data).

I've been storing them as reference files, which works well.
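
As an illustration of the layout described here, the following is a minimal sketch of merging per-stream reference sets into one reference file with a Zarr group per stream. It assumes the version 1 kerchunk reference JSON layout (a `refs` mapping from Zarr keys to inline strings or `[url, offset, length]` entries) and Zarr v2 group metadata. The function name `combine_streams`, the file names, and the sample stream contents are hypothetical and are not part of kerchunk.

```python
import json
from typing import Any, Dict

# Zarr v2 group marker, stored inline as a JSON string.
ZGROUP = json.dumps({"zarr_format": 2})


def combine_streams(streams: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Nest each stream's references under a group named after the stream.

    Hypothetical helper: it only prefixes keys and does not rebuild
    consolidated metadata (.zmetadata), which would need separate handling.
    """
    refs: Dict[str, Any] = {".zgroup": ZGROUP}  # root group marker
    for name, stream_refs in streams.items():
        # Accept a bare refs mapping or a {"version": 1, "refs": {...}} document.
        inner = stream_refs.get("refs", stream_refs)
        for key, value in inner.items():
            refs[f"{name}/{key}"] = value
        # Make sure every stream is itself a Zarr group.
        refs.setdefault(f"{name}/.zgroup", ZGROUP)
    return {"version": 1, "refs": refs}


# Made-up reference sets for two streams (placeholder file names and offsets).
h = {"version": 1, "refs": {".zgroup": ZGROUP, "temp/0.0.0": ["h.nc", 1000, 2000]}}
sfc = {"sst/0.0": ["sfc.nc", 500, 800]}

combined = combine_streams({"h": h, "sfc": sfc})
```

Writing `combined` out with `json.dump` would give a single file in which each stream appears as its own group, which is the shape a tree-style reader expects.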

I'd like to put a single entry per simulation in an intake-esm catalog and read it with datatree.open_datatree.

I think I have two requests:

  1. Turn off aggregation, which seems to be a common request. I'd rather do the aggregation "at write-time" by creating an appropriate JSON file that takes care of various idiosyncrasies (e.g. merging in "static variables") instead of pushing it to the user at read-time.
  2. An entry in the catalog that switches between using xr.open_dataset and datatree.open_datatree. Eventually, there will be an xr.open_datatree, but the underlying concept of two different functions to open a group vs. a full tree will still be around.
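
The second request could be sketched roughly as below. This is not intake-esm's API: the `loader`, `path`, and `kwargs` fields and the `open_entry` function are hypothetical, and the two openers are stand-ins so the sketch runs without xarray or datatree installed. In practice the registry would presumably map to xr.open_dataset and datatree.open_datatree.

```python
from typing import Any, Callable, Mapping

Opener = Callable[..., Any]


def open_entry(entry: Mapping[str, Any], openers: Mapping[str, Opener]) -> Any:
    """Open a catalog entry with the function named by its 'loader' field.

    Hypothetical sketch: entries without a 'loader' fall back to "dataset".
    """
    loader = entry.get("loader", "dataset")
    try:
        opener = openers[loader]
    except KeyError:
        raise ValueError(
            f"unknown loader {loader!r}; expected one of {sorted(openers)}"
        ) from None
    return opener(entry["path"], **entry.get("kwargs", {}))


# Stand-in openers; real code would register the actual library functions here.
def fake_open_dataset(path: str, **kwargs: Any) -> tuple:
    return ("dataset", path, kwargs)


def fake_open_datatree(path: str, **kwargs: Any) -> tuple:
    return ("datatree", path, kwargs)


OPENERS = {"dataset": fake_open_dataset, "datatree": fake_open_datatree}

tree = open_entry({"path": "combined.json", "loader": "datatree"}, OPENERS)
group = open_entry({"path": "h.json", "kwargs": {"chunks": {}}}, OPENERS)
```

Keeping the choice in the entry itself means a "combined" row can opt into the tree reader while per-stream rows keep the existing single-group behaviour.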
@dcherian
Collaborator Author

Here's a catalog where there is an entry for each "stream": h, sfc, wci; and an aggregated dataset with stream="combined".

I'd like to pick some simulations and load the combined stream as a datatree.

@mgrover1 mgrover1 added the enhancement Issues that are found to be a reasonable candidate feature additions label Apr 11, 2023
@dcherian
Collaborator Author

dcherian commented May 4, 2023

Do you have any thoughts on how to do this?

@andersy005
Member

One step closer with

This should enable the following

turn off aggregation, which seems to be a common request. I'd rather do the aggregation "at write-time" by creating an appropriate JSON file that takes care of various idiosyncrasies (e.g. merging in "static variables") instead of pushing it to the user at read-time.

@andersy005
Member

regarding

an entry in the catalog that switches between using xr.open_dataset and datatree.open_datatree. Eventually, there will be an xr.open_datatree, but the underlying concept of two different functions to open a group vs. a full tree will still be around.

I haven't had a chance to look into possible options. I intend to get back to you next week with some ideas :)


3 participants