xarray-dggs package? #3

Open
benbovy opened this issue Nov 3, 2023 · 9 comments

Comments

@benbovy

benbovy commented Nov 3, 2023

Cross-posting here what I've suggested in the Pangeo Discourse thread.

Xarray-DGGS

I think that a good and reasonable goal for the sprint would be to come up with an xarray-dggs package that would provide an xarray-compatible interface to various DGGS features exposed in 3rd-party Python libraries (e.g., healpy, pys2index, spherely, h3-py, dggrid4py, etc.) through a very basic set of features:

  1. A few Xarray custom indexes that could be built from lat/lon data (or directly from DGGS cell indices) and that would enable data selection using .sel()
  2. Xarray Dataset and/or DataArray accessors for DGGS-specific API (set new DGGS Xarray indexes from lat/lon coordinates, get DGGS cell indices as a new coordinate, etc.)

I think that DGGS grids have enough in common to expose the functionality for all of them in a common xarray-dggs package, maybe with optional dependencies for each backend (healpy, pys2index, h3-python, DGGRID, etc.).
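To make the "optional dependencies for each backend" idea a bit more concrete, here is a minimal sketch of a backend registry that imports each grid library lazily. With this approach, installing xarray-dggs alone would not pull in healpy, h3, etc. All names here (`register_backend`, `get_backend`, the factories) are hypothetical and not part of any existing package. Only `healpy` and `h3` are real import names.

```python
import importlib
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical registry: grid name -> (optional dependency to import, factory building the backend)
_BACKENDS: Dict[str, Tuple[Optional[str], Callable[[Any], Any]]] = {}


def register_backend(name: str, module: Optional[str], factory: Callable[[Any], Any]) -> None:
    """Register a grid backend; `module` is the optional dependency it needs (None if pure Python)."""
    _BACKENDS[name] = (module, factory)


def get_backend(name: str) -> Any:
    """Build the backend for `name`, importing its optional dependency only at this point."""
    if name not in _BACKENDS:
        raise ValueError(f"unknown grid {name!r}; available: {sorted(_BACKENDS)}")
    module, factory = _BACKENDS[name]
    dep = None
    if module is not None:
        try:
            dep = importlib.import_module(module)
        except ImportError as err:
            raise ImportError(
                f"grid {name!r} needs the optional dependency {module!r}; "
                "install it to use this grid"
            ) from err
    return factory(dep)


# Example registrations: registering does not import healpy or h3 yet.
register_backend("healpix", "healpy", lambda healpy: healpy)
register_backend("h3", "h3", lambda h3: h3)
```

A missing optional dependency then fails only when someone actually asks for that grid, with an error that names the package to install.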

This proposal builds on top of a few suggestions found in the README of this repository, e.g., H3 or rHEALPIx + Xoak + Xarray, H3 or rHEALPIx + Xoak + Xarray + Xvec?. While both xoak and xvec can be good sources of inspiration for xarray-dggs, those packages have slightly different scopes: Xoak provides generic tree-based indexes (not only geospatial) and Xvec currently works only with shapely (planar geometries). Xoak has a nice API for nearest-neighbors point-wise indexing that leverages Xarray advanced indexing (i.e., using xarray.DataArray objects) but it still has to be refactored so it builds on top of Xarray custom indexes. Xvec is one of the few (the only?) released Xarray extensions that provide an Xarray custom index.

The sub-topics and (open) questions listed below are not exhaustive. Please feel free to suggest in the comments below any important topic or question that is missing.

Data model

An Xarray index must relate to one or more coordinates with arbitrary dimensions. In the case of DGGS, what should be the coordinates and their dimension(s)?

  • latitude (cell centers)
  • longitude (cell centers)
  • cell ids

Do we need to have a fixed data model for all DGGS? It can be flexible, i.e., an Xarray Index subclass may support different data models (build options, flexible inputs).

Should we restrict the index and/or coordinates to a fixed level / zoom / resolution of the discrete global grid?
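One thing that may help with the resolution question: for hierarchical grids, cell ids often encode the parent/child relationship directly. Below is a minimal sketch for HEALPix in the NESTED ordering, where the four children of pixel `p` at order `k` are `4p` to `4p + 3` at order `k + 1`. The same idea covers selection by parent cell id. The function names are made up for illustration. H3 and S2 use different bit layouts and expose their own parent/child functions, so this would not carry over as-is.

```python
import numpy as np


def parent_ids(cell_ids, level, parent_level):
    """Parent ids at `parent_level` of HEALPix NESTED ids given at `level`.

    In NESTED ordering, going up one order drops the two lowest bits,
    so going up d orders is a right shift by 2 * d bits.
    """
    if not 0 <= parent_level <= level:
        raise ValueError("need 0 <= parent_level <= level")
    shift = 2 * (level - parent_level)
    return np.asarray(cell_ids, dtype=np.int64) >> shift


def children_mask(cell_ids, level, parent, parent_level):
    """Boolean mask of the cells (at `level`) that lie inside `parent` (at `parent_level`)."""
    return parent_ids(cell_ids, level, parent_level) == parent


# Example: all order-1 cells inside order-0 cell 5
ids = np.arange(12 * 4**1)
print(np.flatnonzero(children_mask(ids, level=1, parent=5, parent_level=0)))  # [20 21 22 23]
```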

I guess we need some sort of CRS and/or additional metadata for certain kinds of grids (custom parameters)? Some grid parameters could perhaps be hidden as internal attributes of the index?

Data selection API (.sel)

There are a lot of possibilities regarding how to select data on a discrete global grid. What kind of indexer object(s) could we pass to xarray .sel()?

  • grid cell ids (exact indexing) or parent cell ids
  • latitude and longitude values (nearest-neighbor point-wise indexing)
  • a bbox or polygon (select all points within)

How do we detect the kind of indexer? We could look at the type of the indexers (scalar, slice, list, array, custom object), the value type, etc. Note: it is currently not possible to pass custom options to .sel (see pydata/xarray#7099).
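A rough sketch of what type-based detection could look like. It assumes cell ids live in a `cell_ids` coordinate and cell centers in `lat`/`lon` coordinates. Both coordinate names are assumptions, and so is the `BBox` class. Since `.sel()` does not accept custom options, the kind of selection has to be inferred from the coordinate name and the indexer value alone.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BBox:
    """Hypothetical custom indexer: a lon/lat bounding box in degrees."""
    lon_min: float
    lat_min: float
    lon_max: float
    lat_max: float


def classify_indexer(coord_name, value):
    """Infer the kind of selection from a single .sel() label (hypothetical rules)."""
    if isinstance(value, BBox):
        return "spatial"  # all cells within the box
    if coord_name == "cell_ids":
        if isinstance(value, slice):
            return "cell-range"  # e.g. a contiguous range of ids
        arr = np.asarray(value)
        if np.issubdtype(arr.dtype, np.integer):
            return "exact" if arr.ndim == 0 else "exact-vectorized"
    elif coord_name in ("lat", "lon"):
        arr = np.asarray(value)
        if np.issubdtype(arr.dtype, np.number):
            return "nearest" if arr.ndim == 0 else "nearest-vectorized"
    raise TypeError(f"unsupported indexer for {coord_name!r}: {type(value).__name__}")
```

Under these rules, an Xarray index's `sel` could dispatch to exact lookup, nearest-neighbor search or a spatial query without any extra keyword.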

Assessing the capabilities of the DGGS Python libraries

There are some important requirements for reusing those libraries efficiently with Xarray:

  • support lat/lon vs. cell id conversion? (this seems obvious)
  • vectorized bindings? (numpy arrays)
  • provides structures like B-Tree, R-Tree for fast lookup?
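Even when a library doesn't provide a tree structure, exact cell-id lookup can be vectorized with NumPy alone. Here is a minimal sketch, assuming integer cell ids. It keeps a sorted copy of the ids and uses `np.searchsorted`, which gives O(log n) lookups per query, similar in spirit to a B-Tree for this use case. `SortedCellLookup` is a hypothetical name, not an existing API.

```python
import numpy as np


class SortedCellLookup:
    """Exact cell-id -> position lookup backed by a sorted copy of the ids."""

    def __init__(self, cell_ids):
        cell_ids = np.asarray(cell_ids)
        self._order = np.argsort(cell_ids, kind="stable")  # positions in the original array
        self._sorted = cell_ids[self._order]

    def positions(self, query):
        """Positions of `query` ids in the original array; -1 where an id is not found."""
        query = np.asarray(query)
        if self._sorted.size == 0:
            return np.full(query.shape, -1, dtype=np.intp)
        idx = np.searchsorted(self._sorted, query)  # vectorized binary search
        idx_clipped = np.minimum(idx, self._sorted.size - 1)
        found = (idx < self._sorted.size) & (self._sorted[idx_clipped] == query)
        return np.where(found, self._order[idx_clipped], -1)


lookup = SortedCellLookup([40, 10, 30, 20])
print(lookup.positions([10, 20, 25, 40, 99]))  # positions 1, 3, -1, 0, -1
```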

Perhaps not all of the libraries mentioned above meet those requirements. Which ones should we focus our efforts on? Which kinds of data selection listed above should we focus on, given the common set of core features available in all libraries?

@tinaok
Collaborator

tinaok commented Nov 3, 2023

Hello, @keewis is trying to put the work we did for our IAOCEA project on HEALPix integration here: https://github.com/IAOCEA/xarray-healpy

We will try to update an example notebook with a projection of real data before Monday.

Note that we are not implementing rHEALPix but HEALPix itself, through the healpy package.

Our final objective is to use the properties of Xarray-DGGS to:

  • compute convolutions that ignore 'land' for oceanography, in an optimal manner
  • compute convolutions on a mixture of different resolutions of a Hierarchical Equal Area Grid
  • make use of solutions such as Travel Time Analysis (e.g., H3 Travel Times - https://observablehq.com/@nrabinowitz/h3-travel-times), taking into account land masks and oceanic physical properties, with the goal of improving the tracking of fish habitats.
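For the first objective, one common way to ignore land is a normalized (masked) convolution: sum the ocean values in each stencil and divide by the number of ocean cells in it. Below is a minimal sketch with uniform weights. It assumes a precomputed neighbour table of shape `(n_cells, k)`, where `-1` marks a missing neighbour. For a full-sky map, where pixel ids double as array positions, healpy's `get_all_neighbours` returns neighbour indices with `-1` for missing ones. Its output has shape `(8, N)`, though, so it would need transposing. The function name is hypothetical.

```python
import numpy as np


def masked_neighbour_mean(values, ocean_mask, neighbours):
    """Mean over each cell and its neighbours, using ocean cells only.

    values:     (n,) float array
    ocean_mask: (n,) bool array, True for ocean cells
    neighbours: (n, k) int array of neighbour positions, -1 for "no neighbour"
    Land cells get NaN in the output.
    """
    values = np.asarray(values, dtype=float)
    ocean_mask = np.asarray(ocean_mask, dtype=bool)
    n = values.shape[0]
    # stencil = the cell itself plus its neighbours
    stencil = np.concatenate([np.arange(n)[:, None], np.asarray(neighbours)], axis=1)
    valid = stencil >= 0
    safe = np.where(valid, stencil, 0)  # placeholder index for missing neighbours
    weights = valid & ocean_mask[safe]  # only existing ocean cells contribute
    num = np.where(weights, values[safe], 0.0).sum(axis=1)
    den = weights.sum(axis=1)
    out = np.full(n, np.nan)
    keep = ocean_mask & (den > 0)
    out[keep] = num[keep] / den[keep]
    return out
```

A real ocean application would likely use distance- or area-based weights rather than a plain mean. The masking logic would stay the same.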

@benbovy
Author

benbovy commented Nov 3, 2023

That looks great @tinaok and @keewis!

Your objectives already look quite specific and "high-level". I wonder if during the sprint it would be best to first discuss

  1. everyone's use-cases / user stories with DGGS
  2. see how we can break them down into smaller, generic tasks
  3. check, for each grid with an implementation available in Python, whether those tasks are supported
  4. see if/how those tasks may be easily implemented using the Xarray API (.sel, etc. possibly with a dggs Xarray index) or if they would require custom API in an Xarray accessor.

before getting our hands dirty into the code.

(3-4 are more specific to Python/Xarray but 1-2 may be interesting for anyone)

This might better structure the sprint, and it would greatly help us get a better idea of whether an xarray-dggs extension (or any other package) makes sense for supporting common tasks across different global grids (healpix, s2, h3, etc.). At least it would for me, as I don't have much experience using DGGS in practical applications :)

@rabernat
Member

rabernat commented Nov 3, 2023

I'm excited to participate in a sprint on this topic!

@tinaok
Collaborator

tinaok commented Nov 4, 2023

@benbovy
I am happy to share our use case through the example we just added.

I can show how we convert data, and the challenges we face today.
With the same notebook, I can show the same model data at two different resolutions, which we hope to somehow 'connect' using a DGGS convention.
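If both resolutions are full-sky HEALPix maps in NESTED ordering, 'connecting' them can be as simple as averaging each block of children. This works because the children of a parent are contiguous in that ordering, and all pixels at a given order have equal area. Below is a minimal sketch under those assumptions, with hypothetical function names. Regional or irregular subsets would instead need a group-by on the parent cell ids.

```python
import numpy as np


def nested_order(npix):
    """HEALPix order of a full-sky map with `npix` pixels (npix = 12 * 4**order)."""
    n, rem = divmod(int(npix), 12)
    # n must be a power of 4: a power of two whose exponent is even
    if rem or n < 1 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError(f"{npix} pixels is not a full-sky HEALPix map")
    return (n.bit_length() - 1) // 2


def coarsen_nested(values, levels=1):
    """Average a full-sky NESTED map up by `levels` orders (children are contiguous)."""
    values = np.asarray(values, dtype=float)
    if levels > nested_order(values.size):
        raise ValueError("cannot coarsen beyond order 0")
    # each row holds the 4**levels children of one coarse pixel
    return values.reshape(-1, 4**levels).mean(axis=1)
```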

I'm also very interested in learning from DGGS specialist @allixender (?) how DGGS is used for routing.

If anyone from the EERIE project or the nextGEMS Cycle 3 ICON project is around at BIDS23 (https://github.com/eerie-project/EERIE_hackathon_2023/ ? https://github.com/nextGEMS/nextGEMS_Cycle3 ? @koldunovn ? https://easy.gems.dkrz.de/Processing/healpix/healpix_starter.html), I would love to hear their user stories with HEALPix, and also how they will make their data available (DestinE?).

@koldunovn

Wow, nice ideas! I haven't heard of anyone I know from EERIE, DestinE or nextGEMS planning to participate in this code sprint. Also, we will have our EERIE Hackathon this week.

From nextGEMS and EERIE, notebooks with examples of how we use unstructured data are available, and the ICON data for the last nextGEMS cycle are all in HEALPix. Access to the data is currently restricted if you don't have a DKRZ account, but if there is interest we can provide a subset. In EERIE we are also trying to expose data through xpublish, but it's at an early stage.

These kinds of projects would be great to see at the nextGEMS Hackathon, which will be held 4-8 March somewhere around Hamburg. Let me know if there is interest and I will put you in contact with the nextGEMS people :)

@benbovy
Author

benbovy commented Nov 4, 2023

Great to hear from you @koldunovn! The notebook examples will be helpful. I've created an account on DKRZ, so I'm now able to request to join a project there if needed.

@koldunovn

Great!
If you're interested in HEALPix, I would start from this one and then explore the rest of the collection: https://easy.gems.dkrz.de/Processing/healpix/healpix_starter.html

Unstructured (FESOM2) and semi-structured data are covered here: https://github.com/nextGEMS/nextGEMS_Cycle3

We are also currently developing EERIE notebooks, but there are a lot of examples using nextGEMS data as well: https://github.com/eerie-project/EERIE_hackathon_2023/

If you're looking for something more concrete, let me know.

@benbovy
Author

benbovy commented Nov 6, 2023

We started a shared document on HackMD for the sprint: https://hackmd.io/UBM5L6YNRlG73e3eVo6vOg

@benbovy
Author

benbovy commented Nov 8, 2023

Xarray DGGS extension library in development here: https://github.com/benbovy/xdggs


4 participants