Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension proposal: extension for UGRID and unstructured mesh utils #4222

Closed
Huite opened this issue Jul 13, 2020 · 5 comments
Closed

Extension proposal: extension for UGRID and unstructured mesh utils #4222

Huite opened this issue Jul 13, 2020 · 5 comments
Labels

Comments

@Huite
Copy link
Contributor

Huite commented Jul 13, 2020

Disclaimer: I'm not a 100% sure this is necessarily the right place to discuss this. I think some of the things I propose could be broadly useful. At any rate, I appreciate feedback!

EDIT: to clarify, I don't think my original wording was entirely clear: I'm not suggesting adding this to xarray, but I'm thinking about a dataset accessor or a wrapping object in a separate package.

Introduction

Xarray is a great tool for structured data, but there isn't anything quite like it for unstructured meshes of convex cells. I'm coming from a hydrological background, and I believe many of my colleagues working with unstructured meshes could greatly benefit from xarray.

In terms of data models and formats, there are many unstructured mesh formats, I think UGRID is the most widely shared (outside of specific model formats) and it supports many aspects (e.g. data on faces, nodes, edges). Although Gridded exists and provides a number of features, I really need something that consumes and produces xarray Datasets (I don't want to go without e.g. dask, or deal with the netcdf4 library myself).

My most immediate need is some utilities for 2D unstructured meshes (y, x), and layered meshes (layer, y, x). This is also what UGRID is mostly used for now, as there don't seem to be many examples of the 3D unstructured mesh definitions in the wild. Sticking to 2D simplies things somewhat, so I'll ignore the third dimension for now -- but I don't see any reason why one couldn't generalize many ideas to 3D.

I currently use xarray a great deal with structured data, in dealing with "regular" netCDFs. In my thinking, it's possible to "mirror" a few methods which by and large behave similarly to xarray, using the UGRID data model for the mesh topology.

A simple UGRID example

To clarify what I'm talking about, this is what UGRID looks like in xarray, there's a Dataset with a dummy variable which describes in which variables the mesh topology is stored, in this case for a triangle and a quad.

ds = xr.Dataset()
ds["mesh2d"] = xr.DataArray(
    data=0,
    attrs={
     "cf_role": "mesh_topology",
     "long_name": "Topology data of 2D mesh",
     "topology_dimension": 2,
     "node_coordinates": "node_x node_y",
     "face_node_connectivity": "face_nodes",
    }
)
ds = ds.assign_coords(
    node_x=xr.DataArray(
        data=np.array([0.0, 10.0, 10.0, 0.0, 20.0, 20.0]),
        dims=["node"],
    )
)
ds = ds.assign_coords(
    node_y=xr.DataArray(
        data=np.array([0.0, 0.0, 10.0, 10.0, 0.0, 10.0]),
        dims=["node"],
    )
)
ds["face_nodes"] = xr.DataArray(
    data=np.array([
        [0, 1, 2, 3],
        [1, 4, 5, -1]
    ]),
    coords={
        "face_x": ("face", np.array([5.0, 15.0])),
        "face_y": ("face", np.array([5.0, 5.0])),
    },
    dims=["face", "nmax_face"],
    attrs={
        "cf_role": "face_node_connectivity",
        "long_name": "Vertex nodes of mesh faces (counterclockwise)",
        "start_index": 0,
        "_FillValue": -1,
    }
)

It's only the topology that is special cased. A time dependent variable looks the same as it would otherwise, except it would have dimensions ("time", "face") rather than ("time", "y", "x").

UGRID netCDFs can in principle be read and written without issue. I would prefer to work with standard xarray Datasets like this if possible, because it means there's no need to touch any IO methods, as well as maintaining the full flexibility of xarray.

Some methods

I'm thinking of a few methods which behave similarly as the xarray methods except that they know about the mesh topology (in x and y) and the dependencies between the different variables. So, you could do:

import xugrid  # name of extension

selection_ds = ds.xugrid.sel(node_x=slice(-30.0, 30.0))

And it would slice based on the x coordinate of the nodes, but update all the other variables to produce a consistent mesh (and renumber the face_nodes appriopriately). xugrid.isel would behave in the same way.

xugrid.plot would e.g. perform triangulation automatically (if the mesh isn't triangular already), and create a matplotlib.tri.Triangulation object, and so forth.

Regridding

I think one of the major needs for dealing with unstructured grids is being able to map data from one mesh to another, go from (GIS) raster data to unstructured, or go from unstructured model output to GIS raster data. For my use case, I often require area weighted methods, which requires computing overlap between source and destination meshes. I haven't really found a suitable package, e.g. xemsf looks very useful, but doesn't have quite the methods I'd want. Also troubling is that I can't really get it installed on Windows.

Existing GIS routines for vector data are far too slow (they have to take too many special cases into account, I think), see some benchmarks here.

I think numba provides an interesting opportunity here, not just for ease of development and distribution, but also because of JIT-compilation, the user can provide their own regridding methods. E.g.

import numba as nb
import numpy as np


def median(values, weights):
    return np.nanpercentile(values, 50)


class Regridder:
    def __init__(self, source, destination):
        # compute and store weights
        # ...
	
    # Use a closure to avoid numba's function overhead:
    # https://numba.pydata.org/numba-doc/latest/user/faq.html#can-i-pass-a-function-as-an-argument-to-a-jitted-function
    def _make_regrid(self, method):
        jitted_func = nb.njit(method)
        def regrid(src_indices, dst_indices, data, weights):
            out = np.full(dst_indices.size, np.nan)
            for src_index, dst_index in zip(src_indices, dst_indices):  # or something
                # accumulate values          
                out[dst_index] = jitted_func(values)
            return out
        return regrid

    def regrid(source_data, method):
        self._regrid = self._make_regrid(method)
        # Cache the method in the object or something to avoid repeat compilation
        # provide default methods (e.g. min or max) as string or enumerators
        return self._regrid(self.src_indices, self.dst_indices, source_data, self.weights)

The best way to find overlapping cells seems via a cell tree, as implemented here, and requires searching for bounding boxes rather than points. I've ported the CellTree2D to numba, and added a bounding box search.

The serial version is a little bit slower than C++ for point searches, but with numba's prange (and enough processors) searches are a few times faster. (In retrospect, I could've just re-used the C++ tree rather than porting it, but numba does have some benefits...)

Interpolation on an unstructured grids can probably be handled by pyinterp, and can be made available via the regridder. (Scipy interpolate takes too much memory in my experience for large meshes.)

API proposal

class Regridder:
    # Basically same API as xemsf
    def __init__(self, source: xr.Dataset, destination: xr.Dataset) -> None:
        pass
		
    def regrid(self, source_data: xr.DataArray, method: Union[str, Callable]) -> xr.Dataset:
        pass

xugrid.triangulate(self) -> xr.Dataset:
xugrid.rasterize(self, resolution: Mapping) -> xr.Dataset:
xugrid.reproject(self, dst_crs, src_crs) -> xr.Dataset: # just offload all coordinates to pyproj?

xugrid.sel(self, indexers: Mapping) -> xr.Dataset:
xugrid.isel(self, indexers: Mapping) -> xr.Dataset:
xugrid.plot(self) -> None:  # calls triangulate if needed
xugrid.plot.imshow(self) -> None:  # maybe calls rasterize first, provides a quicker peek

xugrid.slice_vertical(self, linestring) -> xr.Dataset:  # traces one or more rays through the mesh

xugrid.check_mesh(self) -> None: # check for convexity, all indices present, etc.
xugrid.connectivity_matrix(self) -> scipy.sparse?
xugrid.reverse_cuthill_mckee(self) -> xr.Dataset: #  use scipy.sparse.csgraph
xugrid.binary_erosion(self, iterations: int) -> xr.DataArray:
xugrid.binary_dilation(self, iterations: int) -> xr.DataArray:

xugrid.meshfix(self)  # call pymeshfix
xugrid.to_pyvista(self) -> pyvista.UnstructuredGrid: # create a pyvista (vtk) mesh

Most of these methods require dealing with numpy arrays (or numpy arrays representing sparse arrays), so I think numba is very suitable for the algorithms (if they don't already exist).

For most methods, 3D meshes would be possible too. It requires some changes in e.g. the CellTree, and plotting would first require a slice_horizontal or something to get a plan view image.

Some methods might not be very needed, I'm basing it off of my impression what I currently use xarray for in combination with some other tools that assume structured data.

Resources

(Have to check for non-permissive / copyleft licenses...)

Thoughts?

Does this sound useful? Does it belong somewhere else / does it already exist?

(I'm aware a number e.g. the tree structures are available in vtk as well, but the Python API unfortunately doesn't expose them; having a numba version also makes changes a lot easier.)

Something that popped up:
Some of the algorithms benefit from some persistent data structures (e.g. a cell tree or connectivity matrix), can these be maintained via an xarray-extension?

I'm assuming mesh topologies that fit within memory, that seems challenging enough for now...

@ChrisBarker-NOAA
Copy link

ChrisBarker-NOAA commented Jul 13, 2020

I put a note in the gridded issue, but in short:

I think the way to get this is to use xarray in gridded, rather than to, well, reimplement gridded in xarray :-)

@TomNicholas
Copy link
Contributor

TomNicholas commented Jul 13, 2020

My basic understanding of this is that your data essentially can be represented as an xarray dataset as long as you cart around an extra variable (or attributes) to describe it, and there is a common set of methods you would then like to apply to that data?

If that's the case then this seems like a powerful and clear proposal for a widely-useful package which would leverage xarray, but perhaps would not live within xarray itself? We encourage composition over inheritance (i.e. wrap xarray.Dataset with your new gridded.Dataset), and the accessor methods can be used to provide the .method syntax you want. These methods can then interact directly with the lower level arrays xarray wraps if they need to. Even if you ended up reimplementing a large fraction of xarray's API, as each method wrapper would be thin I expect this would still be the most straightforward way to achieve what you want. Once your package is built we could also happily link to it in our list of projects which use xarray.

@Huite
Copy link
Contributor Author

Huite commented Jul 14, 2020

@ChrisBarker-NOAA: thanks, I'll reply in the gridded issue: NOAA-ORR-ERD/gridded#55

@TomNicholas Thanks for the comment! I've put a clarification in the post, that I don't intend this to live in Xarray. The stuff I suggest is specifically about mesh topology, which I'd agree on as being outside of the scope of xarray. I realize the post wasn't that clear in that regard. I was indeed thinking of the accessor methods, but failed to explicitly mention them. Hopefully should be bit clearer now.

@stale
Copy link

stale bot commented Apr 18, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Apr 18, 2022
@dcherian
Copy link
Contributor

I think we can move the conversation to https://github.com/UXARRAY/uxarray now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants