discuss these cell logic functions #1

Open
5 tasks
mdsumner opened this issue Aug 31, 2022 · 11 comments

@mdsumner

mdsumner commented Aug 31, 2022

I have R versions of the cell logic from raster and terra in vaster, and similar for tiling in grout, both unpolished:

https://github.com/hypertidy/vaster

https://github.com/hypertidy/grout

I think these make sense on their own, as grid logic independent of any file or data handling, and I'd like to see (or build myself) Python and other language versions. I'm also interested in contributing some of this to GDAL itself; there are at least a few cases where I would use it for features in the lib-apps I want, but that needs a broader review atm.

  • what is the minimal or sensible set of functions
  • funs are of dim, extent or a combination; should also have objects that provide methods (or closures that record a dim+extent)
  • should funs be vectorized, i.e. over multiple sets of dim+extent
  • what index conversions are needed for tiles-as-children rasters, netcdf vs gdal indexes etc
  • grid alignment: compare gdal projwin to raster snap in, out, near, and the ability for gdalwarp to act as RasterIO with a snap option
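To make the "functions of dim, extent or a combination" idea concrete, here is a minimal Python sketch. It is not the vaster or raster API: the function names, the `(ncol, nrow)` order for `dim`, the `(xmin, xmax, ymin, ymax)` order for `extent`, and the 0-based row-major cell numbering from the top-left are all assumptions made for illustration.

```python
def row_col_from_cell(dim, cell):
    """Function of dimension only: 0-based cell index -> (row, col)."""
    ncol, nrow = dim
    if not 0 <= cell < ncol * nrow:
        return None
    return divmod(cell, ncol)


def cell_from_row_col(dim, row, col):
    """Function of dimension only: (row, col) -> 0-based cell index."""
    ncol, nrow = dim
    if not (0 <= row < nrow and 0 <= col < ncol):
        return None
    return row * ncol + col


def cell_from_xy(dim, extent, x, y):
    """Function of dimension and extent: coordinate -> cell index."""
    ncol, nrow = dim
    xmin, xmax, ymin, ymax = extent
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return None
    # Points on the right or bottom edge are assigned to the last column/row.
    col = min(int((x - xmin) / (xmax - xmin) * ncol), ncol - 1)
    row = min(int((ymax - y) / (ymax - ymin) * nrow), nrow - 1)
    return row * ncol + col


def xy_from_cell(dim, extent, cell):
    """Function of dimension and extent: cell index -> cell centre."""
    rc = row_col_from_cell(dim, cell)
    if rc is None:
        return None
    row, col = rc
    ncol, nrow = dim
    xmin, xmax, ymin, ymax = extent
    xres = (xmax - xmin) / ncol
    yres = (ymax - ymin) / nrow
    return (xmin + (col + 0.5) * xres, ymax - (row + 0.5) * yres)


def make_grid(dim, extent):
    """Closure that records dim + extent once (hypothetical helper)."""
    return {
        "cell_from_xy": lambda x, y: cell_from_xy(dim, extent, x, y),
        "xy_from_cell": lambda cell: xy_from_cell(dim, extent, cell),
    }
```

For example, `make_grid((4, 3), (0, 4, 0, 3))["cell_from_xy"](3.5, 0.5)` gives the last cell of a 4 x 3 grid. None of these functions touch a file or an array, which is the point being made above.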

These are just brain dump ideas atm for things I've been doing in R and want more broadly in gdal and elsewhere 🙏

geoarrow/geoarrow#24 (comment)

@paleolimbot
Owner

Keep dumping here! I had a bunch scraped out here, too: https://github.com/paleolimbot/grd/blob/master/R/cell.R

@mdsumner
Author

mdsumner commented Sep 1, 2022

I didn't forget about those ... not entirely anyway! But, it's been very illuminating to strip down to just the bare essentials, and then see that some functions are only a function of dimension, some are of extent and dimension, and some compare extents (basically the snap stuff).
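As an illustration of the third category (functions that compare extents, "the snap stuff"), here is a hedged Python sketch of aligning an arbitrary extent to a grid given by an origin and a resolution. The function name `snap_extent`, the `(xmin, xmax, ymin, ymax)` extent order, and the exact rounding rules are assumptions for this example; it is not claimed to reproduce the behaviour of raster, grd, or GDAL's projwin.

```python
import math


def snap_extent(extent, origin, res, snap="out"):
    """Align extent edges to grid lines at origin + k * res.

    snap="out"  grows the extent to the enclosing grid lines,
    snap="in"   shrinks it to the enclosed grid lines,
    snap="near" moves each edge to its nearest grid line.
    """
    def near(n):
        # Round half up; avoids Python's round-half-to-even.
        return math.floor(n + 0.5)

    rules = {
        "out": (math.floor, math.ceil),
        "in": (math.ceil, math.floor),
        "near": (near, near),
    }
    if snap not in rules:
        raise ValueError("snap must be 'in', 'out' or 'near'")
    lower, upper = rules[snap]

    xmin, xmax, ymin, ymax = extent
    x0, y0 = origin
    xres, yres = res
    return (
        x0 + lower((xmin - x0) / xres) * xres,
        x0 + upper((xmax - x0) / xres) * xres,
        y0 + lower((ymin - y0) / yres) * yres,
        y0 + upper((ymax - y0) / yres) * yres,
    )
```

Note that `snap="in"` can return an empty or inverted extent when the input is smaller than one cell; a real implementation would need to decide how to handle that case.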

I see you have pretty serious snap options in grd ... I'm resistant to having an object that is also for data and vis for this functionality (a 0-dim array is a nice trick but makes me uncomfortable) - which is why I didn't just run with grd ... but I'm also drawn to having an OOP solution - I guess there are at root functions for grid logic, and then there's a hierarchy of tools/objects that variously:

  • knows its raster-ness (extent + dimension)
  • knows only its alignment (the origin + resolution) - on reflection I guess that's what a (shear == 0) geotransform is ... hmm
  • knows only its dimension (a bare image, with a default extent - variously [0,1] or [0,dim] depending on context)
  • knows the above stuff and is ready to bare-metal read/vis/stream from sources that have these properties
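The relation between "extent + dimension" and a shear-free geotransform can be sketched as follows. The six-element layout `(xmin, xres, 0, ymax, 0, -yres)` is GDAL's affine geotransform for a north-up raster; the function names, the `(ncol, nrow)` order for `dim`, and the `(xmin, xmax, ymin, ymax)` extent order are assumptions for this example.

```python
def geotransform_from(dim, extent):
    """Extent + dimension -> GDAL-style geotransform with zero shear."""
    ncol, nrow = dim
    xmin, xmax, ymin, ymax = extent
    return (xmin, (xmax - xmin) / ncol, 0.0,
            ymax, 0.0, -(ymax - ymin) / nrow)


def extent_from(dim, gt):
    """Geotransform + dimension -> extent; only valid when shear == 0."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("sheared geotransform has no axis-aligned extent")
    ncol, nrow = dim
    xmin, xres, _, ymax, _, yres = gt  # yres is negative for north-up
    return (xmin, xmin + ncol * xres, ymax + nrow * yres, ymax)


def default_extent(dim, unit=False):
    """Extent for a bare image: [0, 1] if unit is True, else [0, dim]."""
    ncol, nrow = dim
    if unit:
        return (0.0, 1.0, 0.0, 1.0)
    return (0.0, float(ncol), 0.0, float(nrow))
```

The geotransform alone carries only the alignment (origin and resolution); the dimension has to be supplied separately to recover the extent, which matches the distinction drawn in the list above.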

I'm fleshing out my interactions with this logic as I slowly become independent from raster - recently I wrote raster::trim() from scratch, just to see what the logic is like - and like many vis and extraction and reprojection tasks for a given map, there's very often a back-and-forth: get enough data to find the "nearblack" margin, then apply that to a warp-streamed subset read. (That's a data-dependent task though, and perhaps better done by gdal with nearblack anyway - some of these things I've been thinking of as GDAL API hooks that don't exist yet and that I could write.)

I'm very much interested in getting this family of grid logic that's entirely independent of data - for things like polygon extractions from netcdf time series, what you really want is the 2D cell index of those polygons, then batch those into netcdf chunks - and the key idea here is that the indexing logic and query plan are entirely independent of the actual data source. I'm fleshing this out at a low level with a colleague in the climate model space, and he has very large workflows of interest, so it's not just me and my tools ;)

@mdsumner
Author

mdsumner commented Sep 1, 2022

and like, GDAL is crazy fast to rasterize polygons, as is {fasterize} - but I don't want a polygon-value burned tif as output, I want a table of cell index and polygon ID that I use for this plan-query batching - and for that I need index-converters from global cell (extent+dimension) to chunk cell (tiled arithmetic converts a global cell to a chunk-in-memory index).
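A rough Python sketch of that global-cell to chunk-cell conversion, and of grouping a (cell, polygon ID) table into a per-chunk plan, might look like this. Everything here is an assumption for illustration: the function names, 0-based row-major cell numbering, `(ncol, nrow)` order for both `dim` and `chunk_dim`, and the choice that edge chunks are clipped to the grid in memory (some formats pad them instead).

```python
from collections import defaultdict


def chunk_of_cell(dim, chunk_dim, cell):
    """Global cell index -> (chunk id, cell index within that chunk)."""
    ncol, nrow = dim
    chunk_ncol, chunk_nrow = chunk_dim
    row, col = divmod(cell, ncol)
    chunk_row, local_row = divmod(row, chunk_nrow)
    chunk_col, local_col = divmod(col, chunk_ncol)
    # Number of chunks across the grid (ceiling division).
    chunks_across = -(-ncol // chunk_ncol)
    chunk_id = chunk_row * chunks_across + chunk_col
    # Assumption: the last chunk in a row is clipped to the grid width.
    width = min(chunk_ncol, ncol - chunk_col * chunk_ncol)
    return chunk_id, local_row * width + local_col


def plan_by_chunk(dim, chunk_dim, cell_ids):
    """Group (global cell, polygon id) pairs by the chunk they fall in."""
    plan = defaultdict(list)
    for cell, polygon_id in cell_ids:
        chunk_id, local_cell = chunk_of_cell(dim, chunk_dim, cell)
        plan[chunk_id].append((local_cell, polygon_id))
    return dict(plan)
```

The resulting plan lists, for each chunk, which in-memory cells belong to which polygon, so each chunk needs to be read only once regardless of how many polygons touch it. Nothing in it depends on the data source itself.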

more thoughts than code atm, but I have a lot of these pieces around :)

@mdsumner
Author

mdsumner commented Sep 4, 2022

at some point I'll fold in the logic for netcdf from tidync, and flesh out the translators I've been talking about, and then explore what's needed for a proper api vs just R funs

@paleolimbot
Owner

I made a place for "cell logic" for you to get started! PR into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math.hpp (and make sure to add tests into https://github.com/paleolimbot/geoarrow-cpp/blob/main/src/geoarrow/index_math_test.cc !). If you're interested, I'm happy to set up a meeting to set up your VSCode to get started 😄

@mdsumner
Author

mdsumner commented Sep 5, 2022

👌

@mdsumner
Author

mdsumner commented Sep 7, 2022

I definitely need the hand-holding! I think it would be valuable :)

@paleolimbot
Owner

Let's do it! It's tough for me to meet outside 8am - 4pm America/Halifax because of the kids, or we can work through it via Twitter message. The gist of it is: open up geoarrow-cpp in VSCode, install the CMake extension, then open the "command palette" (Control-Shift-P) and choose CMake: configure, then CMake: build, then CMake: run tests.

@mdsumner
Author

related pydata/xarray#5081

@mdsumner
Author

just reading Danielle's blog with a couple of rasterization steps, we could use a sparse cell approach - not profound or anything but a clear example for some crossover discussion: https://blog.djnavarro.net/posts/2022-08-23_visualising-a-billion-rows/
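One possible reading of a "sparse cell approach" is to bin points into cell indexes and keep counts only for cells that are actually hit, rather than allocating the full grid. The Python sketch below is an assumption about what that could mean, not code from the blog post; the function name, the `(ncol, nrow)` and `(xmin, xmax, ymin, ymax)` orders, and the 0-based row-major cell numbering are all illustrative choices.

```python
from collections import Counter


def sparse_cell_counts(dim, extent, points):
    """Count points per cell, storing only the non-empty cells."""
    ncol, nrow = dim
    xmin, xmax, ymin, ymax = extent
    counts = Counter()
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # skip points outside the grid
        # Points on the right or bottom edge go to the last column/row.
        col = min(int((x - xmin) / (xmax - xmin) * ncol), ncol - 1)
        row = min(int((ymax - y) / (ymax - ymin) * nrow), nrow - 1)
        counts[row * ncol + col] += 1
    return counts
```

The result is a table of cell index and count, which can be expanded to a dense raster later (or never), and it uses the same cell arithmetic as the other grid functions discussed in this thread.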

@mdsumner
Author

all the more reason for me to get these funs in here, I keep realising implications, and variations on the index conversions 🙏
