Question: caching intermediate operations. #124

Open
brendancol opened this issue Aug 31, 2016 · 3 comments

brendancol commented Aug 31, 2016

Hey hey. Great stuff.

Question: When using python-rasterstats with one polygon and many rasters (or vice versa), do you see a clear spot where intermediate steps can be cached? Examples: the rasterization of the polygon, or the reading of the value raster?

@perrygeo
Owner

Hey @brendancol,

I'd say rasterstats is designed for the many-polygons, one-raster scenario, so it's already fairly optimal there. Caching won't help much in that case: you can already preload the raster into memory, and each polygon needs to be rasterized independently anyway.

Caching could potentially help with the one-polygon, many-rasters scenario. We could cache the rasterized geometry to avoid re-rasterizing it for every raster. Since rasterizing is a significant chunk of the work (roughly 20%?), that would likely be worth the memory footprint of keeping the rasterized geometries around across raster bands.
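To make the idea concrete, here is a minimal, dependency-free sketch of caching a rasterized geometry and reusing it across bands. It is not rasterstats code: `rasterize`, `zonal_mean`, the unit-cell grid, and the toy ray-casting rasterizer are all assumptions made for illustration, and real code would presumably work on NumPy arrays with a proper rasterizer.

```python
from functools import lru_cache
from statistics import mean

RASTERIZE_CALLS = 0  # counts how often the expensive step actually runs


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@lru_cache(maxsize=None)
def rasterize(ring, rows, cols):
    """Hypothetical rasterizer: burn the ring into a rows x cols mask.

    Assumes unit-sized cells with the grid origin at (0, 0); a cell is
    "in" when its center falls inside the polygon.
    """
    global RASTERIZE_CALLS
    RASTERIZE_CALLS += 1
    return tuple(
        tuple(point_in_polygon(c + 0.5, r + 0.5, ring) for c in range(cols))
        for r in range(rows)
    )


def zonal_mean(ring, band):
    """Mean of the band's cells covered by the (cached) polygon mask."""
    rows, cols = len(band), len(band[0])
    mask = rasterize(ring, rows, cols)  # cache hit after the first band
    return mean(band[r][c] for r in range(rows) for c in range(cols) if mask[r][c])


# One polygon (hashable tuple of vertices), three aligned 4x4 bands.
square = ((1, 1), (3, 1), (3, 3), (1, 3))
bands = [[[b * 10 + r for _ in range(4)] for r in range(4)] for b in range(3)]
means = [zonal_mean(square, band) for band in bands]
```

The cache key here is the geometry plus the grid shape, which is only safe because every band is assumed to share the same grid. With differently aligned rasters the key would also need the transform.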

My work on multiband support (#73) has really stalled out: there are internal design barriers and NumPy behaviors that make it difficult to implement cleanly. But caching rasterized geometries would make a good addition should multiband support ever come to fruition.

johanvdw commented Apr 4, 2022

We've run into the same issue: we have 28 bands, which means the same rasterization is repeated 28 times.
Looking at the code, I wonder whether we could add an option to pass the mini_raster (which we can optionally get as an output) as an input to gen_zonal_stats.

Note I'm willing to create a PR, but I'd like to get feedback on the idea before diving into the details.

@perrygeo
Owner

@johanvdw @brendancol so if the optional mini_raster was supplied, we would skip the rasterization step? At a high level, that seems like a reasonable approach.

You'd still have to call gen_zonal_stats once to get the rasterized geoms, then 27 more times with those supplied. The caller would be in charge of managing the rasterized geometries in memory (or elsewhere) and, of course, would need to ensure that all 28 raster bands are exactly aligned. I think it could work quite nicely without disrupting the current API.
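A rough sketch of what that caller-side workflow might look like, using only the standard library. Nothing here is the actual rasterstats API: `Band`, `zonal_sum`, its `mask` parameter, `check_aligned`, and the toy box rasterizer are hypothetical stand-ins for the proposed "supply the mini_raster to skip rasterization" behavior.

```python
from typing import NamedTuple

RASTERIZE_CALLS = 0  # counts how often rasterization actually runs


class Band(NamedTuple):
    """Hypothetical stand-in for one raster band plus its georeferencing."""
    transform: tuple  # affine coefficients (a, b, c, d, e, f)
    data: list        # rows of cell values


def rasterize_box(box, band):
    """Toy rasterizer: mark cells whose centers fall inside an axis-aligned box."""
    global RASTERIZE_CALLS
    RASTERIZE_CALLS += 1
    xmin, ymin, xmax, ymax = box
    a, _, c, _, e, f = band.transform  # assumes a north-up grid (b == d == 0)
    mask = []
    for r, row in enumerate(band.data):
        y = f + (r + 0.5) * e
        mask.append([xmin <= c + (col + 0.5) * a <= xmax and ymin <= y <= ymax
                     for col in range(len(row))])
    return mask


def zonal_sum(box, band, mask=None):
    """Return (sum, mask); rasterize only when the caller supplies no mask."""
    if mask is None:
        mask = rasterize_box(box, band)
    if len(mask) != len(band.data) or len(mask[0]) != len(band.data[0]):
        raise ValueError("mask shape does not match band shape")
    total = sum(v for mrow, row in zip(mask, band.data)
                for m, v in zip(mrow, row) if m)
    return total, mask


def check_aligned(bands):
    """Caller's responsibility: every band shares one transform and shape."""
    first = bands[0]
    shape = (len(first.data), len(first.data[0]))
    for b in bands[1:]:
        if b.transform != first.transform or (len(b.data), len(b.data[0])) != shape:
            raise ValueError("bands are not aligned")


transform = (1.0, 0.0, 0.0, 0.0, -1.0, 3.0)  # 1-unit cells, top-left at (0, 3)
bands = [Band(transform, [[k] * 3 for _ in range(3)]) for k in range(1, 29)]
box = (0.0, 1.0, 2.0, 3.0)  # (xmin, ymin, xmax, ymax)

check_aligned(bands)
first_sum, mask = zonal_sum(box, bands[0])  # the one call that rasterizes
sums = [first_sum] + [zonal_sum(box, b, mask)[0] for b in bands[1:]]  # 27 reuses
```

Keeping the mask as an optional argument leaves the default call path unchanged, which is one way to read "without disrupting the current API". The shape check inside `zonal_sum` is only a cheap guard; it cannot detect two grids with the same shape but different transforms, hence the separate `check_aligned` step on the caller's side.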
