Progress Bar on Zonal_Stats #206

Open
lauzadis opened this issue Jan 8, 2020 · 5 comments

@lauzadis

lauzadis commented Jan 8, 2020

Is there any plan or interest to add a progress bar to zonal_stats? When working with large datasets, the function can take a long time to complete, sometimes hanging partway through.

A progress bar would be a good way to see that the function is still running and has not crashed.

tqdm could be a good candidate for this; it would just wrap the feature iteration in this line of code.

@perrygeo
Owner

perrygeo commented Oct 2, 2020

@mataslauzadis I definitely see the need for better feedback while iterating through input features.

I'm not sure of the best way to implement it, though. Consider that tqdm and similar approaches are designed for interactive use, where you want regular feedback printed to the screen. This library can also be used in a web server or batch processing context, where the additional noise from such frequent logging would be a problem. IOW, I hesitate to add progress reporting directly into the library because it would disrupt some use cases.

However, you can already get a progress bar in your own Python code by using a tqdm context manager, as in https://stackoverflow.com/a/51083782/519385. Since gen_zonal_stats returns a generator, it should be possible to implement a progress bar in the application code.

I would consider adding something like that to the rio zonalstats command line interface.

@aazuspan

Having a built-in progress bar would be great! Maybe there could be a show_progress=False arg so interactive users could enable a progress bar without adding noise for other users? tqdm takes a disable arg, so both options could be implemented just by changing that, e.g. wrap the iteration in tqdm and set disable=not show_progress to hide by default.
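
A minimal sketch of that idea, not the library's implementation. The `show_progress` keyword and the `fake_gen_zonal_stats` stand-in are hypothetical names used only for illustration; tqdm's `disable` argument is the one real piece of API relied on here.

```python
from tqdm import tqdm


def fake_gen_zonal_stats(features):
    """Hypothetical stand-in for the real per-feature loop."""
    for feature in features:
        yield {"count": len(feature)}


def zonal_stats_sketch(features, show_progress=False):
    """Collect stats, drawing a bar only when show_progress is True."""
    features = list(features)
    # With disable=True, tqdm acts as a silent pass-through wrapper
    bar = tqdm(
        fake_gen_zonal_stats(features),
        total=len(features),
        disable=not show_progress,
    )
    return list(bar)


stats = zonal_stats_sketch(["ab", "cde"])  # silent by default
```

Note that this variant imports tqdm unconditionally, so it would make tqdm a hard dependency even for users who never enable the bar.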

For now, here's a workaround I used in case anyone wants to create their own progress bar.

import rasterstats
from tqdm.auto import tqdm
import geopandas as gpd

# Load geometry
geom = gpd.read_file("geom.gpkg").geometry

# Create a zonal stats generator
gen = rasterstats.gen_zonal_stats(vectors=geom, raster="img.tif")
# Display a progress bar while converting the generator into a list of zonal stats
stats = [n for n in tqdm(gen, total=geom.shape[0])]

Thanks for building and maintaining rasterstats! It's an awesome tool.

@jeronimoluza
Contributor

Hi all,
I could contribute on this one! Let me know and I'll get my hands on it.

@perrygeo
Owner

perrygeo commented Sep 9, 2024

We'd need to stay backwards compatible - I have many code bases that call zonal stats non-interactively and if I upgraded and forgot to change my call sites, the resulting log spam would be unacceptably costly. I have a responsibility to avoid situations like that, for myself and all users.

So I'd prefer to keep the existing default behavior (no progress bar) and use a zonal_stats(..., progress=True) kwarg to optionally turn it on for interactive use.

Note that only zonal_stats would have a progress kwarg under this approach.

gen_zonal_stats would not since it returns a stateful generator. IOW, the application code in charge of consuming the generator would need to track progress, as in the @aazuspan example above. I don't consider this to be a "workaround", it's a great practice! Making the progress tracking explicit leads to a clean separation and more control over IO and processing, and I'll continue to use this pattern in my application code. I can see how this would be burdensome for interactive use though.
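
For batch or server contexts, the same explicit pattern can report progress through logging instead of a terminal bar. This is a rough sketch under assumptions: `consume_with_progress`, the `every` interval, and the stand-in generator are hypothetical, and in real code the generator would come from gen_zonal_stats.

```python
import logging

logger = logging.getLogger("zonal")


def consume_with_progress(gen, total, every=1000):
    """Drain a stats generator, logging one line per `every` features."""
    results = []
    for i, stat in enumerate(gen, start=1):
        results.append(stat)
        # Log at the chosen interval, and once more on the final feature
        if i % every == 0 or i == total:
            logger.info("zonal stats: %d/%d features", i, total)
    return results


# Stand-in generator; real code would pass gen_zonal_stats(...) here
fake_gen = ({"mean": float(n)} for n in range(5))
stats = consume_with_progress(fake_gen, total=5, every=2)
```

Because the application owns the loop, it also controls how often and where progress is written, which keeps the log volume bounded.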

In terms of libraries, if we take a dependency on tqdm, I'd like to make it an optional dependency so it can be installed with pip install 'rasterstats[progress]'.

@jeronimoluza did you have a different approach in mind? My main concern is backwards compatibility - as long as we keep existing code working exactly the same, I'm ok with adding a progress bar - if it's opt-in.

@jeronimoluza
Contributor

Sounds good, @perrygeo! Putting the progress kwarg at a higher level in the code appears to be the safest choice, given the potential logging issue you describe.

I could use @aazuspan's example to modify the zonal_stats function, and do something like:

def zonal_stats(*args, **kwargs):

    ...

    progress = kwargs.get('progress')
    if progress:
        return [list_comprehension_with_tqdm]
    else:
        return [list_comprehension_no_tqdm]

Regarding the optional dependency, I can add tqdm to the extras_require parameter in setup.py.

How does it sound?
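
One way the opt-in approach discussed in this thread might look, as a sketch only. `gen_zonal_stats_stub` and `zonal_stats_sketch` are hypothetical stand-ins, and the 'rasterstats[progress]' extra in the error message is the proposal from this thread, not an existing install option.

```python
def gen_zonal_stats_stub(vectors, raster):
    """Hypothetical stand-in for rasterstats.gen_zonal_stats."""
    for v in vectors:
        yield {"zone": v, "raster": raster}


def zonal_stats_sketch(*args, **kwargs):
    # pop() so the flag is not forwarded to the generator
    progress = kwargs.pop("progress", False)
    stats = gen_zonal_stats_stub(*args, **kwargs)
    if not progress:
        # Default path: behavior unchanged, tqdm is never imported
        return list(stats)
    try:
        from tqdm import tqdm  # only needed when progress=True
    except ImportError as exc:
        # The extra named here is the one proposed in this thread
        raise ImportError(
            "progress=True requires tqdm; "
            "try: pip install 'rasterstats[progress]'"
        ) from exc
    # No total is passed, so tqdm shows a running count, not a percentage
    return list(tqdm(stats))


stats = zonal_stats_sketch(["a", "b"], raster="img.tif")
```

Importing tqdm lazily keeps existing call sites free of both the dependency and the output unless they pass progress=True.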
