Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zonalstats includes numpy masked data areas in stats calculations #246

Open
sebastianclarke opened this issue Sep 21, 2021 · 0 comments
Open

Comments

@sebastianclarke
Copy link

Describe the bug
Issue occurs when calling the zonal_stats function and passing a numpy array and affine, rather than a geo-referenced raster. If you have previously used numpy.ma to mask a portion of the data, that portion of the data is still included in stats calculation. I was expecting that pixels masked in this manner would be ignored during stats calculation, in the same way that nodata pixels are.

My current workaround is to set these pixels explicitly to the nodata value, which then results in them being discounted accordingly, but this is a little in-elegant, and more to the point I think the expectation is that, if zonal_stats supports numpy data, it should respect any masks already applied. This was certainly my expectation, and I've seen one or two other reports online of people being caught out by this.

To Reproduce

The below example shows roughly how to reproduce. Naturally it's trivial, and you would never do this in anger, you would just pass the raster directly to the zonal_stats function. Assume in any actual example, we are doing something else to the raster data in numpy-land other than simply reading it and masking it. Note that it does require you provide your own test data (a raster and intersecting geometry).

import rasterio
from rasterstats import zonal_stats
from numpy import ma

with rasterio.open("somerasterdataset.tif") as my_raster:
    my_data = my_raster.read(1)    # 8-bit, unsigned int, values 0-255
    nodata = my_raster.nodata
    affine = my_raster.transform

masked_data = ma.masked_greater(my_data, 100)

stats = zonal_stats(
    geometry,
    masked_data,
    affine=affine,
    nodata=nodata,
    stats=['mean', 'count']
    all_touched=True,
 )    # Still includes pixels > 100, so mean is too high

my_data[my_data>100] = nodata

stats = zonal_stats(
    geometry,
    my_data,
    affine=affine,
    nodata=nodata,
    stats=['mean', 'count']
    all_touched=True,
 )    # Works as I expect
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant