causes memory overflow in the loop #285
Same problem here: memory leak.
I can confirm the memory leak on Windows 10 (`Windows-10-10.0.19045-SP0`). I have attached test code and the raster and vector data it needs (see test.zip). I would be grateful for any comment from the library's author on this problem. @perrygeo

```python
import geopandas
import psutil  # third-party, not in the standard library: https://pypi.org/project/psutil/
from rasterstats import zonal_stats

vector = r'.\shp\polygon.shp'
raster_path = r'.\raster\MOD10A1F.A2000058.h22v02.061.2020037194056.hdf'

# Measure the interpreter running this script. Looking the process up by name
# ('python.exe') can pick a different Python process if several are running.
process = psutil.Process()

columns = ['iteration'] + list(process.memory_info()._fields)
print()
print('\t'.join(columns))

geo_data_frame = geopandas.read_file(vector)
geo_data_frame_geom = geo_data_frame['geometry']

for iteration in range(1, 1001):
    # Print memory information on the first iteration and on every 100th one
    if iteration == 1 or iteration % 100 == 0:
        mem_info = process.memory_info()  # https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_info
        mem_values = [str(iteration)] + [str(value) for value in mem_info]
        print('\t'.join(mem_values))
    # The result is discarded on purpose: only the memory growth matters here
    zonal_stats(vectors=geo_data_frame_geom,
                raster=fr"""HDF4_EOS:EOS_GRID:"{raster_path}":MOD_Grid_Snow_500m:CGF_NDSI_Snow_Cover""",
                categorical=True,
                all_touched=True,
                )
```
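To narrow down where the growth in a loop like this lives, one option is to compare the process RSS with what Python's own `tracemalloc` module sees. `tracemalloc` only traces memory allocated through Python's allocators; memory that a C library such as GDAL allocates on its own is invisible to it. So if RSS keeps climbing while the traced total stays flat, the leak is probably in native code rather than in Python objects. The sketch below is a hypothetical helper (`python_heap_growth` is not part of rasterstats) demonstrated on stand-in workloads; in the script above you would pass something like `lambda: zonal_stats(...)` instead.

```python
import tracemalloc
from typing import Callable


def python_heap_growth(func: Callable[[], object], repeats: int) -> int:
    """Bytes of Python-traced memory still held after `repeats` calls of func.

    Native allocations made outside Python's allocators are not counted.
    """
    tracemalloc.start()
    try:
        func()  # warm-up call so one-off caches are not counted as growth
        baseline, _peak = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            func()
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


# Stand-in workloads for the demonstration (not rasterstats calls)
_kept = []


def leaky_step() -> None:
    _kept.append(bytearray(10_000))  # kept alive forever: a Python-level leak


def clean_step() -> None:
    bytearray(10_000)  # released as soon as the call returns


print(python_heap_growth(leaky_step, 10))
print(python_heap_growth(clean_step, 10))
```

The leaky workload should report roughly 100 KB of retained memory and the clean one close to zero.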
The author hasn't updated this package for a long time, which is very unfortunate. I am currently using QGIS as a replacement for this package.
I did the opposite: I switched from QGIS to this package, expecting it to be faster and easier to use. =)
My results show that computing zonal statistics in PyQGIS is more than 30 times faster than with rasterstats. After all the effort I put into installing PyQGIS, this was the most gratifying and surprising result. The QGIS desktop application itself is not as fast, but its speed is still impressive. Best of all, QGIS does not leak memory, so I can confidently do other work while the zonal statistics run.
Unfortunately not, at least in the QGIS application (Windows 10): qgis/QGIS#37861
Alright. My workflow computes zonal statistics for ten million polygons over a single raster file. Perhaps QGIS only leaks memory when many raster files are involved.
This problem has been bothering me for a long time. I read a CSV containing the outlines of 10 million polygons, split it into 1,000 parts, built the polygons for each part, and then ran zonal_stats on that part.
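Reading the CSV lazily keeps only one part in Python memory at a time. The sketch below assumes a CSV with a header row and a hypothetical `wkt` column holding each polygon outline; `iter_row_chunks` is an illustrative name, not a rasterstats function. Note that this bounds the Python-side input per part but cannot fix memory retained inside the `zonal_stats` call itself.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_row_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size CSV rows, reading the file lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage with a tiny in-memory CSV. In the real job each chunk's WKT strings
# would be turned into geometries, passed to zonal_stats, and the results
# written out before the next chunk is read.
sample = io.StringIO("id,wkt\n" + "".join(f"{i},POINT ({i} {i})\n" for i in range(5)))
for chunk in iter_row_chunks(sample, 2):
    print([row["id"] for row in chunk])
```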
As line 68 of the screenshot shows, memory grows by about a dozen MB during the zonal_stats call and is not released by the end of the loop iteration: it still reads 1694.4 MB at the start of the next iteration, the same as at the end of this one.
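A quick back-of-the-envelope check: if each iteration keeps about 12 MB, then 10,000 iterations retain about 12 × 10,000 = 120,000 MB, roughly 120 GB, the same order of magnitude as the 140 GB at which the process is killed. The sketch below (hypothetical helpers, not part of rasterstats) estimates the per-iteration growth as a least-squares slope over `(iteration, memory)` samples such as those printed by a monitoring loop, then projects it forward assuming the rate stays constant.

```python
from typing import Sequence, Tuple


def leak_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of memory use versus iteration number.

    samples: (iteration, memory) pairs, e.g. iteration number and RSS in MB.
    Returns memory units gained per iteration.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one iteration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def projected_growth(rate_per_iteration: float, iterations: int) -> float:
    """Extra memory after `iterations` more iterations at a constant rate."""
    return rate_per_iteration * iterations


# Synthetic samples (illustration only, not measured data): 12 MB per iteration
samples = [(1, 100.0), (100, 1288.0), (200, 2488.0)]
rate = leak_rate(samples)
print(rate, projected_growth(rate, 10_000))
```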
As the loop keeps running (on the order of 10,000 iterations), memory use climbs past 140 GB, more than my machine can hold, and the process is killed. I also tried splitting the data into 100 or 10,000 parts instead, but the problem persists and the process is still killed.
What is the reason for this?
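Since splitting alone does not help, a common workaround for memory that a native library never returns (not confirmed as the cause in this thread) is to run each part in a fresh Python process: when that process exits, the operating system reclaims all of its memory, including anything GDAL allocated. The sketch below uses only the standard library; `WORKER_SOURCE`, `run_chunk_isolated`, and `run_all` are hypothetical names, and the worker just echoes a JSON summary where the real job would build that part's polygons, call `zonal_stats`, and write the results to disk. `multiprocessing.Pool(maxtasksperchild=1)` is another way to get the same one-process-per-task effect.

```python
import json
import subprocess
import sys
from typing import List

# Source executed in a brand-new interpreter for every part. Placeholder only:
# the real worker would load part sys.argv[1] and call zonal_stats on it.
WORKER_SOURCE = """
import json, sys
chunk_id = int(sys.argv[1])
# ... build the polygons for this part and run zonal_stats(...) here ...
print(json.dumps({"chunk": chunk_id, "status": "done"}))
"""


def run_chunk_isolated(chunk_id: int, timeout: float = 3600.0) -> dict:
    """Run one part in a separate Python process and return its JSON output."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, str(chunk_id)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker fails
        timeout=timeout,
    )
    return json.loads(completed.stdout)


def run_all(n_chunks: int) -> List[dict]:
    """Process parts one after another, each in its own short-lived process."""
    return [run_chunk_isolated(i) for i in range(n_chunks)]
```

Starting a process per part adds some overhead, so fewer, larger parts are usually a better trade-off than 10,000 tiny ones.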