causes memory overflow in the loop #285

Open
mouzui opened this issue Mar 2, 2023 · 7 comments

mouzui commented Mar 2, 2023

This problem has been bothering me for a long time. I start by reading a CSV containing the outlines of 10 million polygons, split it into 1,000 parts, create the polygons for each part one by one, and then run zonal_stats on them.

As line 68 of the screenshot shows, memory grows by about a dozen MB during each zonal_stats call and is not released at the end of the loop iteration (it still shows 1694.4 MB at the start of the next iteration, the same as at the end of this one).

My computer's memory cannot cope with this growth across 10,000 loop iterations, and the process is killed (at over 140 GB, that is). I tried splitting the data into 100 or 10,000 parts instead, but the problem persists and the process is still killed.

What is the reason for this?

[Screenshot: Snipaste_2023-03-02_19-21-40]
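The per-iteration check described above can be sketched with the standard library alone. This is a minimal sketch, not the code from the screenshot: `split_into_chunks`, `process_chunk`, and `run_with_memory_log` are hypothetical names, and `process_chunk` is only a stand-in for building polygons and calling zonal_stats. Note that `tracemalloc` only sees allocations made through Python's allocator, so growth inside a native library such as GDAL would not appear here and would need an OS-level measurement instead.

```python
import gc
import tracemalloc


def split_into_chunks(items, n_chunks):
    """Yield up to n_chunks consecutive slices that together cover items."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk):
    """Hypothetical stand-in for building polygons and calling zonal_stats."""
    return [len(str(item)) for item in chunk]


def run_with_memory_log(items, n_chunks):
    """Process items chunk by chunk and record traced memory after each chunk."""
    tracemalloc.start()
    log = []
    for index, chunk in enumerate(split_into_chunks(items, n_chunks)):
        result = process_chunk(chunk)
        del result    # drop the only reference to this chunk's output
        gc.collect()  # rule out garbage that is merely not yet collected
        current, peak = tracemalloc.get_traced_memory()
        log.append((index, current, peak))
    tracemalloc.stop()
    return log
```

If the `current` value in the log stays flat while the process size reported by the operating system keeps growing, the growth is probably happening outside Python-managed objects.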

@MrChebur

Same problem - memory leak.
Currently investigating.

MrChebur commented Jul 24, 2024

I can confirm memory leaks in the for loops and the zonal_stats function.

OS=Windows-10-10.0.19045-SP0
python=3.12.4 (main, Jun 10 2024, 12:48:35) [MSC v.1938 64 bit (AMD64)]
gdal=3.9.1
numpy=1.26.4

I attach code for testing and necessary raster, vector data (see test.zip).

I would be grateful to the author of the library for any comment regarding this problem! @perrygeo

[image]

import geopandas
import psutil  # this is not standard library. Check https://pypi.org/project/psutil/
from rasterstats import zonal_stats


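def find_process_by_name(name):
    # Returns the first process whose name matches, or None if there is none.
    # If several interpreters are running, the match may not be this one;
    # psutil.Process() with no arguments always refers to the current process.
    for process_ in psutil.process_iter(['name']):
        if process_.info['name'] == name:
            return process_
    return None


vector = r'.\shp\polygon.shp'
raster_path = r'.\raster\MOD10A1F.A2000058.h22v02.061.2020037194056.hdf'
process_name = 'python.exe'

process = find_process_by_name(process_name)
if process is None:
    # A bare string cannot be raised; wrap the message in an exception.
    raise RuntimeError(f'{process_name} not found!')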

columns = ['iteration'] + list(process.memory_info()._fields)
print()
print('\t'.join(columns))

geo_data_frame = geopandas.read_file(vector)
geo_data_frame_geom = geo_data_frame['geometry']

for iteration in range(1, 1001):

    # Prints memory information for each 100th iteration
    if iteration == 1 or iteration % 100 == 0:
        mem_info = process.memory_info()  # https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_info
        mem_info_as_string = [str(value) for value in mem_info]
        mem_values = [str(iteration)] + mem_info_as_string
        print('\t'.join(mem_values))

    zonal_stats(vectors=geo_data_frame_geom,
                raster=fr"""HDF4_EOS:EOS_GRID:"{raster_path}":MOD_Grid_Snow_500m:CGF_NDSI_Snow_Cover""",
                categorical=True,
                all_touched=True,
                )
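To turn the table printed by the script above into a single figure, one option is to fit a straight line through the (iteration, rss) pairs and read off the slope as bytes gained per iteration. This is a minimal sketch: `growth_per_iteration` is a hypothetical helper, not part of psutil or rasterstats, and it assumes the first memory column printed is `rss`.

```python
def growth_per_iteration(samples):
    """Return the least-squares slope of memory (bytes) against iteration.

    samples is a sequence of (iteration, rss_bytes) pairs, for example the
    first two columns printed by the monitoring loop.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all samples share the same iteration number")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

A slope that stays clearly positive over many iterations is consistent with a leak, while a slope near zero after the first few iterations suggests one-time setup cost or caching.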

mouzui commented Jul 24, 2024

The author hasn't updated this package for a long time, which is very unfortunate. I am currently using QGIS as a replacement for this package.

@MrChebur

@mouzui

I am currently using QGIS as a replacement for this package.

And I, on the contrary, used this package to replace QGIS, thinking it would be faster and easier to use. =)

mouzui commented Jul 24, 2024

My results show that zonal statistics in PyQGIS run more than 30 times faster than in rasterstats. After all the effort I put into installing PyQGIS, this is the most gratifying and surprising part. Of course, the QGIS application itself is not as fast, but its speed is still impressive. The best part is that QGIS does not have memory leaks, which lets me confidently do other things while the zonal statistics run.

MrChebur commented Jul 24, 2024

The best part is that QGIS does not have memory leaks

Unfortunately not, at least in the QGIS application (Windows 10): qgis/QGIS#37861

mouzui commented Jul 24, 2024

Alright. My process involves using ten million polygons for zonal statistics on a raster file. It's possible that too many raster files could cause a memory leak in QGIS.
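For long batch runs like the one discussed in this thread, a general way to bound memory growth regardless of which library leaks is to run each batch in a short-lived child process, since the operating system reclaims all of a process's memory when it exits. Nothing in this thread confirms that this resolves the rasterstats behaviour; the sketch below only illustrates the pattern. `WORKER_SOURCE` and `run_batch_in_subprocess` are hypothetical names, and the worker body is a placeholder where a real job would call zonal_stats.

```python
import json
import subprocess
import sys

# Source of the child program. Hypothetical placeholder logic: a real worker
# would read geometries and call zonal_stats instead of doubling numbers.
WORKER_SOURCE = """
import json, sys
batch = json.load(sys.stdin)
json.dump([value * 2 for value in batch], sys.stdout)
"""


def run_batch_in_subprocess(batch):
    """Run one batch in a fresh interpreter and return its JSON result."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(batch),   # send the batch to the child on stdin
        capture_output=True,
        text=True,
        check=True,                # raise CalledProcessError if the child fails
    )
    return json.loads(completed.stdout)
```

The trade-off is the cost of starting an interpreter and serialising inputs and outputs for every batch, so larger batches amortise the overhead better.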
