
out-of-memory warnings/solutions #237

Open
CeresBarros opened this issue Apr 23, 2024 · 7 comments
Assignees
Labels
v1.0.0 complete before release of version 1.0.0

Comments

@CeresBarros
Member

If the user asks for many GCMs/runs/periods/scenarios/years across many (many) locations, they can easily run out of memory when extracting climate values and building the large data.table of climate values to be downscaled.

I've seen this happen when asking for all GCMs/scenarios/periods x 3 runs for 2 700 000 point coordinates on a 32 GB machine. The error is of the type std::bad_alloc, which could easily be made more intuitive for the user with some messaging.

We could also foresee having climr actually deal with this problem by, e.g.:

  1. extracting/downscaling by subsets of points or combinations of climate model parameters
  2. saving each subset to a csv file in append mode, e.g. with data.table::fwrite(..., append = TRUE) (base write.csv() ignores append) -- see the sketch at the end of this comment

See https://stackoverflow.com/questions/78170318/error-stdbad-alloc-using-terraextract-on-large-stack-and-many-points -- extracting only the unique raster locations is not a good solution because 1) results are different and 2) the user will still run out of memory when expanding back to the full set of points.
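A minimal sketch of points 1-2 above, outside the climr API: `clim_stack` (a many-layer SpatRaster), `pts` (a data.frame with lon/lat columns) and the lon/lat CRS are hypothetical placeholders. Each chunk of points is extracted and appended to disk, so the full table never has to sit in RAM.

```r
library(terra)
library(data.table)

## hypothetical inputs: clim_stack (SpatRaster with many layers),
## pts (data.frame with lon/lat columns); CRS assumed to be WGS84 lon/lat
chunk_size <- 100000L
chunks <- split(seq_len(nrow(pts)), ceiling(seq_len(nrow(pts)) / chunk_size))

out_csv <- "climate_values.csv"
for (i in seq_along(chunks)) {
  ## extract climate values for this chunk of points only
  pts_chunk <- terra::vect(pts[chunks[[i]], ], geom = c("lon", "lat"), crs = "EPSG:4326")
  vals <- terra::extract(clim_stack, pts_chunk)
  ## append to disk; fwrite() writes the header only for the first chunk
  data.table::fwrite(vals, out_csv, append = (i > 1))
}
```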

@CeresBarros CeresBarros added the v1.0.0 complete before release of version 1.0.0 label Apr 23, 2024
@CeresBarros
Member Author

see bcgov/BGCprojections#5

@kdaust
Collaborator

kdaust commented Apr 23, 2024

Look at you, getting SO replies from Hijmans himself ;)

Once I have more time this summer I'd be more than happy to take on dealing with this issue directly.

@CeresBarros
Member Author

I know, I feel so special :P

@achubaty
Contributor

achubaty commented May 24, 2024

I haven't yet had a chance to dive too far into using this package (thank you btw!) or its inner workings, but it occurred to me that, to help deal with memory limitations on the data.table side of things, using on-disk formats such as disk.frame or arrow may also be useful. Both are quite fast, but do trade low memory use for higher disk storage requirements. These may be a great fit for data that need to persist e.g. across R sessions, but perhaps the overhead of creating these is too much for more 'transient' use.

  • disk.frame supports data.table syntax, but is soft deprecated in favour of arrow;
  • arrow is built for dplyr syntax, and has become very popular.
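In case it helps, a minimal sketch of the arrow route (assumed object and column names, not climr code): a large table is spilled to a partitioned Parquet dataset on disk and then queried lazily with dplyr, so only the collected result is held in RAM.

```r
library(arrow)
library(dplyr)

## hypothetical: climate_dt is a large table of downscaled values with GCM and MAT columns
arrow::write_dataset(climate_dt, "downscaled_parquet", partitioning = "GCM")

ds <- arrow::open_dataset("downscaled_parquet")  # lazy: nothing is read into RAM yet
ds |>
  filter(GCM == "ACCESS-ESM1-5") |>              # pushed down to the Parquet files
  summarise(MAT = mean(MAT, na.rm = TRUE)) |>
  collect()                                      # only this small result is materialised
```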

@CeresBarros
Member Author

CeresBarros commented May 24, 2024

Thanks for the suggestion @achubaty.
I think that could help with RAM limitation, yes. For the "persisting between sessions" or even calls to downscale we might need to embed some caching mechanism that detects that the downscaled on-disk table has already been produced.

@kdaust, thoughts?

@kdaust
Collaborator

kdaust commented May 25, 2024

Thanks @achubaty! I haven't used arrow before, but took a quick look and it seems promising. However, as far as I recall, the memory issues arose when we were extracting points from the terra raster. @CeresBarros is that correct? If so, I'm not sure using arrow would fix that problem. I do know the dcast operation at the final stage of downscale_core takes a lot of RAM, so arrow could definitely help with that.
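For what it's worth, the dcast step could also be chunked with plain data.table rather than arrow. A rough sketch, assuming a hypothetical long table `long_dt` with columns id, var and value (the long table itself still has to fit in memory here; only the wide result is kept off RAM):

```r
library(data.table)

ids <- unique(long_dt$id)
id_chunks <- split(ids, ceiling(seq_along(ids) / 100000L))

out_csv <- "downscaled_wide.csv"
for (i in seq_along(id_chunks)) {
  ## reshape only this chunk of locations to wide format
  wide <- dcast(long_dt[id %in% id_chunks[[i]]], id ~ var, value.var = "value")
  ## append to disk instead of accumulating one huge wide table in RAM
  fwrite(wide, out_csv, append = (i > 1))
}
```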

@CeresBarros
Member Author

CeresBarros commented May 27, 2024

The only time I've tried arrow I didn't manage to get it to work, not sure why. But I've been successful with dataLaF from the LaF package, at least for reading big data in chunks.

That is correct, but the issue, as far as I remember, is that the table of point data created is too large (because there were >2000 raster layers). So, it may help to:

  1. extract the point data by e.g. layer and dump it to disk (see the sketch at the end of this comment)
  2. do the downscaling on the disk-based table (I think this requires processing it in chunks, but I would have to refresh my memory of arrow)
  3. educate the user (through documentation and messaging) about the output/downscaled table being larger than memory and existing on disk only (so not being output by downscale). We could even provide examples of how to deal with it later.

This could be a great enhancement that takes the onus of sequentially processing climate scenarios/models/runs/etc. off the user, who may not understand why climr is failing. The user will of course still have to figure out how they want to deal with the big downscaled table, but we can say "that's out of our hands, we've already done the downscaling bit".
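A rough sketch of steps 1-2 (assumed object names, not climr code): extract one raster layer at a time, dump each result to a Parquet file, then treat the directory as an on-disk arrow dataset that the downscaling step can process chunk by chunk.

```r
library(terra)
library(arrow)
library(dplyr)

## hypothetical inputs: clim_stack (SpatRaster), pts (data.frame with lon/lat columns);
## CRS assumed to be WGS84 lon/lat
pts_v <- terra::vect(pts, geom = c("lon", "lat"), crs = "EPSG:4326")
dir.create("extracted_parquet", showWarnings = FALSE)

for (lyr in names(clim_stack)) {
  vals <- terra::extract(clim_stack[[lyr]], pts_v)   # data.frame: ID + layer value
  names(vals) <- c("ID", "value")                    # uniform schema across layers
  vals$layer <- lyr
  arrow::write_parquet(vals, file.path("extracted_parquet", paste0(lyr, ".parquet")))
}

## the downscaling step would then read this dataset lazily, chunk by chunk
ds <- arrow::open_dataset("extracted_parquet")
first_layer <- names(clim_stack)[1]
ds |> filter(layer == first_layer) |> collect() |> head()
```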
