RAM is increasing over time #484
The learning examples use Q-learning with a replay buffer, which uses a lot of memory, especially with images. You can try setting the replay buffer size to a smaller value.
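For a rough sense of scale, here is a back-of-the-envelope estimate of how much memory image observations in a replay buffer can take. The numbers (10,000 transitions, 120x160 RGB frames, float32) are illustrative assumptions, not the exact configuration from this issue:

```python
# Rough replay-buffer memory estimate (illustrative numbers only).
buffer_size = 10_000          # transitions kept in the replay buffer
frame_shape = (120, 160, 3)   # height, width, RGB channels
bytes_per_value = 4           # float32; uint8 storage would be 1 byte per value

frame_bytes = bytes_per_value
for dim in frame_shape:
    frame_bytes *= dim

# Each transition typically stores both the current and the next frame.
total_gb = buffer_size * 2 * frame_bytes / 1024**3
print(f"~{total_gb:.1f} GB just for the stored frames")  # ~4.3 GB with these numbers
```

Storing frames as uint8 and converting to float only when sampling a batch cuts this by a factor of four.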
In that case, shouldn't memory already be full from the early epochs? What I found is that memory usage stays the same in the early epochs but increases drastically in later epochs, as in this figure: https://drive.google.com/file/d/1LXvwdr6g_5kLLlOW6tDWS2NQuX2vhjxQ/view?usp=sharing
Not necessarily. Depending on how the buffer is created, it might fill up slowly as training goes on. This is what the example does, simply appending transitions to a Python list.
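A minimal sketch of that pattern (not the example's exact class): a buffer backed by a plain Python list grows one transition at a time until it reaches capacity, so memory climbs gradually instead of being allocated up front.

```python
import random

class ReplayBuffer:
    """Minimal list-backed replay buffer: memory grows until `capacity` is reached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # grows one transition at a time
        self.position = 0

    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)   # still filling up -> memory use keeps increasing
        self.buffer[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity  # once full, overwrite oldest

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```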
Ah, that is not normal, and I see the problem! Can you provide the exact code you run and your system information (Python version, library versions, ViZDoom version, etc.)? Is it the Python script that eats the memory, or the ZDoom process?
It's mostly the same as "learning_pytorch.py". Additionally, I changed the network architecture, hyperparameters, and screen resolution, and switched the display from the original grayscale to RGB. Here is my code: https://colab.research.google.com/drive/1OpUPvs7h2vrHWNduS2FWenmN87b1CgoI?usp=sharing
I ran the code for 20 min on an Ubuntu 20.04 machine with Python 3.7, ViZDoom 1.1.8, and PyTorch 1.8.1, and memory usage maxed out at 9 GB for me (for the whole system), with no noticeable sudden spikes. I would still try reducing the image size: RGB plus a bigger resolution takes a lot of space, and normally a grayscale image of size (64, 64) or so is enough even for complex ViZDoom scenarios.
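For reference, a sketch of the kind of change meant here, using ViZDoom's grayscale screen format and a small resolution plus downscaling before the frame goes into the network. The config path, resolution, and target size are illustrative assumptions, not prescribed values:

```python
import numpy as np
import skimage.transform
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("health_gathering.cfg")  # adjust the scenario config path to your setup

# Request a small grayscale screen buffer instead of a large RGB one.
game.set_screen_format(vzd.ScreenFormat.GRAY8)
game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
game.init()

def preprocess(frame, target_shape=(64, 64)):
    """Downscale the screen buffer and convert it to float32 for the network."""
    frame = skimage.transform.resize(frame, target_shape)
    return frame.astype(np.float32)
```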
I ran it on 64 GB of RAM. So there may be something wrong with the system or software, not with my code?
The sudden spike seems to indicate so. However, as I continue running, the memory use does increase, but this was to be expected with such large observations. Again, I recommend trying to reduce the resolution and go back to grayscale images. I also recommend using an established RL library to run experiments (e.g. stable-baselines3), where the implementations are known to work.

Edit: Just after I wrote this, the memory use spiked on my machine as well. It seems to come from the Python code (probably some tensors being duplicated recursively). I sadly do not have time to fix this right now, but if you debug it, we would be happy to accept a fix. I still recommend trying out established libraries if you wish to experiment with RL algorithms :)
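If the culprit is indeed tensors being kept alive (for example, observations stored while still attached to the autograd graph or sitting on the GPU, so each one drags its history and device memory along), a common workaround is to store plain detached CPU copies in the buffer. A hypothetical sketch of that fix, not the exact change needed in this particular script:

```python
import numpy as np
import torch

def to_storable(obs):
    """Return a plain numpy copy suitable for long-term storage in a replay buffer.

    A tensor that is still attached to the autograd graph (or lives on the GPU)
    keeps its computation history / device memory alive for as long as it stays
    in the buffer, which can look exactly like a slow memory leak.
    """
    if isinstance(obs, torch.Tensor):
        return obs.detach().cpu().numpy().copy()
    return np.asarray(obs)

# e.g. when pushing a transition:
# buffer.push(to_storable(state), action, reward, to_storable(next_state), done)
```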
Thank you very much, I will follow your suggestion. |
I have implemented ViZDoom tasks following "learning_pytorch.py" from the examples folder. I found that memory usage kept increasing with the training epochs. In late epochs, the "health gathering" task sometimes used 100% of memory and then crashed. Other times, the memory stopped increasing at around 80-90% and the run finished to the last epoch. How can I fix this issue?