RAM is increasing over time #484
The learning examples use Q-learning with a replay buffer, which uses a lot of memory, especially with images. You can try setting the replay buffer size to a smaller value.
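For a rough sense of scale, here is a back-of-the-envelope estimate of how much memory image observations in a replay buffer can take. The numbers (10,000 transitions, 120x160 RGB frames, float32) are illustrative assumptions, not the exact configuration from this issue:

```python
# Rough replay-buffer memory estimate (illustrative numbers only).
buffer_size = 10_000          # transitions kept in the replay buffer
frame_shape = (120, 160, 3)   # height, width, RGB channels
bytes_per_value = 4           # float32; uint8 storage would be 1 byte per value

frame_bytes = bytes_per_value
for dim in frame_shape:
    frame_bytes *= dim

# Each transition typically stores both the current and the next frame.
total_gb = buffer_size * 2 * frame_bytes / 1024**3
print(f"~{total_gb:.1f} GB just for the stored frames")  # ~4.3 GB with these numbers
```

Storing frames as uint8 and converting to float only when sampling a batch cuts this by a factor of four.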
In that case, shouldn't memory already be full from the early epochs? What I found is that memory usage stays the same in the early epochs but increases drastically in later epochs, as in this figure: https://drive.google.com/file/d/1LXvwdr6g_5kLLlOW6tDWS2NQuX2vhjxQ/view?usp=sharing
Not necessarily. Depending on how the buffer is created, it might fill up slowly as training goes on. This is what the example does, simply appending transitions to a Python list.
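A minimal sketch of that pattern (not the example's exact class): a buffer backed by a plain Python list grows one transition at a time until it reaches capacity, so memory climbs gradually instead of being allocated up front.

```python
import random

class ReplayBuffer:
    """Minimal list-backed replay buffer: memory grows until `capacity` is reached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # grows one transition at a time
        self.position = 0

    def push(self, state, action, reward, next_state, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)   # still filling up -> memory use keeps increasing
        self.buffer[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity  # once full, overwrite oldest

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```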
Ah, that is not normal, and I see the problem! Can you provide the exact code you run and your system information (Python version, library versions, ViZDoom version, etc.)? Is it the Python script that eats the memory, or the ZDoom process?
It's mostly the same as "learning_pytorch.py". Additionally, I changed the network architecture, hyperparameters, and screen resolution, and switched the display from the original grayscale to RGB. Here is my code: https://colab.research.google.com/drive/1OpUPvs7h2vrHWNduS2FWenmN87b1CgoI?usp=sharing
I ran the code for 20 min on an Ubuntu 20.04 machine with Python 3.7, ViZDoom 1.1.8, and PyTorch 1.8.1, and memory usage maxed out at 9 GB for me (for the whole system), with no noticeable sudden spikes. I would still try reducing the image size: RGB plus a bigger resolution takes a lot of space, and normally a grayscale image of size (64, 64) or so is enough even for complex ViZDoom scenarios.
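For reference, a sketch of the kind of change meant here, using ViZDoom's grayscale screen format and a small resolution plus downscaling before the frame goes into the network. The config path, resolution, and target size are illustrative assumptions, not prescribed values:

```python
import numpy as np
import skimage.transform
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("health_gathering.cfg")  # adjust the scenario config path to your setup

# Request a small grayscale screen buffer instead of a large RGB one.
game.set_screen_format(vzd.ScreenFormat.GRAY8)
game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
game.init()

def preprocess(frame, target_shape=(64, 64)):
    """Downscale the screen buffer and convert it to float32 for the network."""
    frame = skimage.transform.resize(frame, target_shape)
    return frame.astype(np.float32)
```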
I ran it on 64 GB of RAM. So there may be something wrong with the system or software, not with my code?
The sudden spike seems to indicate so. However, as I continue running, the memory use does increase, but this was to be expected with such large observations. Again, I recommend trying to reduce the resolution and go back to grayscale images. I also recommend using an established RL library to run experiments (e.g. stable-baselines3), where the implementations are known to work.

Edit: Just after I wrote this, the memory use spiked on my machine as well. It seems to come from the Python code (probably some tensors being duplicated recursively). I sadly do not have time to fix this right now, but if you debug it, we would be happy to accept a fix. I still recommend trying out established libraries if you wish to experiment with RL algorithms :)
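If the culprit is indeed tensors being kept alive (for example, observations stored while still attached to the autograd graph or sitting on the GPU, so each one drags its history and device memory along), a common workaround is to store plain detached CPU copies in the buffer. A hypothetical sketch of that fix, not the exact change needed in this particular script:

```python
import numpy as np
import torch

def to_storable(obs):
    """Return a plain numpy copy suitable for long-term storage in a replay buffer.

    A tensor that is still attached to the autograd graph (or lives on the GPU)
    keeps its computation history / device memory alive for as long as it stays
    in the buffer, which can look exactly like a slow memory leak.
    """
    if isinstance(obs, torch.Tensor):
        return obs.detach().cpu().numpy().copy()
    return np.asarray(obs)

# e.g. when pushing a transition:
# buffer.push(to_storable(state), action, reward, to_storable(next_state), done)
```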
Thank you very much, I will follow your suggestion. |
I have implemented ViZDoom tasks following "learning_pytorch.py" from the examples folder. I found that memory usage kept increasing with the training epochs. In late epochs, the "health gathering" task sometimes used 100% of memory and then crashed. Other times, the memory stopped increasing at around 80-90% and the run finished to the last epoch. How can I fix this issue?