Out of memory errors #102

schwab · 2021-01-01T01:33:08Z

How much memory do we need to train MuZero on these games? So far I've run out of memory with atari and breakout on a system with 64 GB of RAM. Most of the memory is used by the ReplayBuffer, so perhaps there are ways to limit that? Also, it doesn't seem to use swap effectively and always dies when it hits 95% of system memory. (This is a Linux Ubuntu 20.10 system.)

File "/home/mcstar_dev/.local/lib/python3.8/site-packages/ray/worker.py", line 1379, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayOutOfMemoryError): ray::SharedStorage.get_info() (pid=17985, ip=192.168.1.175)
File "python/ray/_raylet.pyx", line 423, in ray._raylet.execute_task
File "/home/mcstar_dev/.local/lib/python3.8/site-packages/ray/memory_monitor.py", line 130, in raise_if_low_memory
raise RayOutOfMemoryError(
ray.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node tr001 is used (59.62 / 62.75 GB). The top 10 memory consumers are:

PID MEM COMMAND
17950 50.6GiB ray::ReplayBuffer
17944 1.48GiB ray::Trainer.continuous_update_weights()
17978 0.8GiB ray::SelfPlay.continuous_self_play()
9431 0.29GiB /usr/bin/gnome-shell
981691 0.24GiB /usr/bin/python3 /home/mcstar_dev/.local/bin/tensorboard --logdir ./results
17980 0.23GiB ray::SelfPlay.continuous_self_play()
110347 0.19GiB /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=6870262073328718638,16916755515565
17599 0.17GiB python3 muzero.py
777299 0.16GiB /snap/code/52/usr/share/code/code --type=renderer --disable-color-correct-rendering --no-sandbox --f
777313 0.16GiB /snap/code/52/usr/share/code/code --type=renderer --disable-color-correct-rendering --no-sandbox --f

In addition, up to 1.93 GiB of shared memory is currently being used by the Ray object store.

werner-duvaud · 2021-01-01T11:46:22Z

Hi,
On most simple games it doesn't need much RAM. The Atari version, which uses the same network as in the MuZero paper, can require more. I haven't tested Atari, so I don't know how much RAM it needs. That said, the replay buffer does store the games in RAM, so training Atari for a long time would require a system that writes games to disk and loads into RAM only the games with the highest probability of being sampled. I will try to add that when I have time; otherwise, a PR is welcome.
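The disk-backed idea can be sketched roughly as follows. This is a minimal, hypothetical sketch, not muzero-general code: the class name `DiskBackedReplayBuffer`, the one-pickle-file-per-game layout and the `max_games_in_ram` parameter are all assumptions. Every game is written to disk, priorities stay in RAM (they are small), and only a bounded set of the highest-priority games among those recently touched is cached in memory. Sampling is proportional to priority.

```python
import os
import pickle
import random
import tempfile


class DiskBackedReplayBuffer:
    """Hypothetical sketch: every game is stored on disk; only a bounded,
    priority-ranked subset is cached in RAM."""

    def __init__(self, directory, max_games_in_ram):
        self.directory = directory
        self.max_games_in_ram = max_games_in_ram
        self.priorities = {}  # game_id -> priority (small, kept in RAM)
        self.cache = {}  # game_id -> game object (at most max_games_in_ram)

    def _path(self, game_id):
        return os.path.join(self.directory, f"game_{game_id}.pkl")

    def _cache_and_evict(self, game_id, game):
        # Cache the game, then drop the lowest-priority cached games
        # until the cache fits in max_games_in_ram.
        self.cache[game_id] = game
        while len(self.cache) > self.max_games_in_ram:
            lowest = min(self.cache, key=lambda gid: self.priorities[gid])
            del self.cache[lowest]

    def save_game(self, game_id, game, priority):
        # Always persist to disk so RAM never has to hold the whole buffer.
        with open(self._path(game_id), "wb") as f:
            pickle.dump(game, f)
        self.priorities[game_id] = priority
        self._cache_and_evict(game_id, game)

    def update_priority(self, game_id, priority):
        # The cache is re-ranked lazily, the next time a game is loaded or saved.
        self.priorities[game_id] = priority

    def load_game(self, game_id):
        if game_id in self.cache:
            return self.cache[game_id]
        with open(self._path(game_id), "rb") as f:
            game = pickle.load(f)
        # Promote to the cache; if the cache is full and this game has the
        # lowest priority, it is evicted again immediately.
        self._cache_and_evict(game_id, game)
        return game

    def sample_game(self, rng=random):
        # Prioritized sampling: probability proportional to each game's priority.
        ids = list(self.priorities)
        weights = [self.priorities[gid] for gid in ids]
        game_id = rng.choices(ids, weights=weights, k=1)[0]
        return game_id, self.load_game(game_id)


# Usage: with room for 2 games in RAM, the lowest-priority game lives only on disk.
with tempfile.TemporaryDirectory() as directory:
    buffer = DiskBackedReplayBuffer(directory, max_games_in_ram=2)
    for game_id, priority in enumerate([0.1, 0.5, 0.9]):
        buffer.save_game(game_id, {"actions": [game_id]}, priority)
    print(sorted(buffer.cache))
    print(buffer.load_game(0))
```

One pickle file per game keeps the sketch simple; for long Atari runs a faster on-disk format would likely be worth the effort.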

ahainaut · 2021-01-07T17:51:04Z

Hi @schwab,
In addition to Werner's answer, you can lower the replay_buffer size to save some RAM.
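Because RAM grows roughly linearly with the number of stored games, lowering the buffer size caps memory directly. The sketch below makes that concrete under stated assumptions: observations stored as float32 arrays of shape (3, 96, 96) and a hypothetical 10,000 steps per game (real shapes, dtypes and game lengths depend on the game config). `bytes_per_game` and `BoundedReplayBuffer` are hypothetical illustrations of a size-capped FIFO buffer, not muzero-general's ReplayBuffer.

```python
import math
from collections import OrderedDict


def bytes_per_game(steps, observation_shape, bytes_per_value=4):
    """Rough RAM for one game's observations only (actions, rewards,
    root values and search statistics are ignored)."""
    return steps * math.prod(observation_shape) * bytes_per_value


class BoundedReplayBuffer:
    """Hypothetical FIFO buffer: keeps at most replay_buffer_size games and
    drops the oldest one when a new game would exceed that limit."""

    def __init__(self, replay_buffer_size):
        self.replay_buffer_size = replay_buffer_size
        self.buffer = OrderedDict()  # insertion order == age of the game
        self.num_played_games = 0

    def save_game(self, game):
        self.buffer[self.num_played_games] = game
        self.num_played_games += 1
        if len(self.buffer) > self.replay_buffer_size:
            self.buffer.popitem(last=False)  # evict the oldest game


# Assumed: float32 observations of shape (3, 96, 96) and 10,000 steps per game.
per_game = bytes_per_game(steps=10_000, observation_shape=(3, 96, 96))
for size in (50, 10):
    print(f"replay_buffer_size={size}: about {size * per_game / 2**30:.1f} GiB of observations")
```

Under these assumptions each stored game costs about 1 GiB of observations alone, so the buffer size is the main knob for peak RAM.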

schwab · 2021-01-11T22:38:01Z

OK, I'll use a smaller buffer for now.

schwab closed this as completed Jan 11, 2021

me-unsolicited mentioned this issue Apr 22, 2021

Keep replay buffer on disk (not in memory), allowing it to grow to any size. #151 (Open)
