Out of memory errors #102
Hi,
Hi @schwab,
OK, will use smaller buffer for now.
How much memory do we need to train the MuZero games? So far I've hit out-of-memory errors with Atari and Breakout on a system with 64 GB of RAM. Most of the memory is used by the ReplayBuffer, so is there a way to limit that? Also, it doesn't seem to use swap effectively and always dies when usage hits 95% of system memory. (This is a Linux system running Ubuntu 20.10.)
File "/home/mcstar_dev/.local/lib/python3.8/site-packages/ray/worker.py", line 1379, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayOutOfMemoryError): ray::SharedStorage.get_info() (pid=17985, ip=192.168.1.175)
File "python/ray/_raylet.pyx", line 423, in ray._raylet.execute_task
File "/home/mcstar_dev/.local/lib/python3.8/site-packages/ray/memory_monitor.py", line 130, in raise_if_low_memory
raise RayOutOfMemoryError(
ray.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node tr001 is used (59.62 / 62.75 GB). The top 10 memory consumers are:
PID MEM COMMAND
17950 50.6GiB ray::ReplayBuffer
17944 1.48GiB ray::Trainer.continuous_update_weights()
17978 0.8GiB ray::SelfPlay.continuous_self_play()
9431 0.29GiB /usr/bin/gnome-shell
981691 0.24GiB /usr/bin/python3 /home/mcstar_dev/.local/bin/tensorboard --logdir ./results
17980 0.23GiB ray::SelfPlay.continuous_self_play()
110347 0.19GiB /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=6870262073328718638,16916755515565
17599 0.17GiB python3 muzero.py
777299 0.16GiB /snap/code/52/usr/share/code/code --type=renderer --disable-color-correct-rendering --no-sandbox --f
777313 0.16GiB /snap/code/52/usr/share/code/code --type=renderer --disable-color-correct-rendering --no-sandbox --f
In addition, up to 1.93 GiB of shared memory is currently being used by the Ray object store.
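Since the ReplayBuffer process accounts for most of the usage above (50.6 GiB), one way to reason about a smaller buffer is to estimate how many games fit in a fixed memory budget and cap the buffer at that count. The following is only a rough sketch under stated assumptions, not the project's actual implementation: `estimate_game_bytes`, `max_games_for_budget`, and `BoundedReplayBuffer` are hypothetical names, observations are assumed to be stored as 4-byte floats, and only observation storage is counted (search statistics, rewards, and Python object overhead are ignored).

```python
import math
from collections import deque


def estimate_game_bytes(observation_shape, game_length, bytes_per_value=4):
    """Rough size in bytes of the observations stored for one game.

    Assumption: one observation is kept per step and each value
    takes `bytes_per_value` bytes (4 for float32).
    """
    values_per_observation = math.prod(observation_shape)
    return values_per_observation * bytes_per_value * game_length


def max_games_for_budget(budget_bytes, observation_shape, game_length,
                         bytes_per_value=4):
    """Largest number of games whose observations fit in `budget_bytes`."""
    per_game = estimate_game_bytes(observation_shape, game_length,
                                   bytes_per_value)
    return max(1, budget_bytes // per_game)


class BoundedReplayBuffer:
    """Hypothetical buffer that drops the oldest game once it is full."""

    def __init__(self, max_games):
        # deque(maxlen=...) discards items from the left when it overflows.
        self.games = deque(maxlen=max_games)

    def save_game(self, game):
        self.games.append(game)

    def __len__(self):
        return len(self.games)


if __name__ == "__main__":
    # Example numbers are illustrative, not taken from the project config.
    budget = 16 * 1024**3  # 16 GiB reserved for the buffer
    limit = max_games_for_budget(budget, (3, 96, 96), game_length=100)
    print("games that fit in the budget:", limit)
```

The resulting game count could then be used as the buffer size in the game configuration. Because the estimate ignores everything except raw observations, actual usage will be higher, so leaving generous headroom below the 95% threshold reported by Ray seems prudent.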