With code based on v0.5.0 (commit 152ec50), Ray complains with warnings like:
2023-02-28 15:45:28,812 WARNING worker.py:1851 -- WARNING: 274 PYTHON worker processes have been started on node: 22020832ef6d4fae8e4bb7b15ae173df3c17d639877d04d413e5c506 with address: 192.168.0.52. This could be a result of using a large number of actors, or due to tasks blocked in ray.get() calls (see https://github.com/ray-project/ray/issues/3644 for some discussion of workarounds).
On my machine, the data utility notebook ends up spitting this out:
...
2023-02-28 15:58:12,575 WARNING worker.py:1851 -- WARNING: 676 PYTHON worker processes have been started on node: 22020832ef6d4fae8e4bb7b15ae173df3c17d639877d04d413e5c506 with address: 192.168.0.52. This could be a result of using a large number of actors, or due to tasks blocked in ray.get() calls (see https://github.com/ray-project/ray/issues/3644 for some discussion of workarounds).
(raylet) [2023-02-28 15:58:19,519 E 114931 114931] (raylet) node_manager.cc:3097: 2 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 22020832ef6d4fae8e4bb7b15ae173df3c17d639877d04d413e5c506, IP: 192.168.0.52) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 192.168.0.52`
(raylet)
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
2023-02-28 15:58:21,823 WARNING worker.py:1851 -- WARNING: 736 PYTHON worker processes have been started on node: 22020832ef6d4fae8e4bb7b15ae173df3c17d639877d04d413e5c506 with address: 192.168.0.52. This could be a result of using a large number of actors, or due to tasks blocked in ray.get() calls (see https://github.com/ray-project/ray/issues/3644 for some discussion of workarounds).
(raylet) [2023-02-28 15:59:19,521 E 114931 114931] (raylet) node_manager.cc:3097: 4 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 22020832ef6d4fae8e4bb7b15ae173df3c17d639877d04d413e5c506, IP: 192.168.0.52) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 192.168.0.52`
(raylet)
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
And after a few more iterations, it dies with:
(raylet) [2023-02-28 16:18:26,974 E 114931 114931] (raylet) worker_pool.cc:524: Some workers of the worker process(953902) have not registered within the timeout. The process is still alive, probably it's hanging during start.
(ShapleyWorker pid=900576) E0228 16:18:22.835699587 902741 chttp2_transport.cc:2721] keepalive_ping_end state error: 0 (expect: 1)
(ShapleyWorker pid=906401) E0228 16:18:26.290218105 909206 chttp2_transport.cc:2721] keepalive_ping_end state error: 0 (expect: 1)
(ShapleyWorker pid=906390) E0228 16:18:22.540661878 907569 chttp2_transport.cc:2721] keepalive_ping_end state error: 0 (expect: 1)
(ShapleyWorker pid=912437) E0228 16:18:25.968255772 914199 chttp2_transport.cc:2721] keepalive_ping_end state error: 0 (expect: 1)
(ShapleyWorker pid=912395) E0228 16:18:30.959742320 913533 chttp2_transport.cc:2721] keepalive_ping_end state error: 0 (expect: 1)
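For reference, the raylet message above names two workarounds: adjusting or disabling the memory monitor via environment variables, and requesting more CPUs per task to reduce parallelism. A minimal sketch of both (the `utility_chunk` task below is hypothetical, just to show where `num_cpus` goes; I haven't verified that either knob fixes the notebook):

```python
import os

# Per the raylet message, these must be set when starting Ray, i.e. before
# ray.init(). The default kill threshold is 0.95 of node memory.
os.environ["RAY_memory_usage_threshold"] = "0.98"
# Or disable the memory monitor entirely, as the message suggests:
# os.environ["RAY_memory_monitor_refresh_ms"] = "0"

import ray

ray.init()

# "Reducing task parallelism by requesting more CPUs per task": a task that
# reserves 2 CPUs halves the number of tasks scheduled concurrently per node.
@ray.remote(num_cpus=2)
def utility_chunk(samples):  # hypothetical task, for illustration only
    ...
```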
The issue referenced in the warning (ray-project/ray#3644) suggests this might be due to nested remote calls.
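If that's what is going on, the pattern would look something like this (a self-contained sketch of nested `ray.get()` calls, not the actual Shapley code):

```python
import ray

ray.init()

@ray.remote
def inner(i):
    return i * i

@ray.remote
def outer(chunk):
    # This worker blocks in ray.get() while the inner tasks run, so Ray
    # starts additional workers to keep the CPUs busy. With many outer
    # tasks, the worker count grows far beyond the number of cores -- the
    # behaviour described in ray-project/ray#3644.
    return sum(ray.get([inner.remote(i) for i in chunk]))

results = ray.get([outer.remote(list(range(100))) for _ in range(64)])
```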