I have been looking for a way to solve the "outdated function definition" problem, as described here: https://ray.readthedocs.io/en/latest/troubleshooting.html#outdated-function-definitions

One almost-solution is to set the max_calls argument to 1 when calling your remote function. Here is my code and instructions to replicate what I see:
test_ray.py:
import ray
from test_ray_lib import bar

def foo(x):
    return bar(x)

ray.init(address='auto')

jobs = []
print('Submitting jobs')
for _ in range(5):
    jobs.append(ray.remote(max_calls=1)(foo).remote(None))

print('Reading results')
res = [ray.get(j) for j in jobs]
print(res)
test_ray_lib.py:
def bar(x):
    return 0
So foo just calls bar, which is imported from test_ray_lib.py. Also note that max_calls is set to 1.
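For reference, the same option can also be attached with the decorator form. My understanding of max_calls=1 (an assumption about Ray's behaviour, not something verified for every version) is that the worker process exits after executing the task once, so the next task starts on a freshly launched worker that re-imports the current code:

import ray
from test_ray_lib import bar

# Equivalent to ray.remote(max_calls=1)(foo): the worker that executes a
# foo task exits afterwards instead of being reused for later foo tasks.
@ray.remote(max_calls=1)
def foo(x):
    return bar(x)

# Submission then becomes foo.remote(None) instead of
# ray.remote(max_calls=1)(foo).remote(None).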
Running my script on Ray:

Now change test_ray_lib.py and have bar return 1:
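# test_ray_lib.py after the change described above
def bar(x):
    return 1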
Also sync up the head with our new test_ray_lib.py version (I should point out that I have max_workers: 0 in my cluster config, so the file only needs to be rsynced to the head node):
ray rsync-up cluster.yaml ~/ray_test/scripts/test_ray_lib.py ~/ray_test/scripts/test_ray_lib.py
We can now rerun our script:

Almost what we want - it seems like one worker is still around from before we changed test_ray_lib.py.

Rerunning yet again:

So now all the workers are fresh.

If there were a way to drop the stale worker when pushing new versions of test_ray_lib.py, the flow above could be automated so users wouldn't have to worry about stale function definitions (with some tradeoffs, of course).
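For illustration, one possible shape for that automation - relying only on the max_calls=1 behaviour observed above, and assuming (which is not guaranteed) that the idle stale workers are the ones that pick up the throwaway tasks - would be a short "flush" step run right after syncing the new code and before the real workload; the _flush name and the task count here are just illustrative:

import ray

ray.init(address='auto')

# Throwaway no-op tasks with max_calls=1: any pre-existing idle worker that
# executes one of these should exit afterwards, so tasks submitted later
# start on freshly launched workers that import the updated module.
@ray.remote(max_calls=1)
def _flush(_):
    return None

# The count is arbitrary; it just needs to cover however many idle workers
# might still be holding the old test_ray_lib definitions.
ray.get([_flush.remote(None) for _ in range(10)])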
Is there a way to force remote functions to always be run on fresh workers? Alternatively, is there a way to reset all workers manually?