I have been looking for a way to solve the "outdated function definition" problem, as described here: https://ray.readthedocs.io/en/latest/troubleshooting.html#outdated-function-definitions

One almost-solution is to set the max_calls argument to 1 when calling your remote function. Here is my code and instructions to replicate what I see:
test_ray.py:
import ray
from test_ray_lib import bar

def foo(x):
    return bar(x)

ray.init(address='auto')

jobs = []
print('Submitting jobs')
for _ in range(5):
    jobs.append(ray.remote(max_calls=1)(foo).remote(None))

print('Reading results')
res = [ray.get(j) for j in jobs]
print(res)
test_ray_lib.py:
def bar(x):
    return 0
So foo just calls bar, which is imported from test_ray_lib.py. Also note that max_calls is set to 1.
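For reference, the same option can also be attached with the decorator form. My understanding of max_calls=1 (an assumption about Ray's behaviour, not something verified for every version) is that the worker process exits after executing the task once, so the next task starts on a freshly launched worker that re-imports the current code:

import ray
from test_ray_lib import bar

# Equivalent to ray.remote(max_calls=1)(foo): the worker that executes a
# foo task exits afterwards instead of being reused for later foo tasks.
@ray.remote(max_calls=1)
def foo(x):
    return bar(x)

# Submission then becomes foo.remote(None) instead of
# ray.remote(max_calls=1)(foo).remote(None).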
Running my script on Ray:

Now change test_ray_lib.py and have bar return 1:
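# test_ray_lib.py after the change described above
def bar(x):
    return 1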
Also sync up the head with our new test_ray_lib.py version (I should point out that I have max_workers: 0 in my cluster config, so the file only needs to be rsynced to the head node):
ray rsync-up cluster.yaml ~/ray_test/scripts/test_ray_lib.py ~/ray_test/scripts/test_ray_lib.py
We can now rerun our script:

Almost what we want - it seems like one worker is still around from before we changed test_ray_lib.py.

Rerunning yet again:

So now all the workers are fresh.

If there were a way to drop the stale worker when pushing new versions of test_ray_lib.py, the flow above could be automated so users wouldn't have to worry about stale function definitions (with some tradeoffs, of course).
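For illustration, one possible shape for that automation - relying only on the max_calls=1 behaviour observed above, and assuming (which is not guaranteed) that the idle stale workers are the ones that pick up the throwaway tasks - would be a short "flush" step run right after syncing the new code and before the real workload; the _flush name and the task count here are just illustrative:

import ray

ray.init(address='auto')

# Throwaway no-op tasks with max_calls=1: any pre-existing idle worker that
# executes one of these should exit afterwards, so tasks submitted later
# start on freshly launched workers that import the updated module.
@ray.remote(max_calls=1)
def _flush(_):
    return None

# The count is arbitrary; it just needs to cover however many idle workers
# might still be holding the old test_ray_lib definitions.
ray.get([_flush.remote(None) for _ in range(10)])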
Is there a way to force remote functions to always be run on fresh workers? Alternatively, is there a way to reset all workers manually?