Allow remote functions to require running on a fresh worker #7059

Open
matter-funds opened this issue Feb 5, 2020 · 0 comments
Labels: enhancement (Request for new feature and/or capability), P3 (Issue moderate in impact or severity)

matter-funds commented Feb 5, 2020

I have been looking for a way to solve the 'outdated function definition' problem, as described here: https://ray.readthedocs.io/en/latest/troubleshooting.html#outdated-function-definitions

One partial solution is to set the max_calls argument to 1 when declaring the remote function.
Here is my code, with instructions to reproduce what I see:

test_ray.py:

import ray
from test_ray_lib import bar

def foo(x):
  return bar(x)

ray.init(address='auto')
jobs = []
print('Submitting jobs')
for _ in range(5):
  jobs.append(ray.remote(max_calls=1)(foo).remote(None))

print('Reading results')
res = [ray.get(j) for j in jobs]
print(res)

test_ray_lib.py:

def bar(x):
  return 0

So foo just calls bar, which is imported from test_ray_lib.py. Also note that max_calls is set to 1.
Running my script on ray:

(env) ubuntu@ip-172-31-30-103:scripts$ python test_ray.py
Submitting jobs
Reading results
[0, 0, 0, 0, 0]

Now change test_ray_lib.py and have bar return 1:

def bar(x):
  return 1

Then sync the new version of test_ray_lib.py up to the head node.
(I should point out that I have max_workers: 0 in my cluster config, so the file only needs to be rsynced to the head node.)

ray rsync-up cluster.yaml ~/ray_test/scripts/test_ray_lib.py ~/ray_test/scripts/test_ray_lib.py

We can now rerun our script:

(env) ubuntu@ip-172-31-30-103:scripts$ python test_ray.py
Submitting jobs
Reading results
[0, 1, 1, 1, 1]

Almost what we want - it seems like one worker is still around from before we changed test_ray_lib.py.

Rerunning yet again:

(env) ubuntu@ip-172-31-30-103:scripts$ python test_ray.py
Submitting jobs
Reading results
[1, 1, 1, 1, 1]

So all the workers are fresh.
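
The stale 0 in the second run is consistent with a long-lived Python process keeping the old test_ray_lib module cached. Here is a minimal sketch of that mechanism without Ray, using only the standard library. The module name demo_lib and the helpers write_lib and run_demo are made up for illustration; this shows plain Python import caching, not Ray's worker internals.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc caching so a rewrite within the same second is always picked up.
sys.dont_write_bytecode = True


def write_lib(directory, value):
    """Write a hypothetical demo_lib.py whose bar() returns `value`."""
    path = os.path.join(directory, "demo_lib.py")
    with open(path, "w") as f:
        f.write(f"def bar(x):\n  return {value}\n")


def run_demo():
    """Return bar()'s result before the edit, after the edit, and after reload."""
    with tempfile.TemporaryDirectory() as tmp:
        write_lib(tmp, 0)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            demo_lib = importlib.import_module("demo_lib")
            first = demo_lib.bar(None)

            # Edit the file on disk, as rsync-up does on the head node.
            write_lib(tmp, 1)
            # The process still holds the old module object.
            stale = demo_lib.bar(None)

            # Only an explicit reload (or a new process) sees the new code.
            importlib.reload(demo_lib)
            fresh = demo_lib.bar(None)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_lib", None)
    return first, stale, fresh


print(run_demo())  # (0, 0, 1)
```

A worker that imported test_ray_lib before the rsync would be in the "stale" state above until it exits, which would explain the single 0.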

If there were a way to drop the stale worker when pushing new versions of test_ray_lib.py, the flow above could be automated so users wouldn't have to worry about stale function definitions (with some tradeoffs, of course).

Is there a way to force remote functions to always run on fresh workers? Alternatively, is there a way to reset all workers manually?
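
In the meantime, one possible workaround is to reload the library module inside the function body, so that even a reused worker picks up the file currently on disk. This is only a sketch under that assumption: call_fresh is a hypothetical helper, not part of Ray, and I have not verified it on a cluster.

```python
import importlib


def call_fresh(module_name, func_name, *args):
    """Reload `module_name` from disk, then call `func_name` from it."""
    module = importlib.import_module(module_name)
    module = importlib.reload(module)
    return getattr(module, func_name)(*args)


# Hypothetical use inside the remote function from this issue:
#   def foo(x):
#     return call_fresh("test_ray_lib", "bar", x)
```

The tradeoff is a reload on every call, and importlib.reload only re-executes the named module, not the modules it imports in turn.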

@matter-funds matter-funds added the enhancement Request for new feature and/or capability label Feb 5, 2020
@ericl ericl added the P3 Issue moderate in impact or severity label Mar 19, 2020