Add experimental API for ray.get and ray.wait with additional argument types #2071

Merged
merged 6 commits into from
Jun 1, 2018

Contributor

@kunalgosar kunalgosar commented May 16, 2018

What do these changes do?

This adds an experimental API: ray.experimental.get accepts lists, tuples, ndarrays, or dicts, and ray.experimental.wait accepts lists, tuples, or ndarrays.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5422/
Test PASSed.

@robertnishihara
Collaborator

This seems like a good idea, some comments/questions:

  1. Are iterables really the right abstraction? E.g., strings and dicts are iterable, but it doesn't make sense to call ray.get on them (definitely not on a string, and dicts would require more special handling). If we leave it this way, we may want to specifically check for iterables that don't work and raise exceptions, or at least make sure we get a good error message. It may make more sense to handle only certain kinds of iterables, e.g., just lists and tuples.
  2. Can you add some tests for this (in test/runtest.py, wherever we are already testing ray.get)?
  3. We may want to respect the type, that is, if you call ray.get on a list, then you get a list back, and if you call it on a tuple, then you get a tuple back, though maybe always returning a list is ok.
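
To illustrate point 1, here is a minimal sketch using only the standard library (nothing Ray-specific is assumed, and the names `candidates`, `is_iterable`, and `is_list_or_tuple` are illustrative). It shows that a plain Iterable check accepts strings and dicts, while an explicit allow-list does not:

```python
from collections.abc import Iterable

# Sample inputs a caller might plausibly pass.
candidates = {
    "list": [1, 2],
    "tuple": (1, 2),
    "str": "ab",
    "dict": {"a": 1},
}

# Every one of these is an Iterable, including the str and the dict.
is_iterable = {
    name: isinstance(value, Iterable) for name, value in candidates.items()
}

# An explicit allow-list rejects the str and the dict.
is_list_or_tuple = {
    name: isinstance(value, (list, tuple)) for name, value in candidates.items()
}
```

With an allow-list, unsupported inputs can be rejected up front with a clear error instead of failing later in a less obvious way.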

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5437/
Test PASSed.

@kunalgosar
Contributor Author

  1. I agree that Iterable alone is not the right abstraction. My original reasoning was that passing in something weird (e.g., a string) would get caught by the isinstance(object_id, ray.ObjectID) check. As you said, this can instead handle certain types (lists, tuples, dicts, np.ndarray and pd.Series). If this sounds right to you, I can update the PR to specifically handle these cases.
  2. Once the specific cases are finalized, I'll update and add tests.
  3. If each iterable type is handled differently, it can also ensure that the return type matches the input type.

I will also update other methods to support this once the types are finalized (e.g. ray.wait).

Additional question: Does it make sense to handle Iterables passed into ray.put? This is not supported currently.

@robertnishihara
Collaborator

@kunalgosar some questions. What are the types that you think are important to support? Lists and tuples make a lot of sense. Maybe numpy arrays? Does anything else make sense?

I'm still unsure about whether the return container type should match the input container type.

We could make this change for ray.wait but there are more constraints. E.g., ray.wait should probably continue to return lists (I think).

Let's not change ray.put: if you pass a list to ray.put, it's hard to tell whether you want to put the list itself in the object store or to call ray.put on each element of the list.

@kunalgosar
Contributor Author

I agree it makes sense to support lists and tuples. I would also add numpy arrays and dictionaries. For the return type, I think that all of these could return just lists, except dictionaries, where it might make sense to return a dict.

The types for ray.wait should probably be the same as for ray.get.

It would definitely be hard to change ray.put, without adding a parameter for num_vals or something similar. I won't change this here.

.format(threading.current_thread().getName()))


def get(object_ids, worker=None):
Contributor Author

@robertnishihara: The function names here can be changed, if needed. I was unsure what to call them, so I have kept them as get and wait for now.

@kunalgosar kunalgosar changed the title Allow any Iterable type to ray.get Add experimental API for ray.get and ray.wait additional argument types May 23, 2018
@kunalgosar kunalgosar changed the title Add experimental API for ray.get and ray.wait additional argument types Add experimental API for ray.get and ray.wait with additional argument types May 23, 2018
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5590/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5592/
Test PASSed.

@kunalgosar
Contributor Author

This passes on private Travis.

@robertnishihara
Collaborator

@kunalgosar, instead of reimplementing get/wait, let's just implement these by calling ray.get and ray.wait under the hood. All we should need to do is to cast to a list and then call the original ray.get.

We shouldn't need to duplicate check_main_thread.
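
A minimal sketch of that idea, under stated assumptions: `experimental_get` is a hypothetical name, and `base_get` is an injected stand-in for ray.get so that the example runs without Ray installed. ndarray handling is left out to avoid a numpy dependency; it would presumably be one more entry in the isinstance tuple.

```python
def experimental_get(object_ids, base_get):
    """Hypothetical wrapper: normalise the container, then defer to base_get.

    base_get stands in for ray.get (an assumption made so this sketch is
    self-contained).
    """
    if isinstance(object_ids, (list, tuple)):
        # Cast to a list and reuse the existing implementation.
        return base_get(list(object_ids))
    # Anything else (e.g. a single ID) is passed through unchanged.
    return base_get(object_ids)


# Stand-in for the object store: a plain dict keyed by made-up IDs.
store = {"id1": 10, "id2": 20}


def fake_get(ids):
    """Stand-in for ray.get: accepts a single ID or a list of IDs."""
    if isinstance(ids, list):
        return [store[i] for i in ids]
    return store[ids]


from_tuple = experimental_get(("id1", "id2"), fake_get)
from_single = experimental_get("id1", fake_get)
```

Because the wrapper only reshapes its argument, any checks performed by the underlying function do not need to be repeated.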

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5689/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5690/
Test PASSed.

@kunalgosar
Contributor Author

@robertnishihara I've addressed the above comment and the test failures are unrelated.

Collaborator

@robertnishihara robertnishihara left a comment

Thanks! This looks pretty good. Left some comments.


def get(object_ids, worker=None):
"""Get a remote object or a collection of remote objects from the object
store.
Collaborator

The first docstring line should fit on a single line.

def get(object_ids, worker=None):
"""Get a remote object or a collection of remote objects from the object
store.

Collaborator

Could you add a note explaining how this differs from ray.get? The block below is probably not necessary, since you can assume people know how ray.get works.

if isinstance(object_ids, (list, tuple, np.ndarray)):
# There is a dependency on ray.worker which prevents importing
# global_worker at the top of this file
return ray.get(list(object_ids)) if worker is None else \
Collaborator

This worker is None check appears a number of times; can we just do

worker = ray.worker.global_worker if worker is None else worker

once at the beginning of this function?

return ray.get(list(object_ids)) if worker is None else \
ray.get(list(object_ids), worker)
elif isinstance(object_ids, dict):
to_get = [
Collaborator

I think this can be simplified, maybe something like

keys_to_get = [k for k, v in object_ids.items() if isinstance(v, ray.ObjectID)]
ids_to_get = [v for k, v in object_ids.items() if isinstance(v, ray.ObjectID)]
values = ray.get(ids_to_get)

result = object_ids.copy()
for key, value in zip(keys_to_get, values):
    result[key] = value
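
A runnable sketch of the suggestion above, under stated assumptions: `FakeObjectID` is a hypothetical stand-in for ray.ObjectID, `base_get` stands in for ray.get, and `get_dict` is an illustrative name. Only values that are IDs are fetched; everything else is copied through untouched.

```python
class FakeObjectID:
    """Hypothetical stand-in for ray.ObjectID."""

    def __init__(self, key):
        self.key = key


def get_dict(object_ids, base_get, id_type=FakeObjectID):
    """Return a copy of object_ids with every ID value replaced by its object."""
    keys_to_get = [k for k, v in object_ids.items() if isinstance(v, id_type)]
    ids_to_get = [object_ids[k] for k in keys_to_get]
    # One batched call instead of one call per key.
    values = base_get(ids_to_get)

    result = object_ids.copy()
    for key, value in zip(keys_to_get, values):
        result[key] = value
    return result


# Stand-in for the object store and for ray.get on a list of IDs.
store = {"x": 1.5}


def fake_get(ids):
    return [store[i.key] for i in ids]


original = {"a": FakeObjectID("x"), "b": "plain"}
resolved = get_dict(original, fake_get)
```

Working on a copy leaves the caller's dict unmodified, which seems the safer default for a function named get.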

result[key] = val
return result
else:
return ray.get(object_ids) if worker is None else \
Collaborator

In this file, let's avoid using \. If multiline things are necessary, let's do

return (ray.get(object_ids) if worker is None
        else ray.get(object_ids, worker))

def wait(object_ids, num_returns=1, timeout=None, worker=None):
"""Return a list of IDs that are ready and a list of IDs that are not.

If timeout is set, the function returns either when the requested number of
Collaborator

I'd suggest documenting this function primarily by explaining the difference from ray.wait.

@@ -0,0 +1,78 @@
from __future__ import absolute_import
Collaborator

Let's change this filename to be api.py

@kunalgosar
Contributor Author

@robertnishihara thanks for taking a look! I've resolved the above comments.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5785/
Test FAILed.

@kunalgosar
Contributor Author

Jenkins, retest this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5786/
Test FAILed.

@kunalgosar
Contributor Author

Jenkins, retest this please

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5797/
Test PASSed.

@robertnishihara
Collaborator

Looks good to me, thanks!

@robertnishihara robertnishihara merged commit 317d0da into ray-project:master Jun 1, 2018
@alok
Contributor

alok commented Jun 3, 2018

@robertnishihara I think the "right sort of iterable" for this problem is actually pretty hard to express in Python. We want an iterable that contains only concrete types that is not itself a concrete type (which rules out strings). Dicts would require special handling to support, but this covers all the "flat" types like ndarrays, lists, sets (though order would be a sticking point there), and tuples.
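
One rough way to approximate that distinction in plain Python is sketched below. `is_flat_iterable` is a hypothetical helper, not part of Ray, and it only checks the container itself rather than what the container holds:

```python
from collections.abc import Iterable, Mapping


def is_flat_iterable(obj):
    """Hypothetical check: iterable, but not a string, bytes, or mapping."""
    if isinstance(obj, (str, bytes, Mapping)):
        # Strings are iterable yet behave like single values, and
        # mappings need their own key/value handling.
        return False
    return isinstance(obj, Iterable)


checks = {
    "list": is_flat_iterable([1, 2]),
    "tuple": is_flat_iterable((1, 2)),
    "set": is_flat_iterable({1, 2}),
    "str": is_flat_iterable("ab"),
    "dict": is_flat_iterable({"a": 1}),
    "int": is_flat_iterable(5),
}
```

This also accepts generators and other one-shot iterables, so it is looser than the explicit list of types used in the PR.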

4 participants