Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ray.get and ray.wait to take in additional argument types #2126

Open
kunalgosar opened this issue May 23, 2018 · 7 comments
Open

Allow ray.get and ray.wait to take in additional argument types #2126

kunalgosar opened this issue May 23, 2018 · 7 comments
Labels
enhancement Request for new feature and/or capability P3 Issue moderate in impact or severity

Comments

@kunalgosar
Copy link
Contributor

kunalgosar commented May 23, 2018

I wanted to open a discussion on this API change. The proposal here is shown via an experimental API in #2071. It would be helpful for people to try these out and see what you think; depending on how people feel, we can just support a subset of the types proposed below:

Allow ray.get to accept object_ids as lists, tuples, ndarrays and dicts:
In [2]: oids = [ray.put(i) for i in range(5)]

In [3]: ray.experimental.get(tuple(oids))
Out[3]: [0, 1, 2, 3, 4]

In [4]: ray.experimental.get(np.array(oids))
Out[4]: [0, 1, 2, 3, 4]

In [5]: d = {'one': ray.put(1),
   ...:      'two': ray.put(2),
   ...:      'three': ray.put(3),
   ...:      'four': 4,
   ...:      'five': 5}
   ...:      

In [6]: ray.experimental.get(d)
Out[6]: {'five': 5, 'four': 4, 'one': 1, 'three': 3, 'two': 2}
Allow ray.wait to accept object_ids as lists, tuples and ndarrays:
In [7]: oids = (ray.put(0), ray.put(1), ray.put(2))

In [8]: ray.experimental.wait(oids)
Out[8]: 
([ObjectID(4164de3113a3dffa90bcc19d0ea5abb93582c528)],
 [ObjectID(4364de3113a3dffa90bcc19d0ea5abb93582c528),
  ObjectID(4264de3113a3dffa90bcc19d0ea5abb93582c528)])

In [9]: oids = np.array([ray.put(i) for i in range(5)])

In [10]: ray.experimental.wait(oids)
Out[10]: 
([ObjectID(4464de3113a3dffa90bcc19d0ea5abb93582c528)],
 [ObjectID(4764de3113a3dffa90bcc19d0ea5abb93582c528),
  ObjectID(5864de3113a3dffa90bcc19d0ea5abb93582c528),
  ObjectID(4664de3113a3dffa90bcc19d0ea5abb93582c528),
  ObjectID(4564de3113a3dffa90bcc19d0ea5abb93582c528)])

Open Questions:

  1. Should the return type of ray.get match the input type? Or is always returning lists fine? Except the dict case, where it will always return a dict.
  2. Are there any additional types that should be handled as well?

cc @robertnishihara

@devin-petersohn
Copy link
Member

This seems useful, particularly for np.array which is frequently used in DataFrames.

@pschafhalter
Copy link
Contributor

I like the general idea of the proposal. Extending ray.get support to data structures beyond lists seems natural. In addition, I think that it could make sense to support generators too.

I am concerned about about consistency with the API. For example, claiming support for ndarrays, but only when n=1 seems restrictive. On the other hand, support for n>1 would imply support for nested data structures (e.g. nested lists) which could quickly complicate and introduce bugs to ray.get.

Furthermore, why add explicit support for ndarrays which already implement tolist()? From my understanding, computation time, object store time, and deserialization time are far greater than the time spent calling tolist().

@devin-petersohn
Copy link
Member

@pschafhalter Great point, tolist() is currently what we use. Ideally we wouldn't have to go through the tolist() operation and then back just to use ray.get.

Consistency in the API is important, which is why I suggest we don't do anything in the dict area yet. We can reasonably assume that users would expect to get the same type that they passed in back as a result from ray.get (e.g. return ndarray when ray.get(ndarray) is given).

The more I think about this the more I think we should wait (no pun intended) to implement it.

@kunalgosar
Copy link
Contributor Author

@pschafhalter For the ndarray point, I disagree that only supporting 1D arrays is restrictive. Currently, the experimental API throws an Error if a 2D array is passed in, which is no different then if a listed nest is passed in now. I agree that it does not make sense to support >1D ndarrays or nested lists.

@devin-petersohn Using dict is definitely the most questionable datatype out of those proposed. The goal of this is to add in the experimental API, called as ray.experimental.get and ray.experimental.wait so that people can begin playing with the new API. Based on how people feel about it after, the code can be moved out of the experimental API.

In terms of return types matching input types, @robertnishihara has said that it might make sense to always return lists regardless of input type. But, I am still torn on this point.

@devin-petersohn
Copy link
Member

I can see someone wanting to ray.get a matrix of OID objects without having to loop themselves. We do this in DataFrames, but wouldn't it be nice if you could say ray.get(self._block_partitions) for the entire matrix? I think so. For first pass it makes sense to stick with 1D, but I do see value in 2D matrix ray.get.

From an applications perspective, I would prefer the return type to be the same as the input type. Otherwise I am just casting it again and half the convenience of implementing this is lost.

The reason I think we should wait is because this feels like a slippery slope. If we start implementing one we can quickly find ourselves trying to support iterator, set, and potentially more. This will complicate the API a bit, which is something I don't think we want.

@pschafhalter
Copy link
Contributor

@devin-petersohn this relates to the consistency problem I think this introduces in the API.

Claiming to support ray.get for ndarrays seems disingenuous if we restrict to 1D. 2+D ndarrays are useful, but to me that suggests support for nested data structures (e.g. nested lists of OIDs) which I don't think we should support.

@simon-mo
Copy link
Contributor

My vote is for all 1-D "Flat" data structure including ndarray, dictionary, even set or generator. Users should be using ray.get at their own risk if they were going to pass in a generator etc.

ray.get should function like a map function over 1 d data structures.

@edoakes edoakes added enhancement Request for new feature and/or capability P3 Issue moderate in impact or severity labels Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for new feature and/or capability P3 Issue moderate in impact or severity
Projects
None yet
Development

No branches or pull requests

5 participants