Refactor code about ray.ObjectID. #3674
|
Test PASSed. |
|
The changes regarding NIL ids and the error class renaming look good to me! For the random object id generation, we should make sure that from_random is fork-safe if we expose it to the Python side, see the discussion in apache/arrow#2400 and
import ray
import multiprocessing as mp
def child(): print(ray.ObjectID.from_random())
for i in range(4): mp.Process(target=child).start()
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
ObjectID(fe662239bebfa676b7c37896fbe31e8548273ef1)
Pickling object_ids is a double-edged sword. It can be very convenient for users, but can also be over-used and make fault-tolerance harder. I'd say we shouldn't do it for now and let users explicitly call .id() if they need to, to make sure they understand something potentially dangerous is going on. |
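On the fork-safety point above, one possible approach (a sketch in plain Python, not Ray's actual implementation) is to draw the random bytes from the operating system on every call, so that no user-space generator state is copied into forked children. The names `random_id_bytes` and `ID_SIZE` are hypothetical; the 20-byte length is inferred from the 40 hex digits printed in the output above.

```python
import os

# Assumption: IDs are 20 bytes, matching the 40 hex digits shown in the output above.
ID_SIZE = 20


def random_id_bytes(size=ID_SIZE):
    """Return `size` random bytes taken from the OS entropy source.

    os.urandom asks the kernel for fresh bytes on each call, so a forked
    child does not replay a generator state inherited from its parent.
    """
    return os.urandom(size)


if __name__ == "__main__":
    print(random_id_bytes().hex())
```

With a helper like this, each of the four child processes in the snippet above would be expected to print a different ID.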
Hmm, I think there is actually a good reason to not allow object IDs to be pickled, but I'm not exactly sure what. @robertnishihara?
So, my concern was that people would define remote functions that captured object IDs and that most of the time this happened it would be an accident.
I'm not really sure how much this kind of error would occur, since I haven't seen too many people complaining about object IDs not being pickleable on GitHub.
It does force us to do some ugly stuff to make actor handles pickleable (since actor handles include a bunch of object IDs).
I could go either way on this one. @guoyuhong what were your reasons for making them pickleable? Is it to simplify the actor handle code?
I see there is #1317.
|
Thanks @guoyuhong! Similar to @raulchen's comment in #3564 (comment), I think I prefer Also, in the future, instead of having a using just |
|
@pcmoritz @robertnishihara Do you remember what particular issue(s) |
This comment doesn't need to be addressed in this PR.
We can define this CommonError in Python code and import it to the C extension. That would simplify the code. Also, the name CommonError sounds ambiguous to me. We should use more specific exception types depending on the concrete cases.
I will temporarily change the name to RayCommonError. I think that after @suquark's Cython change, this part will be easier.
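The suggestion to define the error in Python and use more specific exception types could look roughly like the sketch below. Only the name `RayCommonError` comes from the thread; `InvalidObjectIDError` and `check_id_bytes` are hypothetical names chosen for illustration, and nothing here reflects how the C extension would import the class.

```python
class RayCommonError(Exception):
    """Base class for errors raised by the common layer."""


class InvalidObjectIDError(RayCommonError):
    """Hypothetical, more specific error for malformed IDs."""


def check_id_bytes(id_bytes, size=20):
    """Hypothetical validator: return id_bytes if it is `size` raw bytes."""
    if not isinstance(id_bytes, bytes) or len(id_bytes) != size:
        raise InvalidObjectIDError(
            "expected {} bytes, got {!r}".format(size, id_bytes))
    return id_bytes
```

Callers that only care about the broad category can still catch `RayCommonError`, while new code can catch the specific subclass.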
|
@raulchen: If we make ObjectIDs picklable, they can enter tasks by being pickled and read through objects, or by being closed over, even when using only the official API (people can obviously already do this by calling .id()). At the moment, they can only be made available to tasks or actors if they are passed into tasks/actors or by task submissions. I'm not saying this is necessarily a bad thing and I'm happy to try it out, but we should look out for possible future problems (e.g. if we want to do more precise reference counting etc.). Once this kind of functionality is granted to the users, it cannot be taken away anymore. |
Thanks. If we don't see any potential issues by allowing ObjectID to be picklable, I prefer to give it a try. |
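To make the trade-off discussed above concrete, here is a minimal sketch of how an ID type wrapping raw bytes can be made picklable. `SketchObjectID` is a hypothetical stand-in, not `ray.ObjectID`, and the use of `__reduce__` is an assumption about one way to do it rather than what this PR implements.

```python
import pickle


class SketchObjectID:
    """Hypothetical stand-in for an ID type that wraps raw bytes."""

    def __init__(self, id_bytes):
        self._id = bytes(id_bytes)

    def id(self):
        # Raw bytes, the form users previously had to extract explicitly.
        return self._id

    def __eq__(self, other):
        return isinstance(other, SketchObjectID) and self._id == other._id

    def __hash__(self):
        return hash(self._id)

    def __reduce__(self):
        # Tell pickle to rebuild the object from its raw bytes.
        return (SketchObjectID, (self._id,))


original = SketchObjectID(b"\x01" * 20)
restored = pickle.loads(pickle.dumps(original))
```

Once a type supports this, any object or closure that holds an ID can carry it into a task without an explicit `.id()` call, which is the behavior the comments above weigh against reference counting and fault tolerance.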
|
Any progress in this PR? I am considering closing it because |
|
@suquark Thanks for the reminder. I will finish this PR. The Python part will not conflict with your PR and
d096764 to
006b43e
Compare
|
Test FAILed. |
|
Test FAILed. |
|
I have updated the PR.
|
|
Test FAILed. |
88feeb1 to
f0e7761
Compare
|
Test PASSed. |
|
Test PASSed. |
stephanie-wang
left a comment
Looks good! I left a minor comment and will approve once that and the question about pickling ObjectIDs are addressed.
python/ray/utils.py
Outdated
It looks like we can change driver_id to be a ray.ObjectID instead of handling raw bytes.
@stephanie-wang I have changed driver_id to ray.ObjectID. As for the question about pickling ObjectID, @raulchen discussed it with @robertnishihara, and we decided to give it a try. The current Jenkins and Travis tests pass. We need to keep monitoring it to see whether users run into problems or whether making it pickleable leads to hard-to-debug bugs.
096e342 to
83dbc99
Compare
|
Test PASSed. |
|
Test PASSed. |
83dbc99 to
b172316
Compare
|
Test FAILed. |
|
@AmplabJenkins retest this, please. |
test/runtest.py
Outdated
assert len(task_table) == 1
assert driver_task_id == list(task_table.keys())[0]
task_spec = task_table[driver_task_id]["TaskSpec"]
nil_id_hex = ray.experimental.state.binary_to_hex(
nit: you can remove this and just use "ray.ObjectID.nil_id().hex()" below
pcmoritz
left a comment
Small nit, rest LGTM
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
What do these changes do?
The meaning of 'id' in the Python code is not consistent. Sometimes 'id' means bytes and sometimes 'id' means ray.ObjectID. In this PR, I will do my best to make 'id' consistently represent ray.ObjectID. Only when the id is used as a hash key does it need to be converted to bytes using id().
This PR includes the following changes:
- Enable ray.ObjectID to be pickled / unpickled.
- Enable ray.ObjectID() to generate a NIL ID, the same as the backend does. Convert UniqueID::nil() to a constructor #3564
- Add ray.ObjectID.from_random() to generate a random Object ID.
- Change NIL_ID to ray.ObjectID's is_nil().
- Rename common_error to CommonError, which is what ObjectID, RayletClient, Task, etc. use.
Related issue number
N/A
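The interface described in the list of changes above can be sketched in plain Python as follows. `SketchID` is a hypothetical stand-in for `ray.ObjectID`; the method names `from_random`, `id`, `hex`, and `is_nil` come from this page, while the 20-byte size and the all-`0xff` NIL value are assumptions made for illustration.

```python
import os

ID_SIZE = 20                  # assumption: 20-byte IDs
NIL_BYTES = b"\xff" * ID_SIZE  # assumption: NIL is all 0xff bytes


class SketchID:
    """Hypothetical stand-in illustrating the described interface."""

    def __init__(self, id_bytes=None):
        # Calling the constructor with no argument yields the NIL ID.
        if id_bytes is None:
            id_bytes = NIL_BYTES
        if len(id_bytes) != ID_SIZE:
            raise ValueError("ID must be {} bytes".format(ID_SIZE))
        self._id = bytes(id_bytes)

    @classmethod
    def from_random(cls):
        # Fresh bytes from the OS on every call.
        return cls(os.urandom(ID_SIZE))

    def id(self):
        return self._id

    def hex(self):
        return self._id.hex()

    def is_nil(self):
        return self._id == NIL_BYTES


# The raw bytes from id() serve as the dictionary key, as the description says.
table = {}
oid = SketchID.from_random()
table[oid.id()] = "some value"
```

Keeping the wrapper type everywhere else and converting to bytes only at the point of hashing is the consistency rule the description argues for.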