-
Notifications
You must be signed in to change notification settings - Fork 7k
Refactor ObjectDirectory to reduce and fix callback usage #3227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ObjectDirectory to reduce and fix callback usage #3227
Conversation
|
Test FAILed. |
| callback(client_id_vec, object_id); | ||
| io_service_.post([this, callback, client_id_vec, object_id]() { | ||
| callback(client_id_vec, object_id); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key fix, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, at least for #3201.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Stephanie. Could you explain what's happening, and how this change fixes #3201?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is basically the same one that was discussed when ray.wait was first implemented here. Basically the callback deletes from a data structure shared with the caller, so an iterator held by the caller gets invalidated when the callback returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm still confused because I thought we addressed that issue before merging. The iterator in SubscribeRemainingWaitObjects makes a copy of the object ids it iterates over (see here), which is not shared with the callback. The callback does not modify the vector of object ids (see here). Could you explain precisely the failure scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the memory referenced by wait_state in that function gets invalidated because the callback deletes the wait_id from active_wait_requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, one failure scenario is as follows:
- The final object id's callback is invoked immediately.
- The
wait_idis removed within the callback (i.e.WaitCompleteis invoked). - At this point, the loop exits, so this is not invoked.
- This is invoked, which references memory that doesn't exist.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
| flatbuffers::FlatBufferBuilder fbb; | ||
| bool is_transfer = (type == ConnectionPool::ConnectionType::TRANSFER); | ||
| auto message = object_manager_protocol::CreateConnectClientMessage( | ||
| fbb, fbb.CreateString(client_id_.binary()), is_transfer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but can be to_flatbuf(fbb, client_id_).
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
elibol
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, this looks good to me. I'd prefer to avoid passing in the main service into the object directory by using some other asynchronous mechanism to invoke the callback on a separate stack, but I think this solution is better than further complicating the logic of object manager's wait implementation.
With these changes in place, we can clean up some of the code that originally dealt with the immediate callback issue within SubscribeRemainingWaitObjects, like this condition.
We ought to consider cleaning up that function in this PR, or at least add a TODO to do so. Leaving it as-is is harmless, but it unnecessarily complicates the code, and some of the comments are no longer valid.
|
Thanks, @elibol, I removed the old code. |
|
Test FAILed. |
|
Test PASSed. |
What do these changes do?
This refactors the
ObjectDirectoryto only use callbacks where necessary and to post callbacks on an event loop instead of calling the callbacks in the same stack.Related issue number
#2959