-
Notifications
You must be signed in to change notification settings - Fork 7k
[Core] Ownership-based Object Directory - Added support for object spilling in the ownership-based object directory. #13948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Ownership-based Object Directory - Added support for object spilling in the ownership-based object directory. #13948
Conversation
0a4214c to
37dedcf
Compare
37dedcf to
94d1c83
Compare
94d1c83 to
99aad22
Compare
Why didn't you run test_object_spilling.py? |
99aad22 to
c94f943
Compare
c94f943 to
855bee0
Compare
3405e7f to
9a6a3b4
Compare
|
All object spilling tests pass for me locally (with the ownership-based object directory flag turned on): So I think that this is functionally ready to be merged, pending any design or style feedback. |
|
OSX build failures |
| c_vector[CObjectID] contained_ids, | ||
| CObjectID *c_object_id, shared_ptr[CBuffer] *data) | ||
| CObjectID *c_object_id, shared_ptr[CBuffer] *data, | ||
| owner_address=*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this =* mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's how you define optional arguments in Cython .pxd header files: https://cython.readthedocs.io/en/stable/src/userguide/language_basics.html#optional-arguments
| RAY_LOG(DEBUG) << "Received AddSpilledUrl request for object " << object_id | ||
| << ", which has been spilled to " << spilled_url << " on node " | ||
| << node_id; | ||
| auto reference_exists = reference_counter_->HandleObjectSpilled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
| if (rpc_client == nullptr) { | ||
| RAY_LOG(WARNING) << "Object " << object_id << " does not have owner. " | ||
| << "LookupLocations returns an empty list of locations."; | ||
| io_service_.post([callback, object_id]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, is this post necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, due to this issue, further described in this thread.
| const auto node_id_object_spilled = | ||
| is_external_storage_type_fs_ ? self_node_id_ : NodeID::Nil(); | ||
|
|
||
| const auto unpin_callback = [this, object_id, object_url, callback, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just make it a private method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like HandleObjectUnpinned or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason why? Defining these kind of callbacks as inline lambdas is definitely the most common pattern in Ray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I am not sure if this kind of callback is common (use callback when we can have a private method). I prefer private method because if there’s some fault, stack trace will look better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good reason, I'll change this to a std::binded private method.
e2d2c02 to
8d9fbbf
Compare
rkooo567
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if MacOS build passes
8d9fbbf to
52b0e01
Compare
52b0e01 to
5116209
Compare
|
The failing MacOS tests are both tests that I've seen failing in recent master and other unrelated PR builds, I don't think that those failures should block the merging of this PR. |
| /// A callback to call when an object has been freed. | ||
| std::function<void(const std::vector<ObjectID> &)> on_objects_freed_; | ||
|
|
||
| // Objects that are pinned on this node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update this comment to say why we also store the owner address?
stephanie-wang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
…illing in the ownership-based object directory. (ray-project#13948) * Add support for object spilling in the ownership-based object directory. * Move owner address hashmap into pinned_objects_ and objects_pending_spill_. * Update local object manager tests. * Feedback and misc. fixes. * Move spilled unpin callback lambda to std::binded private method. * Skip test_delete_objects_multi_node test on MacOS for now.
…bject spilling in the ownership-based object directory. (ray-project#13948)" This reverts commit e6b200b.
Added support for object spilling in the ownership-based object directory.
Right now, on object spilling that's triggered by the owner, instead of the
RequestObjectSpillageRPC response containing the spilled URL and the spilled node ID (which would be optimal), we only return a success indicator and instead merge with the automatic raylet-based object spilling path, sending an RPC to the owner containing the spilled URL and spilled node ID when the object is spilled. This results in one extra RPC on the owner-triggered object spill path, but results in cleaner implementation and makes this change a good bit smaller. I'm still debating whether to move to the less clean but more efficient path in this PR.I've enabled the feature flag locally and have ensured that the
test_basic*.py,test_advanced*.py, andtest_object_manager.pye2e Python tests passed. Unit tests for the ownership-based object directory and the object manager will come in the next PR, along with turning on the feature flag.Related issue number
Closes #13701, closes #14010
Checks
scripts/format.shto lint the changes in this PR.