@clarkzinzow
Contributor

@clarkzinzow clarkzinzow commented Feb 6, 2021

Added support for object spilling in the ownership-based object directory.

Currently, when the owner triggers object spilling, the RequestObjectSpillage RPC response carries only a success indicator rather than the spilled URL and spilled node ID (which would be optimal). The owner-triggered path instead merges with the automatic raylet-based object spilling path, in which an RPC containing the spilled URL and spilled node ID is sent to the owner once the object is spilled. This costs one extra RPC on the owner-triggered spill path, but it yields a cleaner implementation and keeps this change a good bit smaller. I'm still debating whether to move to the less clean but more efficient path in this PR.
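To make the trade-off concrete, here is a minimal C++ sketch that counts the messages on the owner-triggered path as described above. The `Owner` and `Raylet` structs, their fields, the helper names, and the example URL are hypothetical stand-ins, not Ray's actual classes; only the RPC names RequestObjectSpillage and AddSpilledUrl come from this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the object owner.
struct Owner {
  int rpcs_received = 0;
  std::string spilled_url;
  std::string spilled_node_id;

  // Models the handler for the AddSpilledUrl RPC.
  void AddSpilledUrl(const std::string &url, const std::string &node_id) {
    ++rpcs_received;
    spilled_url = url;
    spilled_node_id = node_id;
  }
};

// Hypothetical stand-in for the raylet that performs the spill.
struct Raylet {
  std::string node_id;
  int rpcs_received = 0;

  // Shared spill path: spill, then report the location to the owner.
  // The URL format here is made up for illustration.
  void SpillAndNotify(Owner &owner, const std::string &object_id) {
    owner.AddSpilledUrl("file:///tmp/spill/" + object_id, node_id);
  }

  // Owner-triggered spill: the reply is only a success flag, and the
  // location arrives at the owner through the shared path above.
  bool RequestObjectSpillage(Owner &owner, const std::string &object_id) {
    ++rpcs_received;
    SpillAndNotify(owner, object_id);
    return true;
  }
};

// Counts the messages exchanged for one owner-triggered spill.
inline int CountOwnerTriggeredRpcs() {
  Owner owner;
  Raylet raylet;
  raylet.node_id = "node-1";
  const bool ok = raylet.RequestObjectSpillage(owner, "obj-1");
  assert(ok);
  (void)ok;
  assert(owner.spilled_node_id == "node-1");
  return raylet.rpcs_received + owner.rpcs_received;
}
```

In this model the owner-triggered path costs two messages: the request to the raylet plus the notification back to the owner. Returning the URL and node ID in the RequestObjectSpillage reply would drop the second one, at the price of a separate code path.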

I've enabled the feature flag locally and have ensured that the test_basic*.py, test_advanced*.py, and test_object_manager.py e2e Python tests passed. Unit tests for the ownership-based object directory and the object manager will come in the next PR, along with turning on the feature flag.

Related issue number

Closes #13701, closes #14010

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 0a4214c to 37dedcf Compare February 6, 2021 00:52
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 37dedcf to 94d1c83 Compare February 6, 2021 01:44
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 94d1c83 to 99aad22 Compare February 6, 2021 02:00
@rkooo567
Contributor

rkooo567 commented Feb 7, 2021

I've enabled the feature flag locally and have ensured that the test_basic*.py, test_advanced*.py, and test_object_manager.py

Why didn't you run test_object_spilling.py?

@stephanie-wang stephanie-wang added the @author-action-required label Feb 8, 2021
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 99aad22 to c94f943 Compare February 9, 2021 02:17
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from c94f943 to 855bee0 Compare February 10, 2021 00:04
@clarkzinzow clarkzinzow removed the @author-action-required label Feb 10, 2021
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 3405e7f to 9a6a3b4 Compare February 10, 2021 00:55
@clarkzinzow
Contributor Author

All object spilling tests pass for me locally (with the ownership-based object directory flag turned on):

ubuntu@ip-172-31-46-244:(ws)/ray/python/ray/tests $ pytest -v test_object_spilling.py
 ray/tests/test_object_spilling.py::test_invalid_config_raises_exception ✓
 ray/tests/test_object_spilling.py::test_url_generation_and_parse ✓
 ray/tests/test_object_spilling.py::test_default_config ✓
 ray/tests/test_object_spilling.py::test_default_config_cluster ✓
 ray/tests/test_object_spilling.py::test_spilling_not_done_for_pinned_object[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_remote_object[multi_node_object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_remote_object[multi_node_object_spilling_config1] ✓
 ray/tests/test_object_spilling.py::test_spill_objects_automatically[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_stats[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_during_get[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_deadlock[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_delete_objects[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_delete_objects_delete_while_creating[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_delete_objects_on_worker_failure[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_delete_objects_multi_node[multi_node_object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_delete_objects_multi_node[multi_node_object_spilling_config1] ✓
 ray/tests/test_object_spilling.py::test_fusion_objects[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_no_release_during_plasma_fetch[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_release_during_plasma_fetch[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_spill_objects_on_object_transfer[object_spilling_config0] ✓
 ray/tests/test_object_spilling.py::test_file_deleted_when_driver_exits ✓
Results (243.39s):
      21 passed

So I think that this is functionally ready to be merged, pending any design or style feedback.

@ericl
Contributor

ericl commented Feb 10, 2021

OSX build failures

@ericl ericl added the @author-action-required label Feb 10, 2021
  c_vector[CObjectID] contained_ids,
- CObjectID *c_object_id, shared_ptr[CBuffer] *data)
+ CObjectID *c_object_id, shared_ptr[CBuffer] *data,
+ owner_address=*)
Contributor

What does this =* mean?

Contributor Author

@clarkzinzow clarkzinzow Feb 10, 2021

It's how you define optional arguments in Cython .pxd header files: https://cython.readthedocs.io/en/stable/src/userguide/language_basics.html#optional-arguments

RAY_LOG(DEBUG) << "Received AddSpilledUrl request for object " << object_id
<< ", which has been spilled to " << spilled_url << " on node "
<< node_id;
auto reference_exists = reference_counter_->HandleObjectSpilled(
Contributor

nice!

if (rpc_client == nullptr) {
RAY_LOG(WARNING) << "Object " << object_id << " does not have owner. "
<< "LookupLocations returns an empty list of locations.";
io_service_.post([callback, object_id]() {
Contributor

Btw, is this post necessary?

Contributor Author

Yes, due to this issue, further described in this thread.

const auto node_id_object_spilled =
is_external_storage_type_fs_ ? self_node_id_ : NodeID::Nil();

const auto unpin_callback = [this, object_id, object_url, callback,
Contributor

Can we just make it a private method?

Contributor

Something like HandleObjectUnpinned.

Contributor Author

Any particular reason why? Defining these kinds of callbacks as inline lambdas is definitely the most common pattern in Ray.

Contributor

Hmm, I'm not sure this kind of callback is common (using an inline callback where a private method would work). I prefer a private method because, if there's a fault, the stack trace will look better.

Contributor Author

That's a good reason. I'll change this to a private method bound with std::bind.
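For readers following the thread, the two styles being compared can be sketched as below. This is a minimal illustration, not the code from this PR: `SpillManager`, its members, and the string-typed object IDs are hypothetical, and only the method name HandleObjectUnpinned is taken from the suggestion above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical manager class; names are illustrative, not Ray's API.
class SpillManager {
 public:
  using Callback = std::function<void(const std::string &)>;

  // Inline-lambda style: the logic lives in an anonymous closure, so a
  // stack trace shows a compiler-generated lambda frame.
  Callback MakeLambdaCallback() {
    return [this](const std::string &object_id) {
      unpinned_.push_back(object_id);
    };
  }

  // Bound-method style: the same logic lives in a named private method,
  // so a stack trace includes a SpillManager::HandleObjectUnpinned frame.
  Callback MakeBoundCallback() {
    return std::bind(&SpillManager::HandleObjectUnpinned, this,
                     std::placeholders::_1);
  }

  const std::vector<std::string> &unpinned() const { return unpinned_; }

 private:
  void HandleObjectUnpinned(const std::string &object_id) {
    unpinned_.push_back(object_id);
  }

  std::vector<std::string> unpinned_;
};

// Invokes one callback of each style and returns what was recorded.
inline std::vector<std::string> RunCallbackExample() {
  SpillManager manager;
  manager.MakeLambdaCallback()("obj-1");
  manager.MakeBoundCallback()("obj-2");
  assert(manager.unpinned().size() == 2);
  return manager.unpinned();
}
```

Both callbacks capture `this`, so in either style the object must outlive any callback that is still pending.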

@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from e2d2c02 to 8d9fbbf Compare February 10, 2021 19:10
Contributor

@rkooo567 rkooo567 left a comment

LGTM if MacOS build passes

@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 8d9fbbf to 52b0e01 Compare February 10, 2021 19:36
@clarkzinzow clarkzinzow force-pushed the core/feat/obod-object-spilling branch from 52b0e01 to 5116209 Compare February 10, 2021 19:53
@clarkzinzow clarkzinzow removed the @author-action-required label Feb 10, 2021
@clarkzinzow
Contributor Author

The failing MacOS tests are both tests that I've seen failing in recent master and other unrelated PR builds, so I don't think those failures should block merging this PR.

/// A callback to call when an object has been freed.
std::function<void(const std::vector<ObjectID> &)> on_objects_freed_;

// Objects that are pinned on this node.
Contributor

Could you update this comment to say why we also store the owner address?
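One plausible reading, given the PR description: when an object is spilled, its owner has to be told the spilled URL and node ID, so the owner's address presumably needs to be kept for as long as the object is pinned. A minimal sketch of that idea follows, assuming a hypothetical `PinnedObjectTable` with string-typed IDs and addresses; Ray's real types and member layout differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical simplified types, for illustration only.
using ObjectID = std::string;
using OwnerAddress = std::string;

struct PinnedEntry {
  std::string data;            // stand-in for the pinned object buffer
  OwnerAddress owner_address;  // kept so the owner can be notified on spill
};

class PinnedObjectTable {
 public:
  void Pin(const ObjectID &id, std::string data, OwnerAddress owner) {
    pinned_[id] = PinnedEntry{std::move(data), std::move(owner)};
  }

  // Removes the entry and returns the owner to notify.
  // Returns an empty address if the object is not pinned.
  OwnerAddress TakeOwnerForSpill(const ObjectID &id) {
    auto it = pinned_.find(id);
    if (it == pinned_.end()) return OwnerAddress();
    OwnerAddress owner = std::move(it->second.owner_address);
    pinned_.erase(it);
    return owner;
  }

  std::size_t size() const { return pinned_.size(); }

 private:
  std::unordered_map<ObjectID, PinnedEntry> pinned_;
};

// Pins one object, then looks up its owner at spill time.
inline OwnerAddress RunPinnedTableExample() {
  PinnedObjectTable table;
  table.Pin("obj-1", "payload", "10.0.0.1:1234");
  assert(table.size() == 1);
  return table.TakeOwnerForSpill("obj-1");
}
```

Keeping the address in the same entry as the object means both are added and removed together, so no separate map has to be kept in sync.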

Contributor

@stephanie-wang stephanie-wang left a comment

Nice!

@rkooo567 rkooo567 merged commit cd7e567 into ray-project:master Feb 11, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…illing in the ownership-based object directory. (ray-project#13948)

* Add support for object spilling in the ownership-based object directory.

* Move owner address hashmap into pinned_objects_ and objects_pending_spill_.

* Update local object manager tests.

* Feedback and misc. fixes.

* Move spilled unpin callback lambda to std::binded private method.

* Skip test_delete_objects_multi_node test on MacOS for now.
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…bject spilling in the ownership-based object directory. (ray-project#13948)"

This reverts commit e6b200b.
Development

Successfully merging this pull request may close these issues.

[Object Spilling] Original object owner is not preserved on restore
[Core] Ownership-based Object Directory - add support for object spilling

4 participants