-
Notifications
You must be signed in to change notification settings - Fork 7k
[xray] Implements ray.wait #2162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
0e18ca7
fc2572c
2e8af60
b57a548
a128698
fa5c32d
0ccf46b
b02de4f
f9a9e16
15b7f61
a22263b
8ab41f0
98bacfa
d518a89
53f33e0
8ef35f7
6e10f9e
9a95c65
aa12bd7
9e1602d
304b39c
5d63bb3
cf1fdb2
531d024
d0d3ea4
62ae832
67eef67
0796a17
dd9f0db
9d4ed2b
541b88c
58af739
d9ef29b
fa1928b
d41b1d0
aeaab5b
048f45f
0aa7525
833939f
8e1947c
83d04dd
080282f
a58f5c9
c6d8ba5
7d8d756
6b6e2f3
3a86c93
1a99f25
00eafd7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -359,7 +359,19 @@ ray::Status ObjectManager::Wait(const std::vector<ObjectID> &object_ids, | |
| int64_t timeout_ms, uint64_t num_required_objects, | ||
| bool wait_local, const WaitCallback &callback) { | ||
| UniqueID wait_id = UniqueID::from_random(); | ||
| RAY_RETURN_NOT_OK(AddWaitRequest(wait_id, object_ids, timeout_ms, num_required_objects, | ||
| wait_local, callback)); | ||
| RAY_RETURN_NOT_OK(LookupRemainingWaitObjects(wait_id)); | ||
| // LookupRemainingWaitObjects invokes SubscribeRemainingWaitObjects once lookup has | ||
| // been performed on all remaining objects. | ||
| return ray::Status::OK(); | ||
| } | ||
|
|
||
| ray::Status ObjectManager::AddWaitRequest(const UniqueID &wait_id, | ||
| const std::vector<ObjectID> &object_ids, | ||
| int64_t timeout_ms, | ||
| uint64_t num_required_objects, bool wait_local, | ||
| const WaitCallback &callback) { | ||
| if (wait_local) { | ||
| return ray::Status::NotImplemented("Wait for local objects is not yet implemented."); | ||
| } | ||
|
|
@@ -385,6 +397,12 @@ ray::Status ObjectManager::Wait(const std::vector<ObjectID> &object_ids, | |
| } | ||
| } | ||
|
|
||
| return ray::Status::OK(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this method always returns
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It actually also returns |
||
| } | ||
|
|
||
| ray::Status ObjectManager::LookupRemainingWaitObjects(const UniqueID &wait_id) { | ||
| auto &wait_state = active_wait_requests_.find(wait_id)->second; | ||
|
|
||
| if (wait_state.remaining.empty()) { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we're going to include this special case here, then the condition should probably be something like That said, I think it makes sense to handle this case after firing off the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We invoke the lookup calls immediately after checking which objects are local to obtain current information about the location of remote objects. This allows us to collect information about all given objects, regardless of their location. If we invoke when If we fire
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a comment to this |
||
| WaitComplete(wait_id); | ||
| } else { | ||
|
|
@@ -396,7 +414,7 @@ ray::Status ObjectManager::Wait(const std::vector<ObjectID> &object_ids, | |
| for (const auto &object_id : wait_state.remaining) { | ||
| // Lookup remaining objects. | ||
| wait_state.requested_objects.insert(object_id); | ||
| RAY_CHECK_OK(object_directory_->LookupLocations( | ||
| RAY_RETURN_NOT_OK(object_directory_->LookupLocations( | ||
| object_id, [this, wait_id](const std::vector<ClientID> &client_ids, | ||
| const ObjectID &lookup_object_id) { | ||
| auto &wait_state = active_wait_requests_.find(wait_id)->second; | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. couldn't
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only way to progress to a point where |
||
|
|
@@ -406,40 +424,47 @@ ray::Status ObjectManager::Wait(const std::vector<ObjectID> &object_ids, | |
| } | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would it mean if
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It may mean that the object has been evicted. It may also mean that no entries exist for the object. In both cases, after the round of lookups, if we need to wait for more objects, |
||
| wait_state.requested_objects.erase(lookup_object_id); | ||
| if (wait_state.requested_objects.empty()) { | ||
| AllWaitLookupsComplete(wait_id); | ||
| SubscribeRemainingWaitObjects(wait_id); | ||
| } | ||
| })); | ||
| } | ||
| } | ||
| return ray::Status::OK(); | ||
| } | ||
|
|
||
| void ObjectManager::AllWaitLookupsComplete(const UniqueID &wait_id) { | ||
| void ObjectManager::SubscribeRemainingWaitObjects(const UniqueID &wait_id) { | ||
| auto &wait_state = active_wait_requests_.find(wait_id)->second; | ||
| if (wait_state.found.size() >= wait_state.num_required_objects || | ||
| wait_state.timeout_ms == 0) { | ||
| // Requirements already satisfied. | ||
| WaitComplete(wait_id); | ||
| } else { | ||
| // Subscribe to objects in order to ensure Wait-related tests are deterministic. | ||
| // Wait may complete during the execution of any one of the following calls to | ||
| // SubscribeObjectLocations, so copy the object ids that need to be iterated over. | ||
| // Order matters for test purposes. | ||
| std::vector<ObjectID> ordered_remaining_object_ids; | ||
| for (const auto &object_id : wait_state.object_id_order) { | ||
| if (wait_state.remaining.count(object_id) == 0) { | ||
| continue; | ||
| if (wait_state.remaining.count(object_id) > 0) { | ||
| ordered_remaining_object_ids.push_back(object_id); | ||
| } | ||
| // Subscribe to object notifications. | ||
| } | ||
| for (const auto &object_id : ordered_remaining_object_ids) { | ||
| if (active_wait_requests_.find(wait_id) == active_wait_requests_.end()) { | ||
| // This is possible if an object's location is obtained immediately, | ||
| // within the current callstack. In this case, WaitComplete has been | ||
| // invoked already, so we're done. | ||
| return; | ||
| } | ||
| wait_state.requested_objects.insert(object_id); | ||
| // Subscribe to object notifications. | ||
| RAY_CHECK_OK(object_directory_->SubscribeObjectLocations( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The lookup calls return immediately, regardless of whether there's a known location of the object. After the lookup calls, if the conditions of the wait call are still not satisfied, we start the timer and subscribe to each remaining object. The subscription callback will be invoked only when an object's location is found. We wait until the minimum number of objects is found or the timeout is triggered. |
||
| wait_id, object_id, [this, wait_id](const std::vector<ClientID> &client_ids, | ||
| const ObjectID &subscribe_object_id) { | ||
| auto object_id_wait_state = active_wait_requests_.find(wait_id); | ||
| // We never expect to handle a subscription notification for a wait that has | ||
| // already completed. | ||
| RAY_CHECK(object_id_wait_state != active_wait_requests_.end()); | ||
| auto &wait_state = active_wait_requests_.find(wait_id)->second; | ||
| auto &wait_state = object_id_wait_state->second; | ||
| RAY_CHECK(wait_state.remaining.erase(subscribe_object_id)); | ||
| wait_state.found.insert(subscribe_object_id); | ||
| wait_state.requested_objects.erase(subscribe_object_id); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -71,17 +71,16 @@ class MockServer { | |
| } | ||
|
|
||
| friend class TestObjectManager; | ||
| friend class TestObjectManagerCommands; | ||
|
|
||
| boost::asio::ip::tcp::acceptor object_manager_acceptor_; | ||
| boost::asio::ip::tcp::socket object_manager_socket_; | ||
| std::shared_ptr<gcs::AsyncGcsClient> gcs_client_; | ||
| ObjectManager object_manager_; | ||
| }; | ||
|
|
||
| class TestObjectManager : public ::testing::Test { | ||
| class TestObjectManagerBase : public ::testing::Test { | ||
| public: | ||
| TestObjectManager() {} | ||
| TestObjectManagerBase() {} | ||
|
|
||
| std::string StartStore(const std::string &id) { | ||
| std::string store_id = "/tmp/store"; | ||
|
|
@@ -176,10 +175,6 @@ class TestObjectManager : public ::testing::Test { | |
|
|
||
| void object_added_handler_2(ObjectID object_id) { v2.push_back(object_id); }; | ||
|
|
||
| ObjectDirectoryInterface &get_object_directory(const MockServer &server) { | ||
| return *server.object_manager_.object_directory_; | ||
| } | ||
|
|
||
| protected: | ||
| std::thread p; | ||
| boost::asio::io_service main_service; | ||
|
|
@@ -199,7 +194,7 @@ class TestObjectManager : public ::testing::Test { | |
| uint push_timeout_ms; | ||
| }; | ||
|
|
||
| class TestObjectManagerCommands : public TestObjectManager { | ||
| class TestObjectManager : public TestObjectManagerBase { | ||
| public: | ||
| int current_wait_test = -1; | ||
| int num_connected_clients = 0; | ||
|
|
@@ -273,35 +268,38 @@ class TestObjectManagerCommands : public TestObjectManager { | |
| uint num_expected_objects1 = 1; | ||
| uint num_expected_objects2 = 2; | ||
| if (v1.size() == num_expected_objects1 && v2.size() == num_expected_objects2) { | ||
| TestWaitCallbacks1(); | ||
| SubscribeObjectThenWait(); | ||
| } | ||
| } | ||
|
|
||
| void TestWaitCallbacks1() { | ||
| void SubscribeObjectThenWait() { | ||
| int data_size = 100; | ||
| // Test to ensure Wait works properly during an active subscription to the same | ||
| // object. | ||
| ObjectID object_1 = WriteDataToClient(client2, data_size); | ||
| ObjectID object_2 = WriteDataToClient(client2, data_size); | ||
| UniqueID sub_id = ray::ObjectID::from_random(); | ||
|
|
||
| RAY_CHECK_OK(get_object_directory(*server1).SubscribeObjectLocations( | ||
| RAY_CHECK_OK(server1->object_manager_.object_directory_->SubscribeObjectLocations( | ||
| sub_id, object_1, | ||
| [this, sub_id, object_1, object_2](const std::vector<ray::ClientID> &, | ||
| const ray::ObjectID &object_id) { | ||
| TestWaitCallbacks2(sub_id, object_1, object_2); | ||
| TestWaitWhileSubscribed(sub_id, object_1, object_2); | ||
| })); | ||
| } | ||
|
|
||
| void TestWaitCallbacks2(UniqueID sub_id, ObjectID object_1, ObjectID object_2) { | ||
| void TestWaitWhileSubscribed(UniqueID sub_id, ObjectID object_1, ObjectID object_2) { | ||
| int num_objects = 2; | ||
| int required_objects = 1; | ||
| int timeout_ms = 1000; | ||
|
|
||
| std::vector<ObjectID> object_ids = {object_1, object_2}; | ||
| boost::posix_time::ptime start_time = boost::posix_time::second_clock::local_time(); | ||
| RAY_CHECK_OK(server1->object_manager_.Wait( | ||
| object_ids, timeout_ms, required_objects, false, | ||
|
|
||
| UniqueID wait_id = UniqueID::from_random(); | ||
|
|
||
| RAY_CHECK_OK(server1->object_manager_.AddWaitRequest( | ||
| wait_id, object_ids, timeout_ms, required_objects, false, | ||
| [this, sub_id, object_1, object_ids, num_objects, start_time]( | ||
| const std::vector<ray::ObjectID> &found, | ||
| const std::vector<ray::ObjectID> &remaining) { | ||
|
|
@@ -313,10 +311,14 @@ class TestObjectManagerCommands : public TestObjectManager { | |
| RAY_CHECK(found.size() == 1); | ||
| // There's nothing more to test. A check will fail if unexpected behavior is | ||
| // triggered. | ||
| RAY_CHECK_OK(get_object_directory(*server1).UnsubscribeObjectLocations( | ||
| sub_id, object_1)); | ||
| RAY_CHECK_OK( | ||
| server1->object_manager_.object_directory_->UnsubscribeObjectLocations( | ||
| sub_id, object_1)); | ||
| NextWaitTest(); | ||
| })); | ||
|
|
||
| // Skip lookups and rely on Subscribe only to test subscribe interaction. | ||
| server1->object_manager_.SubscribeRemainingWaitObjects(wait_id); | ||
| } | ||
|
|
||
| void NextWaitTest() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we separate each of the test cases into a separate unit test, and also use the gtest framework to be consistent with other raylet tests? Also, please document what case each of these is supposed to be testing.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, excellent point. I have a branch I'd like to merge before separating each test into a separate unit test as this is a somewhat substantial change. Agree and will do the rest. |
||
|
|
@@ -400,7 +402,8 @@ class TestObjectManagerCommands : public TestObjectManager { | |
|
|
||
| switch (current_wait_test) { | ||
| case 0: { | ||
| // Ensure timeout_ms = 0 returns expected number of found / remaining objects. | ||
| // Ensure timeout_ms = 0 returns expected number of found and remaining | ||
| // objects. | ||
| ASSERT_TRUE(found.size() <= required_objects); | ||
| ASSERT_TRUE(static_cast<int>(found.size() + remaining.size()) == num_objects); | ||
| NextWaitTest(); | ||
|
|
@@ -454,7 +457,7 @@ class TestObjectManagerCommands : public TestObjectManager { | |
| } | ||
| }; | ||
|
|
||
| TEST_F(TestObjectManagerCommands, StartTestObjectManagerCommands) { | ||
| TEST_F(TestObjectManager, StartTestObjectManager) { | ||
| auto AsyncStartTests = main_service.wrap([this]() { WaitConnections(); }); | ||
| AsyncStartTests(); | ||
| main_service.run(); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had trouble with
from_randombeing too expensive in the past.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it was a problem for
Pull, when we were polling for locations with a low timeout. It's called once for each call toWait, so I think it is okay. Is there an alternative tofrom_random? How are unique ObjectIDs created?