Wait for sessions to drain during sync client shutdown #6293
Conversation
@@ -1264,6 +1266,12 @@ inline void ClientImpl::Connection::one_less_active_unsuspended_session()
{
    if (--m_num_active_unsuspended_sessions != 0)
        return;

    if (m_force_closed) {
Should the check also include m_state != ConnectionState::disconnected?
Good call.
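For reference, a minimal sketch of the combined check this exchange suggests (surrounding code assumed from the diff above):

    if (--m_num_active_unsuspended_sessions != 0)
        return;

    if (m_force_closed && m_state != ConnectionState::disconnected) {
        // All sessions have ended on a force-closed connection:
        // close now instead of waiting out the linger period.
    }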
If this goes into the current release then it doesn't need a changelog entry, since the assertions it's fixing haven't been released yet.
LGTM - I didn't see the new debug statements in any of the test runs I checked, but the logic looks good.
There is a failure in CI; I think it should not be related to this change.
The test failures were actually relevant. I've gone and re-worked some things and added more asserts. I've run all the realm-sync-tests locally 1000 times with the latest changes.
These changes look good - just a few minor questions/comments
@@ -544,6 +568,11 @@ void Connection::initiate_reconnect_wait()
    REALM_ASSERT(!m_reconnect_delay_in_progress);
    REALM_ASSERT(!m_disconnect_delay_in_progress);

    // If we've been force closed then we don't need/want to reconnect. Just return early here.
    if (m_force_closed) {
        return;
Should this also ensure the m_reconnect_disconnect_timer is stopped (e.g. m_reconnect_disconnect_timer.reset()), or maybe that should still happen in Connection::force_close()?
Yes, I've also added some assertions to make sure that we end up disconnected at the end of voluntary_disconnect and that we're in the correct delay mode if we are disconnected.
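For illustration, a rough sketch of the timer cleanup living in Connection::force_close(), per the suggestion above (member names taken from this discussion; the exact body is an assumption, not the merged code):

    void Connection::force_close()
    {
        m_force_closed = true;
        // Stop any pending reconnect/disconnect wait so no timer fires after close.
        m_reconnect_disconnect_timer.reset();
        m_reconnect_delay_in_progress = false;
        m_disconnect_delay_in_progress = false;
    }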
{
    std::lock_guard lock{m_mutex};
    m_actualize_and_finalize_needed = false;
    swap(m_unactualized_session_wrappers, unactualized_session_wrappers);
    swap(m_abandoned_session_wrappers, abandoned_session_wrappers);
    stopped = m_stopped;
Do you need this since stopped is now an atomic? You can read m_stopped directly on line 602 and avoid a tiny window where m_stopped might be changed between here and there.
I decided not to make m_stopped atomic after all. I think since we're threading force_closed states down into sessions now, we can just use that to efficiently check if it's safe to call callbacks etc rather than introducing an atomic bool.
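A rough illustration of that approach - guarding callback delivery on the per-session force-closed state instead of an atomic flag (the helper name and signature here are hypothetical):

    void SessionWrapper::deliver_callback(util::UniqueFunction<void()> callback)
    {
        if (m_force_closed)
            return; // client is shutting down; don't invoke user callbacks
        callback();
    }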
src/realm/sync/client.cpp
Outdated
@@ -491,24 +495,17 @@ void ClientImpl::drain_connections_on_loop()
{
    post([this](Status status) mutable {
        REALM_ASSERT(status.is_ok());
        actualize_and_finalize_session_wrappers();
        drain_connections();
    });
}

void ClientImpl::drain()
What do you think about renaming Client::drain() to something like Client::shutdown() or Client::shutdown_and_wait() so the name is a bit more descriptive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I was actually thinking the same thing but just hadn't typed it up yet.
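Assuming the rename lands as shutdown_and_wait() (as the test fixture below uses), a call site would then read, for example:

    client.shutdown_and_wait(); // blocks until all sessions drain and every connection closes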
@@ -566,7 +566,9 @@ class MultiClientServerFixture
{
    unit_test::TestContext& test_context = m_test_context;
    stop();
    m_clients.clear();
    for (int i = 0; i < m_num_clients; ++i) {
        m_clients[i]->shutdown_and_wait();
It turns out there can be callbacks that may fire during the call to shutdown_and_wait(), and if we clear this array then the unique_ptrs in m_clients will be null and the callbacks will segfault. So we need to defer destroying the Clients until they are actually all shut down.
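A sketch of the resulting teardown order, based on the fixture diff above:

    stop();
    // First wait for every client to fully shut down; callbacks may still
    // fire during shutdown_and_wait() and dereference entries of m_clients.
    for (int i = 0; i < m_num_clients; ++i)
        m_clients[i]->shutdown_and_wait();
    // Only now is it safe to destroy the Client objects.
    m_clients.clear();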
    m_conn.one_less_active_unsuspended_session(); // Throws
    on_suspended(SessionErrorInfo{error.code(), false});
This was actually a bug where we'd end up making the number of active unsuspended sessions underflow.
@@ -1262,8 +1266,10 @@ inline void ClientImpl::Connection::one_more_active_unsuspended_session()

inline void ClientImpl::Connection::one_less_active_unsuspended_session()
{
    REALM_ASSERT(m_num_active_unsuspended_sessions);
I've added these assertions wherever we decrement (--) a variable, because there was at least one case where we were underflowing and turning this value into uint64_t(-1).
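The resulting pattern, sketched from the two hunks above:

    inline void ClientImpl::Connection::one_less_active_unsuspended_session()
    {
        // Catch underflow before the counter wraps around to uint64_t(-1).
        REALM_ASSERT(m_num_active_unsuspended_sessions);
        if (--m_num_active_unsuspended_sessions != 0)
            return;
        // ... last active unsuspended session just ended ...
    }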
I like having the assertions there - it will help with tracking down any issues in the future.
Looks good - I walked through a couple of shutdown scenarios and I couldn't come up with any possible issues that aren't already handled. One area that I thought might be an issue was around outstanding_posts, but the timers always run the handler when canceled and the code waits for all the posts to complete.
@danieltabacaru, do you want to weigh in here?
Yes, I'll have a look this evening.
std::vector<Session*> to_close;
for (auto& session_pair : m_sessions) {
    if (session_pair.second->m_state == Session::State::Active) {
        to_close.push_back(session_pair.second.get());
nit: why not close the sessions directly here?
Because closing a session may remove it from m_sessions and invalidate the iterators used to go through the list of sessions. So I copy the session pointers into a secondary vector to ensure the iterators remain valid all the way through the loop.
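Sketched in full, the two-phase close looks like this (the force_close() call on each session is assumed from context, not quoted from the PR):

    // Phase 1: collect the sessions to close without mutating m_sessions.
    std::vector<Session*> to_close;
    for (auto& session_pair : m_sessions) {
        if (session_pair.second->m_state == Session::State::Active)
            to_close.push_back(session_pair.second.get());
    }
    // Phase 2: close them. A close may erase entries from m_sessions,
    // which would have invalidated the iterators in the loop above.
    for (Session* sess : to_close)
        sess->force_close();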
LGTM
What, How & Why?
There's a race when force closing a connection while sessions may still be getting finalized. With this change, if there are still active sessions on a connection, force_close() arranges for the connection to close as soon as the last session has ended, skipping the linger period entirely.
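Putting the pieces from the diffs together, the shutdown path sketches roughly as follows (the immediate-close call is an assumption; the other names are taken from the hunks above):

    inline void ClientImpl::Connection::one_less_active_unsuspended_session()
    {
        REALM_ASSERT(m_num_active_unsuspended_sessions);
        if (--m_num_active_unsuspended_sessions != 0)
            return;
        if (m_force_closed && m_state != ConnectionState::disconnected) {
            // Force-closed: tear down as soon as the last session ends,
            // skipping the usual linger period.
            voluntary_disconnect(); // assumed to perform the immediate close
            return;
        }
        // Normal path: start the linger/disconnect delay.
    }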
☑️ ToDos