-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change #7831
RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change #7831
Conversation
…when the session or connection is restarted without restarting the Sync Session
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1190Details
💛 - Coveralls |
…ore into mwb/fix-reset-bootstrap-store
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1192Details
💛 - Coveralls |
CHANGELOG.md
Outdated
@@ -6,7 +6,7 @@ | |||
|
|||
### Fixed | |||
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?) | |||
* None. | |||
* Fix pending bootstrap store was not applying a pending bootstrap or clearing a partial bootstrap when the session is restarted. ([#7827](https://github.com/realm/realm-core/issues/7827), since 14.8.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead say what this meant for the user?
@@ -869,8 +869,6 @@ void SessionImpl::process_pending_flx_bootstrap() | |||
if (!m_is_flx_sync_session || m_state != State::Active) { | |||
return; | |||
} | |||
// Should never be called if session is not active | |||
REALM_ASSERT_EX(m_state == SessionImpl::Active, m_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assertion didn't really make sense
@@ -1524,6 +1524,8 @@ void Session::cancel_resumption_delay() | |||
|
|||
logger.debug("Resumed"); // Throws | |||
|
|||
process_pending_flx_bootstrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can throw, so it's not a good idea calling it outside a try catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd make more sense to process it in reset_protocol_state()
(and clean-up Session::activate())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a try catch around the call here. I looked into moving this call into reset_protocol_state()
but this is also used to reset the state in the session's connection_lost()
function called when the connection is disconnected. I didn't want to change this behavior, so I left the calls as-is in activate()
and cancel_resumption_delay()
.
Let me know if you think I should still continue to look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw that too and I personally think it's not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I don't think we want to apply the bootstrap when the session is shutting down after being disconnected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do it anyways when the client reconnects, but I agree that it may be inconvenient for the user.
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1195Details
💛 - Coveralls |
CHANGELOG.md
Outdated
@@ -6,7 +6,8 @@ | |||
|
|||
### Fixed | |||
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?) | |||
* None. | |||
* Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this potentially cause diverging history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not immediately, but manually removing the extra entries may cause a diverging history error when merged with the server, since it expects those to not be in the local realm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not immediately, but manually removing the extra entries may cause a diverging history error
Not sure what you mean.
After thinking a bit more, I think it can actually cause orphaned objects not diverging history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, orphaned objects is what I was trying to say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmorkos correct me if i'm wrong, but I think this will just lead to compensating writes rather than diverging history.
But also, this changelog entry is super confusing. The audience for changelog entries are SDK engineers and end users who likely don't know about the pending bootstrap store. Perhaps something like
- If a sync session were interrupted by a disconnect while downloading a bootstrap more writes may have been made to the database than necessary when the sync session reconnected, and there may be objects stored that do not match the actual state of the server - potentially leading to compensating writes.
Also, I don't think this started in 14.8.0 - I think this started in v12.0.0 https://github.com/realm/realm-core/blob/master/CHANGELOG.md#1200-release-notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbreams I think we can refer to those objects as orphaned objects as we do in other places, otherwise I agree with your suggestion for changelog entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the bug is that the client may end up with objects that the server doesn't think it has (if the object was in the new query's view during the first attempt, but then moved out that query's view before the second attempt).
From there, one of three things could happen:
- The object moves back into the client's query view at some point in the future, and we are eventually consistent™️
- The object never moves back into the client's query view, and the client just holds on to this stale view of an object it was never supposed to have indefinitely
- The client tries modifying the object at some point, at which point they'll get a compensating write because the server interprets it as modifying an object outside of their query view
I think for all intents and purposes @jbreams' description is more accurate than referring to them as "orphaned objects" unless that terminology is used elsewhere to refer to the above scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded the changelog entry based on Jonathan's recommendation - hopefully it is clearer now.
CHANGELOG.md
Outdated
@@ -6,7 +6,8 @@ | |||
|
|||
### Fixed | |||
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?) | |||
* None. | |||
* Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt | |||
and complete bootstraps were potentially not being applied if the session restarted once fully downloaded. ([#7827](https://github.com/realm/realm-core/issues/7827), since 14.8.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the data would be re-downloaded right?
CHANGELOG.md
Outdated
@@ -6,7 +6,8 @@ | |||
|
|||
### Fixed | |||
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?) | |||
* None. | |||
* Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kmorkos correct me if i'm wrong, but I think this will just lead to compensating writes rather than diverging history.
But also, this changelog entry is super confusing. The audience for changelog entries are SDK engineers and end users who likely don't know about the pending bootstrap store. Perhaps something like
- If a sync session were interrupted by a disconnect while downloading a bootstrap more writes may have been made to the database than necessary when the sync session reconnected, and there may be objects stored that do not match the actual state of the server - potentially leading to compensating writes.
Also, I don't think this started in 14.8.0 - I think this started in v12.0.0 https://github.com/realm/realm-core/blob/master/CHANGELOG.md#1200-release-notes.
process_pending_flx_bootstrap(); // throws | ||
} | ||
catch (const IntegrationException& error) { | ||
on_integration_failure(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if we have an integration failure here? I guess the client will just continue to resume the session but without applying the bootstrap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - it's the same if there is an integration failure during activate()
m_client_error = util::none; | ||
|
||
m_upload_progress = m_progress.upload; | ||
m_enlisted_to_send = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this all change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted - I had reformatted the function when I was playing around with moving the bootstrap apply to this funciton.
…ore into mwb/fix-reset-bootstrap-store
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1201Details
💛 - Coveralls |
…ore into mwb/fix-reset-bootstrap-store
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1205Details
💛 - Coveralls |
…ore into mwb/fix-reset-bootstrap-store
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1221Details
💛 - Coveralls |
…ore into mwb/fix-reset-bootstrap-store
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1224Details
💛 - Coveralls |
…ore into mwb/fix-reset-bootstrap-store
CHANGELOG.md
Outdated
@@ -10,6 +10,10 @@ | |||
* Fixed removing backlinks from the wrong objects if the link came from a nested list, nested dictionary, top-level dictionary, or list of mixed, and the source table had more than 256 objects. This could manifest as `array_backlink.cpp:112: Assertion failed: int64_t(value >> 1) == key.value` when removing an object. ([#7594](https://github.com/realm/realm-core/issues/7594), since v11 for dictionaries) | |||
* Fixed the collapse/rejoin of clusters which contained nested collections with links. This could manifest as `array.cpp:319: Array::move() Assertion failed: begin <= end [2, 1]` when removing an object. ([#7839](https://github.com/realm/realm-core/issues/7839), since the introduction of nested collections in v14.0.0-beta.0) | |||
* wait_for_upload_completion() was inconsistent in how it handled commits which did not produce any changesets to upload. Previously it would sometimes complete immediately if all commits waiting to be uploaded were empty, and at other times it would wait for a server roundtrip. It will now always complete immediately. ([PR #7796](https://github.com/realm/realm-core/pull/7796)). | |||
* If a sync session is interrupted by a disconnect or restart while downloading a bootstrap, stale data from the previous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this get split up onto multiple lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was manually word wrapped, but reverted change.
* Moved role change tests to separate test file * Fixed building of new flx_role_change.cpp file * Added local changes w/role bootstrap test - fixed exception in subscription store during server initiated boostrap * Updated local change test to include valid offline writes during role change * Added role change test during initial schema bootstrap * Wrapped up role change during bootstrap tests * Removed debug statments to fix thread sanitizer * Updated sub state comments and reverted a minor change * Refactored role change tests and broke out into 2 separate test cases * Moved harness from a global to a static var in each test case * Reverted resetting the bootstrapping subscription state back to Pending * Updated baas to use protocol v14 and removed the feature flag for role change bootstraps * Removed left over code in statement... * Updated baasaas version to be a cached version * Updated baasaas githash and reordered role change during bootstrap to check for role change bootstrap as first validation step * Minor updates to reuse the verify_records() fcn * RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change (#7831) * Updated pending bootstrap store to be processed (applied or cleared) when the session or connection is restarted without restarting the Sync Session
* RCORE-1872 Sync client should allow server bootstrapping at any time (#7440) * First round of changes for server-initiated bootstraps * Added test for role change bootstraps * Updated test for handle role bootstraps * Updated baas/baasaas to use branch with fixes * Updated test to verify bootstrap actually occurred * Fixed tsan warning * Updates from review; added comments to clarify bootstrap detection logic * Reverted baas branch to master and protocol version to 12 * Added comments to changes needed when merging to master; update baas version to not use master * Pulled over changes from other branch and tweaking download params * Refactored tests to validate different bootstrap types * Updated tests to get passing using the server params * Updated to support new batch_state protocol changes; updated tests * Updated role change tests and merged test from separate PR * Fixed issue with flx query verion 0 not being treated as a bootstrap * Cleaned up the tests a bit and reworked query version 0 handling * Updates from review; updated batch_state for schema bootstraps * Removed extra mutex in favor of state machine's mutex * Increased timeout when waiting for app initial sync to complete * Updated role change test to use test commands * Update resume and ident message handling * Updated future waits for the pause/resume test command * Added session connected event for when session multiplexing is disabled * Added wait_until() to state machine to wait for callback; updated role change test * RCORE-1973 Add role/permissions tests for new bootstrap feature (#7675) * Moved role change tests to separate test file * Fixed building of new flx_role_change.cpp file * Added local changes w/role bootstrap test - fixed exception in subscription store during server initiated boostrap * Updated local change test to include valid offline writes during role change * Added role change test during initial schema bootstrap * Wrapped up role change during bootstrap tests * Removed debug statments to fix thread sanitizer * Updated sub state comments and reverted a minor change * Refactored role change tests and broke out into 2 separate test cases * Moved harness from a global to a static var in each test case * Reverted resetting the bootstrapping subscription state back to Pending * Updated baas to use protocol v14 and removed the feature flag for role change bootstraps * Updated baasaas version to be a cached version * Updated baasaas githash and reordered role change during bootstrap to check for role change bootstrap as first validation step * Minor updates to reuse the verify_records() fcn * RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change (#7831) * Updated pending bootstrap store to be processed (applied or cleared) when the session or connection is restarted without restarting the Sync Session * RCORE-1974 Add tests for role/permissions changed during client reset (#7840) * re-applied changes after base branch was merged to feature branch * Updates to address test failures * Disable role change check during fresh realm download * Updated comments for clarity * Updates from review; added a bunch of comments to test * Updates to role change tests per review comments * removed ostream support for SyncClientHookEvent
What, How & Why?
It was discovered during testing of role changes while the realm is currently downloading a bootstrap, that the pending bootstrap store is not being reset when the session reconnects after the 200 error is received from the server due to the role change.
This issue is likely related to #7707 where the bootstrap is not being applied if it has been completely downloaded when the connection (and possibly session) are restarted.
Fixes #7827
☑️ ToDos
[ ] C-API, if public C++ API changed[ ]bindgen/spec.yml
, if public C++ API changed