RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change #7831

Merged 10 commits on Jul 1, 2024

michael-wb
Contributor

What, How & Why?

While testing role changes that occur while the realm is still downloading a bootstrap, we found that the pending bootstrap store is not reset when the session reconnects after the server sends the 200 error triggered by the role change.
This issue is likely related to #7707, where a bootstrap that has been completely downloaded is not applied if the connection (and possibly the session) is restarted.

Fixes #7827
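
The intended behavior on restart can be sketched roughly as follows. This is a toy model, not realm-core code: `ToyBootstrapStore`, `RestartAction`, and `process_on_restart()` are hypothetical names introduced only for illustration, and the "realm" is just a vector of strings. It assumes that a fully downloaded bootstrap should be applied and a partial one discarded when the session restarts.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pending bootstrap store (not the realm-core API).
struct ToyBootstrapStore {
    std::vector<std::string> batches; // changesets downloaded so far
    bool complete = false;            // true once the final batch has arrived

    void add_batch(std::string changeset, bool is_last)
    {
        assert(!complete); // no batches are expected after the final one
        batches.push_back(std::move(changeset));
        complete = is_last;
    }

    void clear()
    {
        batches.clear();
        complete = false;
    }
};

enum class RestartAction { Nothing, Applied, Cleared };

// On session restart: apply a finished bootstrap, discard a partial one.
RestartAction process_on_restart(ToyBootstrapStore& store, std::vector<std::string>& realm)
{
    if (store.batches.empty())
        return RestartAction::Nothing;

    RestartAction action = RestartAction::Cleared;
    if (store.complete) {
        realm.insert(realm.end(), store.batches.begin(), store.batches.end());
        action = RestartAction::Applied;
    }
    store.clear(); // either way, nothing stale is left for the next attempt
    return action;
}
```

In this model, a batch left over from an interrupted download never reaches the realm, because the store is emptied before the retry starts adding batches again.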

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed
  • [ ] bindgen/spec.yml, if public C++ API changed

…when the session or connection is restarted without restarting the Sync Session
@michael-wb michael-wb requested a review from jbreams June 21, 2024 18:43
@cla-bot cla-bot bot added the cla: yes label Jun 21, 2024
@michael-wb michael-wb self-assigned this Jun 21, 2024
@michael-wb michael-wb changed the title RCORE-2174 Bootstrap store is not being reset if initial subscription boostrap is interrupted by role change RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change Jun 21, 2024

coveralls-official bot commented Jun 21, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1190

Details

  • 2 of 2 (100.0%) changed or added relevant lines in 2 files are covered.
  • 70 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.005%) to 90.981%

Files with Coverage Reduction New Missed Lines %
src/realm/util/serializer.cpp 1 90.43%
test/test_index_string.cpp 1 93.48%
test/test_all.cpp 2 75.82%
src/realm/sync/noinst/protocol_codec.hpp 3 73.47%
src/realm/sync/noinst/server/server.cpp 3 73.71%
src/realm/sync/noinst/server/server_history.cpp 3 63.51%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
test/fuzz_group.cpp 4 51.67%
test/fuzz_tester.hpp 4 57.32%
test/object-store/util/sync/baas_admin_api.cpp 5 83.83%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1189: 0.005%
Covered Lines: 215333
Relevant Lines: 236680

💛 - Coveralls

coveralls-official bot commented Jun 25, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1192

Details

  • 2 of 2 (100.0%) changed or added relevant lines in 2 files are covered.
  • 32 unchanged lines in 8 files lost coverage.
  • Overall coverage increased (+0.02%) to 90.994%

Files with Coverage Reduction New Missed Lines %
src/realm/object-store/shared_realm.cpp 2 91.89%
src/realm/sync/network/http.hpp 2 82.27%
src/realm/sync/transform.cpp 2 60.73%
test/fuzz_group.cpp 3 48.8%
src/realm/sync/instruction_applier.cpp 4 68.26%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
src/realm/sync/noinst/client_impl_base.cpp 7 82.8%
src/realm/index_string.cpp 8 84.63%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1191: 0.02%
Covered Lines: 215350
Relevant Lines: 236664

💛 - Coveralls

CHANGELOG.md Outdated
@@ -6,7 +6,7 @@

### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
* None.
* Fix pending bootstrap store was not applying a pending bootstrap or clearing a partial bootstrap when the session is restarted. ([#7827](https://github.com/realm/realm-core/issues/7827), since 14.8.0)
Collaborator

can we instead say what this meant for the user?

@@ -869,8 +869,6 @@ void SessionImpl::process_pending_flx_bootstrap()
    if (!m_is_flx_sync_session || m_state != State::Active) {
        return;
    }
    // Should never be called if session is not active
    REALM_ASSERT_EX(m_state == SessionImpl::Active, m_state);
Collaborator

this assertion didn't really make sense

@@ -1524,6 +1524,8 @@ void Session::cancel_resumption_delay()

logger.debug("Resumed"); // Throws

process_pending_flx_bootstrap();
Collaborator

this can throw, so it's not a good idea calling it outside a try catch.

Collaborator

I think it'd make more sense to process it in reset_protocol_state() (and clean-up Session::activate())

Contributor Author

I put a try catch around the call here. I looked into moving this call into reset_protocol_state() but this is also used to reset the state in the session's connection_lost() function called when the connection is disconnected. I didn't want to change this behavior, so I left the calls as-is in activate() and cancel_resumption_delay().

Let me know if you think I should still continue to look into this.

Collaborator

Yes, I saw that too and I personally think it's not an issue.

Contributor Author

But I don't think we want to apply the bootstrap when the session is shutting down after being disconnected.

Collaborator

we do it anyways when the client reconnects, but I agree that it may be inconvenient for the user.

test/object-store/sync/flx_role_change.cpp (conversation resolved)

coveralls-official bot commented Jun 26, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1195

Details

  • 18 of 22 (81.82%) changed or added relevant lines in 4 files are covered.
  • 57 unchanged lines in 12 files lost coverage.
  • Overall coverage increased (+0.001%) to 90.974%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 6 10 60.0%
Files with Coverage Reduction New Missed Lines %
src/realm/sync/noinst/server/server_history.cpp 1 63.7%
src/realm/object-store/shared_realm.cpp 2 91.89%
src/realm/query_expression.cpp 2 86.62%
src/realm/sync/client.cpp 3 91.95%
src/realm/sync/instruction_applier.cpp 4 68.01%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
test/fuzz_group.cpp 5 46.89%
src/realm/bplustree.cpp 6 72.55%
src/realm/sync/noinst/server/server.cpp 6 73.78%
src/realm/sync/noinst/client_impl_base.cpp 7 82.67%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1191: 0.001%
Covered Lines: 215310
Relevant Lines: 236672

💛 - Coveralls

CHANGELOG.md Outdated
@@ -6,7 +6,8 @@

### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
* None.
* Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt
Collaborator

Couldn't this potentially cause diverging history?

Contributor Author

Not immediately, but manually removing the extra entries may cause a diverging history error when merged with the server, since it expects those to not be in the local realm.

Collaborator

> Not immediately, but manually removing the extra entries may cause a diverging history error

Not sure what you mean.

After thinking a bit more, I think it can actually cause orphaned objects, not diverging history.

Contributor Author

Yes, orphaned objects is what I was trying to say

Contributor

@kmorkos correct me if i'm wrong, but I think this will just lead to compensating writes rather than diverging history.

But also, this changelog entry is super confusing. The audience for changelog entries is SDK engineers and end users who likely don't know about the pending bootstrap store. Perhaps something like

  • If a sync session were interrupted by a disconnect while downloading a bootstrap, more writes may have been made to the database than necessary when the sync session reconnected, and there may be objects stored that do not match the actual state of the server - potentially leading to compensating writes.

Also, I don't think this started in 14.8.0 - I think this started in v12.0.0 https://github.com/realm/realm-core/blob/master/CHANGELOG.md#1200-release-notes.

Collaborator

@jbreams I think we can refer to those objects as orphaned objects as we do in other places, otherwise I agree with your suggestion for changelog entry.

Contributor

I think the bug is that the client may end up with objects that the server doesn't think it has (if the object was in the new query's view during the first attempt, but then moved out of that query's view before the second attempt).

From there, one of three things could happen:

  • The object moves back into the client's query view at some point in the future, and we are eventually consistent™️
  • The object never moves back into the client's query view, and the client just holds on to this stale view of an object it was never supposed to have indefinitely
  • The client tries modifying the object at some point, at which point they'll get a compensating write because the server interprets it as modifying an object outside of their query view

I think for all intents and purposes @jbreams' description is more accurate than referring to them as "orphaned objects" unless that terminology is used elsewhere to refer to the above scenario
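
The scenario described above can be modeled with plain sets. This is only a sketch under simplifying assumptions: `ObjectIds`, `try_modify()`, `stale_objects()`, and `demo_interrupted_bootstrap()` are hypothetical names, and the server is reduced to a single rule (a write is accepted only for objects inside the client's query view).

```cpp
#include <cassert>
#include <set>
#include <string>

using ObjectIds = std::set<std::string>;

enum class WriteResult { Accepted, CompensatingWrite };

// Hypothetical rule: the server accepts a write only if the object is in the client's query view.
WriteResult try_modify(const ObjectIds& server_view, const std::string& id)
{
    return server_view.count(id) > 0 ? WriteResult::Accepted : WriteResult::CompensatingWrite;
}

// Objects held locally that the server does not think the client has.
ObjectIds stale_objects(const ObjectIds& local, const ObjectIds& server_view)
{
    ObjectIds stale;
    for (const auto& id : local) {
        if (server_view.count(id) == 0)
            stale.insert(id);
    }
    return stale;
}

// Walks through the interrupted-bootstrap scenario; returns the stale set right after the retry.
ObjectIds demo_interrupted_bootstrap()
{
    ObjectIds first_attempt{"a", "b"}; // "b" was in view during the first download
    ObjectIds second_attempt{"a"};     // "b" moved out of view before the retry

    // Bug being modeled: data from the first attempt is not cleared before the retry.
    ObjectIds local = first_attempt;
    local.insert(second_attempt.begin(), second_attempt.end());

    ObjectIds stale = stale_objects(local, second_attempt);
    assert(stale.count("b") == 1);

    // Modifying the stale object is answered with a compensating write.
    assert(try_modify(second_attempt, "b") == WriteResult::CompensatingWrite);

    // If "b" later moves back into view, the client is consistent again.
    ObjectIds later_view{"a", "b"};
    assert(stale_objects(local, later_view).empty());
    return stale;
}
```

The remaining outcome, where the object never returns to the view and is never modified, corresponds to `stale_objects()` staying non-empty indefinitely.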

Contributor Author

I reworded the changelog entry based on Jonathan's recommendation - hopefully it is clearer now.

CHANGELOG.md Outdated
@@ -6,7 +6,8 @@

### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
* None.
* Fix data from a previous interrupted bootstrap was potentially being included with the bootstrap data during retry attempt
and complete bootstraps were potentially not being applied if the session restarted once fully downloaded. ([#7827](https://github.com/realm/realm-core/issues/7827), since 14.8.0)
Collaborator

I guess the data would be re-downloaded right?

        process_pending_flx_bootstrap(); // throws
    }
    catch (const IntegrationException& error) {
        on_integration_failure(error);
Contributor

what happens if we have an integration failure here? I guess the client will just continue to resume the session but without applying the bootstrap?

Contributor Author

Yes - it's the same if there is an integration failure during activate()
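
The behavior discussed in this thread can be illustrated with a small stand-alone model. `ToySession` and `ToyIntegrationError` are hypothetical stand-ins whose member names only echo the ones in the diff; the real `Session` does considerably more. The sketch assumes that an integration failure is recorded and that resumption then proceeds without the bootstrap having been applied.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for an integration failure.
struct ToyIntegrationError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical session model (not realm-core code).
struct ToySession {
    bool bootstrap_is_bad = false; // simulate a bootstrap that fails to integrate
    bool bootstrap_applied = false;
    bool resumed = false;
    std::string last_error;

    void process_pending_bootstrap()
    {
        if (bootstrap_is_bad)
            throw ToyIntegrationError("bad changeset");
        bootstrap_applied = true;
    }

    void on_integration_failure(const ToyIntegrationError& error)
    {
        last_error = error.what();
    }

    void cancel_resumption_delay()
    {
        assert(!resumed); // only a suspended session is resumed
        try {
            process_pending_bootstrap(); // may throw
        }
        catch (const ToyIntegrationError& error) {
            on_integration_failure(error); // record the failure, do not propagate it
        }
        resumed = true; // resumption continues either way
    }
};
```

With `bootstrap_is_bad` set, the session still ends up resumed, the error is recorded, and `bootstrap_applied` stays false.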

m_client_error = util::none;

m_upload_progress = m_progress.upload;
m_enlisted_to_send = false;
Contributor

why did this all change?

Contributor Author

Reverted - I had reformatted the function when I was playing around with moving the bootstrap apply to this function.

coveralls-official bot commented Jun 28, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1201

Details

  • 18 of 22 (81.82%) changed or added relevant lines in 4 files are covered.
  • 25 unchanged lines in 8 files lost coverage.
  • Overall coverage increased (+0.5%) to 90.994%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 6 10 60.0%
Files with Coverage Reduction New Missed Lines %
src/realm/sync/noinst/client_impl_base.cpp 1 83.12%
src/realm/sync/noinst/server/server.cpp 1 74.44%
test/test_index_string.cpp 1 93.48%
src/realm/mixed.cpp 2 86.46%
src/realm/link_translator.cpp 4 76.92%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
test/object-store/util/sync/baas_admin_api.cpp 5 83.83%
src/realm/bplustree.cpp 7 71.41%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1200: 0.5%
Covered Lines: 215426
Relevant Lines: 236747

💛 - Coveralls

coveralls-official bot commented Jun 28, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1205

Details

  • 8 of 12 (66.67%) changed or added relevant lines in 3 files are covered.
  • 36 unchanged lines in 11 files lost coverage.
  • Overall coverage increased (+0.03%) to 90.986%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 6 10 60.0%
Files with Coverage Reduction New Missed Lines %
src/realm/query_engine.hpp 1 93.94%
src/realm/sort_descriptor.cpp 1 94.06%
src/realm/sync/network/websocket.cpp 1 71.79%
test/fuzz_tester.hpp 1 57.73%
src/realm/mixed.cpp 2 86.46%
src/realm/query_expression.hpp 2 93.81%
src/realm/sync/noinst/protocol_codec.hpp 3 74.0%
test/fuzz_group.cpp 3 54.23%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
src/realm/sync/network/network.hpp 7 85.28%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1203: 0.03%
Covered Lines: 215403
Relevant Lines: 236743

💛 - Coveralls

coveralls-official bot commented Jul 1, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1221

Details

  • 8 of 12 (66.67%) changed or added relevant lines in 3 files are covered.
  • 45 unchanged lines in 11 files lost coverage.
  • Overall coverage increased (+0.005%) to 90.987%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 6 10 60.0%
Files with Coverage Reduction New Missed Lines %
src/realm/index_string.hpp 1 93.48%
test/test_index_string.cpp 1 93.48%
src/realm/mixed.cpp 2 86.46%
src/realm/sync/network/http.hpp 2 82.27%
test/test_thread.cpp 2 65.79%
src/realm/sync/noinst/server/server.cpp 3 73.63%
src/realm/sync/noinst/client_impl_base.cpp 4 82.57%
src/realm/sync/noinst/pending_bootstrap_store.cpp 4 94.47%
src/realm/sync/noinst/protocol_codec.hpp 4 73.82%
src/realm/bplustree.cpp 6 72.55%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1219: 0.005%
Covered Lines: 215782
Relevant Lines: 237158

💛 - Coveralls

coveralls-official bot commented Jul 1, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1224

Details

  • 8 of 12 (66.67%) changed or added relevant lines in 3 files are covered.
  • 43 unchanged lines in 15 files lost coverage.
  • Overall coverage increased (+0.02%) to 91.013%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/sync/noinst/client_impl_base.cpp 6 10 60.0%
Files with Coverage Reduction New Missed Lines %
src/realm/mixed.cpp 1 86.46%
src/realm/query_engine.hpp 1 93.94%
src/realm/sync/instructions.hpp 1 76.03%
src/realm/sync/network/websocket.cpp 1 71.79%
test/test_table.cpp 1 99.51%
src/realm/sync/transform.cpp 2 61.05%
src/realm/table_view.cpp 2 92.99%
test/test_lang_bind_helper.cpp 2 93.2%
src/realm/sync/instruction_replication.cpp 3 91.48%
test/fuzz_group.cpp 3 48.17%
Totals Coverage Status
Change from base Build michael.wilkersonbarker_1223: 0.02%
Covered Lines: 215819
Relevant Lines: 237131

💛 - Coveralls

CHANGELOG.md Outdated
@@ -10,6 +10,10 @@
* Fixed removing backlinks from the wrong objects if the link came from a nested list, nested dictionary, top-level dictionary, or list of mixed, and the source table had more than 256 objects. This could manifest as `array_backlink.cpp:112: Assertion failed: int64_t(value >> 1) == key.value` when removing an object. ([#7594](https://github.com/realm/realm-core/issues/7594), since v11 for dictionaries)
* Fixed the collapse/rejoin of clusters which contained nested collections with links. This could manifest as `array.cpp:319: Array::move() Assertion failed: begin <= end [2, 1]` when removing an object. ([#7839](https://github.com/realm/realm-core/issues/7839), since the introduction of nested collections in v14.0.0-beta.0)
* wait_for_upload_completion() was inconsistent in how it handled commits which did not produce any changesets to upload. Previously it would sometimes complete immediately if all commits waiting to be uploaded were empty, and at other times it would wait for a server roundtrip. It will now always complete immediately. ([PR #7796](https://github.com/realm/realm-core/pull/7796)).
* If a sync session is interrupted by a disconnect or restart while downloading a bootstrap, stale data from the previous
Contributor

why did this get split up onto multiple lines?

Contributor Author

It was manually word-wrapped, but I reverted the change.

@michael-wb michael-wb merged commit 321b889 into mwb/user-role-change-tests Jul 1, 2024
3 of 4 checks passed
@michael-wb michael-wb deleted the mwb/fix-reset-bootstrap-store branch July 1, 2024 21:20
michael-wb pushed a commit that referenced this pull request Jul 4, 2024
* Moved role change tests to separate test file

* Fixed building of new flx_role_change.cpp file

* Added local changes w/role bootstrap test - fixed exception in subscription store during server initiated boostrap

* Updated local change test to include valid offline writes during role change

* Added role change test during initial schema bootstrap

* Wrapped up role change during bootstrap tests

* Removed debug statments to fix thread sanitizer

* Updated sub state comments and reverted a minor change

* Refactored role change tests and broke out into 2 separate test cases

* Moved harness from a global to a static var in each test case

* Reverted resetting the bootstrapping subscription state back to Pending

* Updated baas to use protocol v14 and removed the feature flag for role change bootstraps

* Removed left over code in  statement...

* Updated baasaas version to be a cached version

* Updated baasaas githash and reordered role change during bootstrap to check for role change bootstrap as first validation step

* Minor updates to reuse the verify_records() fcn

* RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change (#7831)

* Updated pending bootstrap store to be processed (applied or cleared) when the session or connection is restarted without restarting the Sync Session
michael-wb pushed a commit that referenced this pull request Jul 19, 2024
* RCORE-1872 Sync client should allow server bootstrapping at any time (#7440)
* First round of changes for server-initiated bootstraps
* Added test for role change bootstraps
* Updated test for handle role bootstraps
* Updated baas/baasaas to use branch with fixes
* Updated test to verify bootstrap actually occurred
* Fixed tsan warning
* Updates from review; added comments to clarify bootstrap detection logic
* Reverted baas branch to master and protocol version to 12
* Added comments to changes needed when merging to master; update baas version to not use master
* Pulled over changes from other branch and tweaking download params
* Refactored tests to validate different bootstrap types
* Updated tests to get passing using the server params
* Updated to support new batch_state protocol changes; updated tests
* Updated role change tests and merged test from separate PR
* Fixed issue with flx query verion 0 not being treated as a bootstrap
* Cleaned up the tests a bit and reworked query version 0 handling
* Updates from review; updated batch_state for schema bootstraps
* Removed extra mutex in favor of state machine's mutex
* Increased timeout when waiting for app initial sync to complete
* Updated role change test to use test commands
* Update resume and ident message handling
* Updated future waits for the pause/resume test command
* Added session connected event for when session multiplexing is disabled
* Added wait_until() to state machine to wait for callback; updated role change test

* RCORE-1973 Add role/permissions tests for new bootstrap feature (#7675)
* Moved role change tests to separate test file
* Fixed building of new flx_role_change.cpp file
* Added local changes w/role bootstrap test - fixed exception in subscription store during server initiated boostrap
* Updated local change test to include valid offline writes during role change
* Added role change test during initial schema bootstrap
* Wrapped up role change during bootstrap tests
* Removed debug statments to fix thread sanitizer
* Updated sub state comments and reverted a minor change
* Refactored role change tests and broke out into 2 separate test cases
* Moved harness from a global to a static var in each test case
* Reverted resetting the bootstrapping subscription state back to Pending
* Updated baas to use protocol v14 and removed the feature flag for role change bootstraps
* Updated baasaas version to be a cached version
* Updated baasaas githash and reordered role change during bootstrap to check for role change bootstrap as first validation step
* Minor updates to reuse the verify_records() fcn

* RCORE-2174 Bootstrap store is not being reset if initial subscription bootstrap is interrupted by role change (#7831)
* Updated pending bootstrap store to be processed (applied or cleared) when the session or connection is restarted without restarting the Sync Session

* RCORE-1974 Add tests for role/permissions changed during client reset (#7840)
* re-applied changes after base branch was merged to feature branch
* Updates to address test failures
* Disable role change check during fresh realm download
* Updated comments for clarity
* Updates from review; added a bunch of comments to test
* Updates to role change tests per review comments
* removed ostream support for SyncClientHookEvent
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 31, 2024