RCORE-1982: Opening realm with cached user while offline results in fatal error and session does not retry connection #7365

Closed
wants to merge 12 commits into from

michael-wb
Contributor

@michael-wb michael-wb commented Feb 21, 2024

What, How & Why?

Moved the location update when a realm is opened at client App start with a cached user to be performed by the sync manager. Until the location has been updated, the opened sync sessions will be in the WaitingForLocation state. Once the location is updated, the active sessions in this state will be revived and either go to the Active or WaitingForAccessToken state as normal.
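The transition described above can be sketched as a small state machine. This is a simplified illustration, not the code from this PR: `ToySession`, its `Paused` state, and the `access_token_valid` flag are hypothetical stand-ins, and only the state names `WaitingForLocation`, `WaitingForAccessToken`, and `Active` come from the description.

```cpp
// Hypothetical, simplified model of the session states named in the description.
enum class State { WaitingForLocation, WaitingForAccessToken, Active, Paused };

struct ToySession {
    State state = State::WaitingForLocation;
    bool access_token_valid = false; // assumption: stands in for the real token check

    // Called once the location has been updated.
    void on_location_updated()
    {
        // Sessions in any other state are left alone.
        if (state != State::WaitingForLocation)
            return;
        // Revive the session: continue "as normal" depending on the token.
        state = access_token_valid ? State::Active : State::WaitingForAccessToken;
    }
};
```

A session with a valid token goes straight to `Active`; one without it waits in `WaitingForAccessToken`, and a session in any other state is not touched.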

Fixes #7349

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed
  • bindgen/spec.yml, if public C++ API changed


coveralls-official bot commented Feb 22, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_971

Details

  • 209 of 226 (92.48%) changed or added relevant lines in 9 files are covered.
  • 126 unchanged lines in 18 files lost coverage.
  • Overall coverage decreased (-0.03%) to 91.835%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| test/test_util_network.cpp | 37 | 39 | 94.87% |
| src/realm/object-store/sync/sync_session.cpp | 25 | 31 | 80.65% |
| src/realm/object-store/sync/sync_manager.cpp | 85 | 94 | 90.43% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/realm/index_string.cpp | 1 | 87.85% |
| src/realm/index_string.hpp | 1 | 82.86% |
| test/object-store/sync/app.cpp | 1 | 97.92% |
| src/realm/array_blobs_big.cpp | 2 | 98.72% |
| src/realm/sync/noinst/server/server_history.cpp | 2 | 67.94% |
| src/realm/util/serializer.cpp | 2 | 90.03% |
| src/realm/uuid.cpp | 2 | 97.01% |
| test/test_sync.cpp | 2 | 94.14% |
| src/realm/sync/transform.cpp | 3 | 63.07% |
| src/realm/util/file.cpp | 3 | 81.47% |

Totals
Change from base Build 2051: -0.03%
Covered Lines: 235370
Relevant Lines: 256297

💛 - Coveralls

Member

@tgoyne tgoyne left a comment

This appears to still break the AutoOpen use case. For that we want to attempt to connect exactly once, and if we hit any errors we switch to synchronously opening the current local data. If I understand this correctly it'll swallow all connection errors when trying to get the location and make us unable to do this.

@@ -1203,7 +1204,9 @@ TEST_CASE("Get Realm using Async Open", "[sync][pbs][async open]") {
});
std::lock_guard<std::mutex> lock(mutex);
REQUIRE(called);
REQUIRE(got_error);
timed_wait_for([&] {
Member

What's making this test now be asynchronous? All of the network requests are resolved synchronously, so unless it's waiting for the 8 minute timeout...

Contributor Author

The test relies on got_error being set to true by the error_handler, which is called after wait_for_download() returns with an error that is provided to the start() handler. There was a race condition in the test between called being set to true and got_error being set to true by the error_handler.

This is the same situation that was happening with the similar test in the C_API async open tests.
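The race described here is a general pattern: two flags are set by callbacks at different times, and the test asserts on both as soon as the first one is set. A minimal sketch of waiting for both flags with a timeout follows. It uses only the standard library; `Flags`, `fire_callbacks`, and `wait_for_both` are hypothetical names and are not the `timed_wait_for` helper used in the realm-core tests.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Flags {
    std::mutex mutex;
    std::condition_variable cv;
    bool called = false;
    bool got_error = false;
};

// Simulates the completion callback and the error handler firing at
// slightly different times on another thread.
inline std::thread fire_callbacks(Flags& f)
{
    return std::thread([&f] {
        {
            std::lock_guard<std::mutex> lock(f.mutex);
            f.called = true;
        }
        f.cv.notify_all();
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        {
            std::lock_guard<std::mutex> lock(f.mutex);
            f.got_error = true;
        }
        f.cv.notify_all();
    });
}

// Waits until both flags are set or the timeout expires.
// Returns true if both flags were observed before the timeout.
inline bool wait_for_both(Flags& f, std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lock(f.mutex);
    return f.cv.wait_for(lock, timeout, [&f] { return f.called && f.got_error; });
}
```

Checking `got_error` immediately after `called` becomes true could observe the gap between the two callbacks; waiting on the combined predicate removes that dependency on timing.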

util::CheckedUniqueLock lock(m_state_mutex);
// If the state is not waiting for location, bail early
if (m_state != State::WaitingForLocation) {
    return;
Contributor

How could you end up in this case? Do we actually want to restart the session if the location is different than it was before?

Contributor Author

Since we're going through the list of SyncSessions (with external references) to notify them of the location update, we wouldn't want to restart any sessions that are currently paused.
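A rough sketch of that notification loop, under the assumption that the manager holds weak references to its sessions. `SessionStub`, `SessionState`, and `notify_location_updated` are hypothetical names for illustration and do not reflect the actual SyncManager code.

```cpp
#include <memory>
#include <vector>

enum class SessionState { WaitingForLocation, Active, Paused };

struct SessionStub {
    SessionState state;

    void location_updated()
    {
        // Only sessions waiting for the location are revived;
        // paused sessions stay paused. Token handling is omitted here.
        if (state == SessionState::WaitingForLocation)
            state = SessionState::Active;
    }
};

// Notifies every session that is still alive and returns how many were revived.
inline int notify_location_updated(const std::vector<std::weak_ptr<SessionStub>>& sessions)
{
    int revived = 0;
    for (const auto& weak : sessions) {
        auto session = weak.lock();
        if (!session)
            continue; // no external reference is keeping this session alive
        const auto before = session->state;
        session->location_updated();
        if (before != session->state)
            ++revived;
    }
    return revived;
}
```

Because the state check lives in the session, the manager can notify every live session without knowing which ones are paused.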

@jbreams
Contributor

jbreams commented Feb 22, 2024

@tgoyne, AutoOpen vs AsyncOpen vs whatever other kinds of open we have are getting a bit muddled for me. Can you point to where in the Swift SDK (I assume that's where AutoOpen is implemented) so we can write a test that verifies this functionality? My read is that updating your location via other App calls (like trying to log in a user) will still fail semi-synchronously, and this change just makes actually starting a sync::Session asynchronously retry getting a location if it doesn't have one. Maybe I've missed something though.

@tgoyne
Member

tgoyne commented Feb 22, 2024

The AutoOpen implementation is quite complicated and may not be particularly informative. There are no other App calls involved; in the relevant use-case we already have a cached logged-in user from a previous run of the application. We call Realm::get_synchronized_realm() with cancel_waits_on_nonfatal_error=true to make the async open fail on any error rather than just non-transient errors, and fall back to a synchronous open if any errors occur. For this to work all transient errors when opening a session have to be reported to the error handler.
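The fallback behavior described in this comment can be sketched as follows. This is an illustration only: `auto_open`, `AsyncOpen`, and `OpenResult` are hypothetical names, and the callable stands in for the async open path rather than calling the real Realm::get_synchronized_realm() API.

```cpp
#include <functional>
#include <optional>
#include <string>

struct OpenResult {
    bool used_fallback;   // true if the local data was opened synchronously
    std::string error;    // the error that triggered the fallback, if any
};

// Stand-in for the async open attempt: returns an error message on failure,
// or std::nullopt on success. Any error, transient or not, must surface here.
using AsyncOpen = std::function<std::optional<std::string>()>;

inline OpenResult auto_open(const AsyncOpen& async_open)
{
    // Attempt to connect exactly once.
    if (auto error = async_open()) {
        // Any reported error switches to opening the current local data.
        return {true, *error};
    }
    return {false, std::string()};
}
```

If the session layer retried internally and never reported the transient error, `async_open` would not return and the fallback branch could not be reached, which is the concern raised in the review.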

@jbreams
Contributor

jbreams commented Feb 22, 2024

So to adapt these changes to fit this case we'd need to add some handling here https://github.com/realm/realm-core/pull/7365/files#diff-8a4439bf8b1d6f5ce56b98f9d0409beb874ce6de9aee5ed795688f5d4378787eR750 and maybe have SyncSession::handle_error() drive the location update backoff?

@tgoyne
Member

tgoyne commented Feb 23, 2024

A higher level thought on design: I think that SyncSession shouldn't really be aware of location fetching at all. It could have a single WaitingForDependencies state that it enters when it tries to activate and one of the things it needs is missing, plus a way to ask its SyncUser to try again to do whatever it needs to do to supply the dependencies (and then pass the resulting error back to any waiters if applicable).

I think we had a pre-existing problem where an expired cached access token would result in AutoOpen failing to fall back to the local realm, and it seems like we have to solve all of the exact same problems for the waiting for access token state and waiting for location state.
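One way to picture the proposed design is the sketch below. It is an assumption-laden illustration of the idea, not code from realm-core: `DepSession`, `UserStub`, and `ensure_dependencies` are hypothetical names, and only the `WaitingForDependencies` state name comes from the comment.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

enum class DepState { Inactive, WaitingForDependencies, Active };

// Hypothetical stand-in for SyncUser: tries to supply everything the session
// needs (location, access token, ...) and returns an error message on failure.
struct UserStub {
    std::function<std::optional<std::string>()> ensure_dependencies;
};

struct DepSession {
    DepState state = DepState::Inactive;
    std::optional<std::string> last_error; // what would be passed back to waiters

    // The session does not know which dependency is missing; it only asks
    // the user to supply them and records the outcome.
    void activate(UserStub& user)
    {
        state = DepState::WaitingForDependencies;
        if (auto error = user.ensure_dependencies()) {
            last_error = std::move(error);
            return; // stay in WaitingForDependencies until asked to retry
        }
        last_error.reset();
        state = DepState::Active;
    }
};
```

With this shape, an expired access token and a missing location are handled by the same path, and the error from either is available to report to waiters.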

@michael-wb
Contributor Author

This PR is superseded by the fix in #7469.

@michael-wb michael-wb closed this Mar 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2024
Successfully merging this pull request may close these issues.

Opening realm with cached user while offline results in fatal error and session does not retry connection
3 participants