RCORE-1982: Opening realm with cached user while offline results in fatal error and session does not retry connection #7365

michael-wb · 2024-02-21T23:50:17Z

What, How & Why?

Moved the location update that runs when a realm is opened at client App start with a cached user so that it is performed by the sync manager. Until the location has been updated, opened sync sessions stay in the WaitingForLocation state. Once the location is updated, active sessions in this state are revived and move to either the Active or WaitingForAccessToken state as normal.
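A rough sketch of the intended state flow, for illustration only. `SketchSession` and `SketchManager` are hypothetical stand-ins, not the real SyncSession and SyncManager classes, and the `has_valid_token` flag is an assumption used to pick between the two follow-up states.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Simplified stand-in for the session states mentioned in this PR.
enum class State { WaitingForLocation, WaitingForAccessToken, Active, Paused };

// Hypothetical, stripped-down session; not the real SyncSession.
class SketchSession {
public:
    SketchSession(State initial, bool has_valid_token)
        : m_state(initial)
        , m_has_valid_token(has_valid_token)
    {
    }

    // Called by the manager once the location has been fetched.
    void on_location_updated()
    {
        std::lock_guard<std::mutex> lock(m_state_mutex);
        // Only sessions parked in WaitingForLocation are revived; sessions in
        // any other state (for example Paused) are left untouched.
        if (m_state != State::WaitingForLocation)
            return;
        m_state = m_has_valid_token ? State::Active : State::WaitingForAccessToken;
    }

    State state() const
    {
        std::lock_guard<std::mutex> lock(m_state_mutex);
        return m_state;
    }

private:
    mutable std::mutex m_state_mutex;
    State m_state;
    bool m_has_valid_token; // assumption: decides which state follows the update
};

// Hypothetical manager that owns the location update and fans it out.
class SketchManager {
public:
    void add(std::shared_ptr<SketchSession> session)
    {
        m_sessions.push_back(std::move(session));
    }

    void location_updated()
    {
        for (auto& session : m_sessions)
            session->on_location_updated();
    }

private:
    std::vector<std::shared_ptr<SketchSession>> m_sessions;
};
```

In this sketch the manager is the only component that knows when the location arrives; each session decides for itself whether the notification applies to it.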

Fixes #7349

☑️ ToDos

📝 Changelog update
🚦 Tests (or not relevant)
C-API, if public C++ API changed
bindgen/spec.yml, if public C++ API changed

coveralls-official · 2024-02-22T06:04:43Z

Pull Request Test Coverage Report for Build michael.wilkersonbarker_971

Details

209 of 226 (92.48%) changed or added relevant lines in 9 files are covered.
126 unchanged lines in 18 files lost coverage.
Overall coverage decreased (-0.03%) to 91.835%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
test/test_util_network.cpp	37	39	94.87%
src/realm/object-store/sync/sync_session.cpp	25	31	80.65%
src/realm/object-store/sync/sync_manager.cpp	85	94	90.43%

Files with Coverage Reduction	New Missed Lines	%
src/realm/index_string.cpp	1	87.85%
src/realm/index_string.hpp	1	82.86%
test/object-store/sync/app.cpp	1	97.92%
src/realm/array_blobs_big.cpp	2	98.72%
src/realm/sync/noinst/server/server_history.cpp	2	67.94%
src/realm/util/serializer.cpp	2	90.03%
src/realm/uuid.cpp	2	97.01%
test/test_sync.cpp	2	94.14%
src/realm/sync/transform.cpp	3	63.07%
src/realm/util/file.cpp	3	81.47%

Totals
Change from base Build 2051:	-0.03%
Covered Lines:	235370
Relevant Lines:	256297

💛 - Coveralls

tgoyne

This appears to still break the AutoOpen use case. For that we want to attempt to connect exactly once, and if we hit any errors we switch to synchronously opening the current local data. If I understand this correctly it'll swallow all connection errors when trying to get the location and make us unable to do this.

tgoyne · 2024-02-22T17:09:42Z

test/object-store/realm.cpp

@@ -1203,7 +1204,9 @@ TEST_CASE("Get Realm using Async Open", "[sync][pbs][async open]") {
        });
        std::lock_guard<std::mutex> lock(mutex);
        REQUIRE(called);
-        REQUIRE(got_error);
+        timed_wait_for([&] {


What's making this test now be asynchronous? All of the network requests are resolved synchronously, so unless it's waiting for the 8 minute timeout...

The test relies on got_error being set to true by the error_handler, which is called after wait_for_download() returns with an error that is passed to the start() handler. The test had a race between called being set to true and got_error being set to true by the error_handler.

This is the same situation that was happening with the similar test in the C_API async open tests.
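To make the ordering problem concrete, here is a minimal single-threaded sketch. Everything in it is hypothetical: `FakeLoop` stands in for whatever thread delivers the callbacks, `pump_until()` only approximates the idea behind timed_wait_for() (the real helper's signature may differ), and the callback order is an assumption based on the description above.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical task queue standing in for the thread that delivers the
// callbacks; single-threaded so the ordering is deterministic.
class FakeLoop {
public:
    void post(std::function<void()> task)
    {
        m_tasks.push_back(std::move(task));
    }

    // Runs the oldest queued task, if any. Returns false when the queue is empty.
    bool run_one()
    {
        if (m_tasks.empty())
            return false;
        auto task = std::move(m_tasks.front());
        m_tasks.pop_front();
        task();
        return true;
    }

private:
    std::deque<std::function<void()>> m_tasks;
};

// Hypothetical helper: keep making progress until the condition holds or
// there is nothing left to run.
inline bool pump_until(FakeLoop& loop, const std::function<bool()>& condition)
{
    while (!condition()) {
        if (!loop.run_one())
            return false;
    }
    return true;
}

struct AsyncOpenResult {
    bool called = false;    // set by the completion callback
    bool got_error = false; // set later by the error handler
};

// Queues the two callbacks in the assumed order: completion first, error
// handler second.
inline void start_failing_open(FakeLoop& loop, AsyncOpenResult& result)
{
    loop.post([&result] { result.called = true; });
    loop.post([&result] { result.got_error = true; });
}
```

Under these assumptions, checking got_error immediately after called becomes true can still observe false, which is why the assertion has to wait for the second callback.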

src/realm/object-store/sync/sync_manager.cpp

src/realm/object-store/sync/impl/sync_client.hpp

jbreams · 2024-02-22T16:32:42Z

src/realm/object-store/sync/sync_session.cpp

+    util::CheckedUniqueLock lock(m_state_mutex);
+    // If the state is not waiting for location, bail early
+    if (m_state != State::WaitingForLocation) {
+        return;


How could you end up in this case? Do we actually want to restart the session if the location is different than it was before?

Since we're going through the list of SyncSessions (with external references) to notify them of the location update, we wouldn't want to restart any sessions that are currently paused.

src/realm/object-store/sync/sync_manager.cpp

jbreams · 2024-02-22T18:38:31Z

@tgoyne, AutoOpen vs AsyncOpen vs whatever other kinds of open we have are getting a bit muddled for me. Can you point to where AutoOpen is implemented in the Swift SDK (I assume that's where it lives) so we can write a test that verifies this functionality? My read is that updating your location via other App calls (like trying to log in a user) will still fail semi-synchronously, and this change just makes actually starting a sync::Session asynchronously retry getting a location if it doesn't have one. Maybe I've missed something though.

tgoyne · 2024-02-22T18:53:38Z

The AutoOpen implementation is quite complicated and may not be particularly informative. There are no other App calls involved; in the relevant use-case we already have a cached logged-in user from a previous run of the application. We call Realm::get_synchronized_realm() with cancel_waits_on_nonfatal_error=true to make the async open fail on any error rather than just non-transient errors, and fall back to a synchronous open if any errors occur. For this to work all transient errors when opening a session have to be reported to the error handler.
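A small model of that requirement, as a sketch only. `SessionError`, `wait_once()` and `auto_open()` are hypothetical names that do not exist in realm-core; only the cancel_waits_on_nonfatal_error flag comes from the description above, and its behaviour here is a simplification.

```cpp
#include <cassert>
#include <optional>

// Hypothetical error produced by the first failed connection attempt.
struct SessionError {
    bool is_fatal;
};

enum class WaitOutcome { Downloaded, Failed, StillWaiting };

// Models a single wait for the initial download. With the flag set, any error
// ends the wait; without it, a transient error is retried internally and the
// caller never hears about it.
inline WaitOutcome wait_once(std::optional<SessionError> failure,
                             bool cancel_waits_on_nonfatal_error)
{
    if (!failure)
        return WaitOutcome::Downloaded;
    if (failure->is_fatal || cancel_waits_on_nonfatal_error)
        return WaitOutcome::Failed;
    return WaitOutcome::StillWaiting;
}

enum class OpenedAs { Synchronized, LocalFallback, Blocked };

// Models the AutoOpen flow: one attempt, then fall back to the local data on
// any reported error.
inline OpenedAs auto_open(std::optional<SessionError> failure)
{
    switch (wait_once(failure, /*cancel_waits_on_nonfatal_error=*/true)) {
        case WaitOutcome::Downloaded:
            return OpenedAs::Synchronized;
        case WaitOutcome::Failed:
            return OpenedAs::LocalFallback;
        case WaitOutcome::StillWaiting:
            break;
    }
    // Reached only if an error was swallowed instead of reported.
    return OpenedAs::Blocked;
}
```

The `StillWaiting` branch is the failure mode being discussed: if a location error is retried silently, the fallback to the local realm never triggers.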

jbreams · 2024-02-22T19:12:32Z

So to adapt these changes to fit this case we'd need to add some handling here https://github.com/realm/realm-core/pull/7365/files#diff-8a4439bf8b1d6f5ce56b98f9d0409beb874ce6de9aee5ed795688f5d4378787eR750 and maybe have SyncSession::handle_error() drive the location update backoff?

tgoyne · 2024-02-23T02:29:30Z

A higher-level thought on design: I think that SyncSession shouldn't really be aware of location fetching at all. It could have a single WaitingForDependencies state, which it enters when it tries to activate and one of the things it needs is missing, plus a way to ask its SyncUser to try again to do whatever is needed to supply the dependencies (and then pass the resulting error back to any waiters if applicable).

I think we had a pre-existing problem where an expired cached access token would result in AutoOpen failing to fall back to the local realm, and it seems like we have to solve all of the exact same problems for the waiting for access token state and waiting for location state.
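One possible shape for that idea, sketched under heavy assumptions: `DependencyProvider` is a hypothetical stand-in for SyncUser, `DependentSession` is not the real SyncSession, and errors are plain strings purely for brevity.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class SessionState { Inactive, WaitingForDependencies, Active };

// Hypothetical stand-in for the user object: it alone knows how to obtain a
// location and an access token. Returns std::nullopt when everything is ready,
// or an error message otherwise.
struct DependencyProvider {
    std::function<std::optional<std::string>()> ensure_ready;
};

class DependentSession {
public:
    explicit DependentSession(DependencyProvider provider)
        : m_provider(std::move(provider))
    {
    }

    // Registers a waiter that is told about dependency errors.
    void on_error(std::function<void(const std::string&)> waiter)
    {
        m_waiters.push_back(std::move(waiter));
    }

    // Tries to activate. On failure the session stays in
    // WaitingForDependencies and forwards the error to every waiter.
    void try_activate()
    {
        m_state = SessionState::WaitingForDependencies;
        if (auto error = m_provider.ensure_ready()) {
            for (auto& waiter : m_waiters)
                waiter(*error);
            return;
        }
        m_state = SessionState::Active;
    }

    SessionState state() const
    {
        return m_state;
    }

private:
    DependencyProvider m_provider;
    std::vector<std::function<void(const std::string&)>> m_waiters;
    SessionState m_state = SessionState::Inactive;
};
```

In this shape the session has no notion of "location" or "access token"; a missing location and an expired token would both surface through the same error path to the waiters.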

michael-wb · 2024-03-21T19:09:06Z

This PR is superseded by the fix in #7469.

Michael Wilkerson-Barker added 2 commits February 21, 2024 18:25

Updated sync session location update to be handled by the sync manager; fixed timers not starting in network::Service

19ffea4

Merge branch 'master' of github.com:realm/realm-core into mwb/fix-fatal-location-failure

c74adb7

michael-wb self-assigned this Feb 21, 2024

cla-bot bot added the cla: yes label Feb 21, 2024

michael-wb linked an issue Feb 21, 2024 that may be closed by this pull request

network::Service does not start waiting on timers if no other events are currently active #7364

Closed

Fixed hanging test

992a18f

Fixed compile error in swift-build-and-test

86611ac

michael-wb requested review from jbreams, danieltabacaru and tgoyne February 22, 2024 15:29

Michael Wilkerson-Barker added 2 commits February 22, 2024 10:37

Updated changelog

6d3f880

Fixed failing C_API async open test

10e36b2

tgoyne reviewed Feb 22, 2024

jbreams reviewed Feb 22, 2024

Michael Wilkerson-Barker added 6 commits February 22, 2024 23:24

First round of updates from review

c70dd3e

Merge branch 'master' of github.com:realm/realm-core into mwb/fix-fatal-location-failure

7a4a069

Updates from review - reconnect info is now configurable

8a6908f

Merge branch 'master' of github.com:realm/realm-core into mwb/fix-fatal-location-failure

d924329

Updated changelog after build

c69dbc0

Fixed some build/test issues

180e44b

michael-wb removed a link to an issue Mar 5, 2024

network::Service does not start waiting on timers if no other events are currently active #7364

Closed

michael-wb mentioned this pull request Mar 11, 2024

RCORE-2007 Added Resumption delay configuration to SyncClientTimeouts #7441

Merged

4 tasks

michael-wb closed this Mar 21, 2024

github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2024
