RCORE-2060 Enabling 'cancel_waits_on_nonfatal_error' does not cancel waits during location update while offline #7528

Merged
merged 25 commits into from
Apr 26, 2024

michael-wb
Contributor

@michael-wb michael-wb commented Mar 27, 2024

What, How & Why?

After the location update fixes in #7469, @AutoOpen in the Swift SDK still did not work correctly if the Realm needed to update the location while the device was offline. For @AutoOpen to work, the sync session needs to cancel any waiters with an error if the websocket connection to the server is not successful, which is enabled by setting the SyncConfig::cancel_waits_on_nonfatal_error flag to true.

This PR fixes this behavior and adds tests for the SyncConfig::cancel_waits_on_nonfatal_error setting, since none existed before.
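
As a rough illustration of the behavior described above (not the realm-core implementation), the sketch below models a session whose waiters are failed on a nonfatal connection error only when the flag is set. `SketchSession`, `SketchError`, and `cancelled_waiters_when_offline` are hypothetical names invented for this example; only `cancel_waits_on_nonfatal_error` comes from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sync error; not the realm-core type.
struct SketchError {
    std::string message;
    bool is_fatal;
};

// Hypothetical, simplified session that tracks completion waiters.
class SketchSession {
public:
    // The waiter receives nullptr on success or a pointer to the error.
    using Waiter = std::function<void(const SketchError*)>;

    explicit SketchSession(bool cancel_waits_on_nonfatal_error)
        : m_cancel_waits(cancel_waits_on_nonfatal_error)
    {
    }

    void wait_for_completion(Waiter waiter)
    {
        assert(waiter != nullptr);
        m_waiters.push_back(std::move(waiter));
    }

    // Called when a connection attempt (or location update) fails.
    void on_connection_error(const SketchError& error)
    {
        // Fatal errors always fail the waiters; nonfatal errors do so only
        // when the flag is set.
        if (error.is_fatal || m_cancel_waits) {
            auto waiters = std::move(m_waiters);
            m_waiters.clear();
            for (auto& waiter : waiters)
                waiter(&error);
        }
        // Otherwise the waiters stay registered while the session retries.
    }

    std::size_t pending_waiters() const
    {
        return m_waiters.size();
    }

private:
    bool m_cancel_waits;
    std::vector<Waiter> m_waiters;
};

// Runs one offline scenario and returns how many waiters were cancelled
// with an error.
inline int cancelled_waiters_when_offline(bool cancel_waits_on_nonfatal_error)
{
    SketchSession session(cancel_waits_on_nonfatal_error);
    int cancelled = 0;
    session.wait_for_completion([&](const SketchError* error) {
        if (error)
            ++cancelled;
    });
    session.on_connection_error({"Operation timed out", /*is_fatal=*/false});
    return cancelled;
}
```

With the flag set, the single waiter is completed with the error; with the default (false), it stays pending while the session keeps retrying.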

Fixes #7527

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed
  • [ ] bindgen/spec.yml, if public C++ API changed

coveralls-official bot commented Mar 27, 2024

Pull Request Test Coverage Report for Build michael.wilkersonbarker_1057

Details

  • 212 of 257 (82.49%) changed or added relevant lines in 6 files are covered.
  • 71 unchanged lines in 15 files lost coverage.
  • Overall coverage remained the same at 90.751%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/realm/exceptions.cpp | 1 | 3 | 33.33% |
| test/object-store/util/test_file.cpp | 27 | 41 | 65.85% |
| test/object-store/realm.cpp | 173 | 202 | 85.64% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/realm/index_string.hpp | 1 | 93.48% |
| test/test_query2.cpp | 1 | 98.73% |
| src/realm/array_blobs_big.cpp | 2 | 98.58% |
| src/realm/query_expression.cpp | 2 | 86.62% |
| src/realm/util/to_string.cpp | 2 | 97.62% |
| src/realm/sync/noinst/protocol_codec.hpp | 3 | 73.5% |
| src/realm/table.cpp | 3 | 90.08% |
| test/fuzz_group.cpp | 3 | 51.67% |
| test/test_thread.cpp | 3 | 64.84% |
| src/realm/util/assert.hpp | 4 | 87.1% |

Totals

  • Change from base Build 2262: 0.0%
  • Covered Lines: 212445
  • Relevant Lines: 234096

💛 - Coveralls

test/object-store/realm.cpp (outdated, resolved)
@michael-wb
Contributor Author

Hi @tgoyne or @dianaafanador3 - would either of you mind trying these changes out with the Swift @AutoOpen class to make sure it fixes the current issues (e.g. gets stuck in .connecting state while offline)?
Thank you! 🙂

location_refresh_called = true;
if (mode == location_fails) {
    // Fake "offline/request timed out" custom error response
    return app::Response{0, 28, {}, "Operation timed out"};

Contributor

Does the 28 have some special significance?

Contributor Author

Not really - it just corresponds to CURLE_OPERATION_TIMEDOUT (28). There isn't a default error returned when an operation times out; the value is specific to the transport implementation.
https://curl.se/libcurl/c/libcurl-errors.html#:~:text=CURLE_OPERATION_TIMEDOUT

Contributor Author

I added a define for the value to try to explain why the value is being used.
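
A minimal sketch of what such a named value might look like, so the 28 is no longer a bare literal. `SketchResponse`, `sketch_operation_timed_out`, and `make_timed_out_response` are hypothetical names for illustration; the field order is only assumed from the initializer in the hunk above, and the actual define added in the PR may be named differently.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the transport response returned by the test hook.
// Assumed field order: HTTP status, custom status, headers, body.
struct SketchResponse {
    int http_status_code;
    int custom_status_code;
    std::map<std::string, std::string> headers;
    std::string body;
};

// Mirrors libcurl's CURLE_OPERATION_TIMEDOUT (28). The constant name is an
// assumption; other transports may report timeouts with different codes.
constexpr int sketch_operation_timed_out = 28;
static_assert(sketch_operation_timed_out == 28, "must match CURLE_OPERATION_TIMEDOUT");

// Builds the fake "offline/request timed out" response used to simulate
// being offline.
inline SketchResponse make_timed_out_response()
{
    return SketchResponse{0, sketch_operation_timed_out, {}, "Operation timed out"};
}
```

Naming the value documents where it comes from without changing what the test returns.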

@danieltabacaru
Collaborator

@michael-wb Is it a guarantee that AutoOpen sets SyncConfig::cancel_waits_on_nonfatal_error to true?

@michael-wb
Contributor Author

> @michael-wb Is it a guarantee that AutoOpen sets SyncConfig::cancel_waits_on_nonfatal_error to true?

Yes - here is where it is being set: https://github.com/realm/realm-swift/blob/master/RealmSwift/SwiftUI.swift#L1533-L1540


SyncTestFile config(oas, "realm");
auto user = config.sync_config->user;
config.sync_config->cancel_waits_on_nonfatal_error = true;

Collaborator

should we add a test (maybe already have one?) when this is false?

Contributor Author

I don't think it's needed...
When this is false, the session will keep retrying to connect to the server, so the test case expectations will be different. In addition, all the other async open tests (as well as many others) validate the sync client operation with this flag set to false (the default).

Collaborator

@danieltabacaru danieltabacaru Mar 27, 2024

yeah, what I meant is to test that behavior and expectations too (in a separate test not as a section). But I think you added a test in the previous PR.

Contributor Author

Yes, the tests added in the previous PR should test the normal cases; although I could add an additional test specifically for async open if you think that is necessary.

Collaborator

@danieltabacaru danieltabacaru left a comment

🚢 🇮🇹

TestMode mode = GENERATE(location_fails, token_fails, token_not_authorized);

SECTION(util::format("access token expired when realm is opened - mode: %1", txt_test_mode(mode))) {
logger->trace(">>> access token expired when realm is opened - mode: %1", txt_test_mode(mode));

Collaborator

nit: if this is mainly for CI, I don't think we run the tests with trace logs (could use debug)

Contributor Author

These were originally in for debugging the tests, but I ended up removing them from the test.

    REQUIRE_FALSE(error.is_fatal);
    return;
}
if (error.status.code() == ErrorCodes::AuthError) {

Collaborator

looking at the websocket replies, these are the only two errors allowed. you could just:

REQUIRE(error.status.code() == ErrorCodes::AuthError);
REQUIRE(error.is_fatal);


TestMode mode = GENERATE(location_fails, token_fails, token_not_authorized);

SECTION(util::format("access token expired when realm is opened - mode: %1", txt_test_mode(mode))) {

Member

Catch has DYNAMIC_SECTION for this: DYNAMIC_SECTION("access token expired when realm is opened - mode: " << txt_test_mode(mode)).

This pattern of doing configuration in the SECTION and then the shared test afterwards has the problem that it results in catch not being able to tell us which section failed, and relying on logging for that is awkward. I think the intended way to do this with Catch would be using GENERATE on a second enum and switching on that instead of using sections. CAPTURE(mode) can be used to make it print the generated mode on failure.

Collaborator

good to know 👍

Contributor Author

Thanks @tgoyne - I updated the tests to use the DYNAMIC_SECTION so the individual sections would show up in the test results.

@dianaafanador3
Contributor

dianaafanador3 commented Mar 28, 2024

> Hi @tgoyne or @dianaafanador3 - would either of you mind trying these changes out with the Swift @AutoOpen class to make sure it fixes the current issues (e.g. gets stuck in .connecting state while offline)? Thank you! 🙂

So I tested this with the template and we are no longer getting the connection when using AutoOpen, but I do get a lot of connection errors when recovering the connection. The message of the error is not very detailed.
Error Domain=io.realm.sync Code=4 "error" UserInfo={NSLocalizedDescription=error}
The error message is coming from core

@michael-wb
Contributor Author

Thank you @dianaafanador3 - do you have a log file (maybe share via slack) and how is the connection recovered? using SyncSession::handle_reconnect()?

Contributor

@dianaafanador3 dianaafanador3 left a comment

Errors are no longer showing during the manual testing, and offline data is synced when the connection is recovered.

@michael-wb
Contributor Author

@danieltabacaru / @jbreams - This has been updated to the latest master. Can I please get a quick sanity check before I merge for today's release? Thanks

@@ -1284,14 +1285,17 @@ void App::refresh_access_token(const std::shared_ptr<User>& user, bool update_lo
return;
}

log_debug("App: refresh_access_token: email: %1 %2", user->user_profile().email(),
update_location ? "(updating location)" : "");
log_debug("App: refresh_access_token: user-id: %1%2", user->user_id(),

Collaborator

nit: user_id

// Close the app instance (or tear down the TestAppSession)
void close(bool tear_down = false);
// Re-open the app instance using app_config
void reopen(bool log_in = false);

Collaborator

why is this needed? 🤔 Can't you create a new TestAppSession if needed?

Contributor Author

This is so I can shut down and restart the app and retain the same metadata realm and logged in user in order to simulate the "restart client app with cached user" issue where the location info hasn't been queried yet when the sync session is started.
If I create a new TestAppSession, it will create a new directory and a new logged in user (which will request the location info in the process).

Collaborator

got it. it may be worth being able to create a TestAppSession that retains the same metadata realm (i.e., by passing the directory and checking whether the realm exists, otherwise starting fresh).

Member

I would prefer to have TestAppSession remain a strictly scoped type and have a way to reuse a metadata directory between instances, but it's also not a big deal and not worth spending much time on.
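
One possible shape for that idea, sketched with an in-memory stand-in rather than a real directory: the session type stays strictly scoped, and the metadata is owned by a separate handle that can be passed to the next instance. `SketchMetadataDir`, `SketchAppSession`, and `restarted_session_sees_cached_user` are hypothetical names for illustration and do not reflect the actual TestAppSession interface.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for an on-disk metadata directory that holds the
// cached (logged in) user.
struct SketchMetadataDir {
    std::string path = "metadata";
    std::optional<std::string> cached_user_id;
};

// Hypothetical scoped session: set up on construction, torn down on
// destruction, with no close()/reopen() methods.
class SketchAppSession {
public:
    // Pass an existing directory to simulate restarting the app with a
    // cached user; pass nothing to start fresh.
    explicit SketchAppSession(std::shared_ptr<SketchMetadataDir> dir = nullptr)
        : m_dir(std::move(dir))
    {
        if (!m_dir)
            m_dir = std::make_shared<SketchMetadataDir>();
    }

    void log_in(std::string user_id)
    {
        m_dir->cached_user_id = std::move(user_id);
    }

    bool has_cached_user() const
    {
        return m_dir->cached_user_id.has_value();
    }

    std::shared_ptr<SketchMetadataDir> metadata_dir() const
    {
        return m_dir;
    }

private:
    std::shared_ptr<SketchMetadataDir> m_dir;
};

// Simulates "restart client app with cached user" using two scoped sessions
// that share one metadata directory.
inline bool restarted_session_sees_cached_user()
{
    std::shared_ptr<SketchMetadataDir> dir;
    {
        SketchAppSession first;
        first.log_in("user-1");
        dir = first.metadata_dir();
    } // The first session is torn down here; the directory outlives it.
    SketchAppSession second(dir);
    return second.has_cached_user();
}
```

A session constructed without a directory starts with no cached user, which matches the "otherwise start fresh" case mentioned above.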

Contributor Author

I can update this in a future PR.

@michael-wb michael-wb changed the base branch from master to release/14.6.0 April 26, 2024 15:32
@michael-wb michael-wb changed the base branch from release/14.6.0 to prepare-14.6.0 April 26, 2024 15:39

{
    try {
        if (exc_ptr) {

Member

This doesn't appear to be used anywhere.

Contributor Author

I was previously using this to get the exception thrown in the Async open start callback, but not anymore. Removed this unused code.

@@ -1348,6 +1495,111 @@ TEST_CASE("Get Realm using Async Open", "[sync][pbs][async open]") {
}
}

#if REALM_ENABLE_AUTH_TESTS
#if REALM_APP_SERVICES

Member

REALM_ENABLE_AUTH_TESTS implies REALM_APP_SERVICES so they don't both need to be checked.
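
A small sketch of why one check is enough. It assumes the implication is enforced once in a shared header; whether realm-core does this with an `#error` (as here) or through its build configuration is not stated in this thread, and `sketch_auth_tests_enabled` is a hypothetical name.

```cpp
// Assumption: both macros are normally set by the build system; default them
// to 0 here so the sketch compiles on its own.
#ifndef REALM_APP_SERVICES
#define REALM_APP_SERVICES 0
#endif
#ifndef REALM_ENABLE_AUTH_TESTS
#define REALM_ENABLE_AUTH_TESTS 0
#endif

// Enforce the implication in one place...
#if REALM_ENABLE_AUTH_TESTS && !REALM_APP_SERVICES
#error "REALM_ENABLE_AUTH_TESTS requires REALM_APP_SERVICES"
#endif

// ...so test code only needs the single check.
#if REALM_ENABLE_AUTH_TESTS
constexpr bool sketch_auth_tests_enabled = true;
#else
constexpr bool sketch_auth_tests_enabled = false;
#endif
```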

@michael-wb michael-wb merged commit bddec6c into prepare-14.6.0 Apr 26, 2024
6 of 7 checks passed
@michael-wb michael-wb deleted the mwb/fix-cancel-waits-on-nonfatal-error branch April 26, 2024 16:19
@michael-wb michael-wb mentioned this pull request Apr 26, 2024
michael-wb pushed a commit that referenced this pull request Apr 26, 2024
* Prepare release

* [bindgen] expose both SyncUser and app::User (#7634)

* RCORE-2060 Enabling 'cancel_waits_on_nonfatal_error' does not cancel waits during location update while offline (#7528)

* added tests using cancel_waits_on_nonfatal_error and fix operation during location update
* Updated changelog and updated comments/debug statements
* fix swift build and test and tsan errors
* Added test to replicate swift autoopen feature
* Fixed swift build issue
* removed an unused function
* Updates from review

---------

Co-authored-by: Kenneth Geisshirt <[email protected]>
Co-authored-by: Michael Wilkerson-Barker <[email protected]>
michael-wb pushed a commit that referenced this pull request Apr 26, 2024
…waits during location update while offline (#7528)

* added tests using cancel_waits_on_nonfatal_error and fix operation during location update
* Updated changelog and updated comments/debug statements
* fix swift build and test and tsan errors
* Added test to replicate swift autoopen feature
* Fixed swift build issue
* removed an unused function
* Updates from review
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 26, 2024
Development

Successfully merging this pull request may close these issues.

Enabling 'cancel_waits_on_nonfatal_error' does not cancel waits during location update while offline
5 participants