Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test FLX sync geospatial queries #7042

Merged
merged 10 commits into from
Nov 7, 2023
Merged

Test FLX sync geospatial queries #7042

merged 10 commits into from
Nov 7, 2023

Conversation

ironage
Copy link
Contributor

@ironage ironage commented Oct 6, 2023

Fixes #7007
Updates the BAAS version we test with to today's version.

@ironage ironage requested a review from michael-wb October 6, 2023 23:47
@ironage ironage self-assigned this Oct 6, 2023
@danieltabacaru
Copy link
Collaborator

Some tests fail because the server sends a different error message unrelated to geospatial. You can either fix them, or use an older baas version and I'll fix the tests. I can provide the baas version you can use.

@danieltabacaru
Copy link
Collaborator

danieltabacaru commented Oct 11, 2023

Can we update baas to an even newer version and bump go version to 1.21 (required by latest baas)? Actually, the upgrade happened on October 5th (65d3d091584f47fe8c84d184ae753f7015f59e95), so how is baas still working?

@coveralls-official
Copy link

coveralls-official bot commented Oct 12, 2023

Pull Request Test Coverage Report for Build james.stone_419

  • 129 of 129 (100.0%) changed or added relevant lines in 4 files are covered.
  • 72 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.004%) to 91.662%

Files with Coverage Reduction New Missed Lines %
src/realm/sync/network/http.cpp 2 63.29%
src/realm/table_view.cpp 2 94.18%
src/realm/uuid.cpp 2 97.06%
src/realm/sync/network/network.cpp 3 89.86%
src/realm/sync/noinst/server/server_history.cpp 3 67.85%
src/realm/table.cpp 3 90.63%
src/realm/util/serializer.cpp 3 90.03%
src/realm/sync/noinst/client_reset_operation.cpp 4 87.23%
src/realm/sync/noinst/protocol_codec.hpp 4 76.05%
src/realm/alloc_slab.cpp 5 92.89%
Totals Coverage Status
Change from base Build 1815: 0.004%
Covered Lines: 230803
Relevant Lines: 251799

💛 - Coveralls

@ironage ironage marked this pull request as draft October 16, 2023 17:46
@@ -655,7 +652,7 @@ LocalVersionIDs perform_client_reset_diff(DBRef db_local, DBRef db_remote, sync:
// partially recovered, but interrupted may continue sync the next time it is
// opened with only partially recovered state while having lost the history of any
// offline modifications.
history_local->set_client_file_ident_in_wt(wt_local->get_version(), client_file_ident);
history_local->set_client_file_ident_in_wt(wt_local->get_version(), remote_ident);
Copy link
Collaborator

@danieltabacaru danieltabacaru Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the server complain that we BIND needing a file ident which the server provides, and then we IDENT with a different file ident? I was wondering if we need to restart the session 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One curiosity, why can't we set the file ident at line 632? (would've helped if the comment at line 631 explained why)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't set the ident on 632 because if the recovery is interrupted at some point before finishing, we still want the server to send a client reset due to having the old ident. In FLX recovery there is a series of commits and we only want to use a new ident once everything has been successfully committed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server doesn't seem to complain about using the other IDENT. But that change isn't necessary here, and I think I'll restore the original just in case it does in fact matter to the server somehow.

@ironage ironage force-pushed the js/test-geo-flx branch 3 times, most recently from 35238cb to 95f10f2 Compare October 20, 2023 18:18
CHANGELOG.md Outdated Show resolved Hide resolved
@@ -352,6 +352,18 @@ void MutableSubscriptionSet::insert_sub(const Subscription& sub)
m_subs.push_back(sub);
}

void MutableSubscriptionSet::on_boostrap_cleared()
{
update_state(State::Pending);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is setting the subscription state back to pending helping?

Fair point, setting back to the pending state isn't actually necessary. I can revert this. The main thing is that we want this subscription set to be picked up to send to the server again. This is done by get_next_pending_version() which queries for "(pending OR boostrapping) and snapshot_version > client_upload_version" so the only relevant change is to update the snapshot version at this point so that we restart the bootstrap process of this subscription with the server.

CHANGELOG.md Outdated Show resolved Hide resolved
@danieltabacaru
Copy link
Collaborator

@ironage My fixes are in, so you should be able to continue the work on this.

download/upload/advance may have already happened
by the time the test code gets to the wait_for_upload() line
which results in the test hanging forever because there are no further
changes. A fix is to do a timed wait for the expected state which will
succeed immediately if the same race occurs.
Comment on lines -457 to +464
--configFile=etc/configs/test_config.json --configFile="${BASE_PATH}/config_overrides.json" > "${BAAS_SERVER_LOG}" 2>&1 &
--configFile=etc/configs/test_config.json --configFile=etc/configs/test_rcore_config.json > "${BAAS_SERVER_LOG}" 2>&1 &
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tyler requested that the BAAS team maintain the config overrides for our tests so that they can keep them up to date with server features. I've removed our local overrides file evergreen/config_overrides.json and now this points to a config in their repo https://github.com/10gen/baas/blob/master/etc/configs/test_rcore_config.json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I will eventually move our dependency definitions (e.g. STITCH_SUPPORT_LIB_URL) to use the baas definitions as well, but they need to add the definitions for Ubuntu before this happens.

@ironage ironage marked this pull request as ready for review November 1, 2023 20:28
@@ -441,8 +441,12 @@ TEST_CASE("Test client migration and rollback with recovery", "[sync][flx][flx m
// Migrate back to FLX - and keep the realm session open
trigger_server_migration(session.app_session(), MigrateToFLX, logger_ptr);

REQUIRE(!wait_for_upload(*outer_realm));
REQUIRE(!wait_for_download(*outer_realm));
// wait for the subscription store to initialize after downloading
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of polling, one other way to fix this (I think) is to commit a change and keep the completion callbacks.

@@ -2047,17 +2094,18 @@ TEST_CASE("flx: interrupted bootstrap restarts/recovers on reconnect", "[sync][f
}

auto realm = Realm::get_shared_realm(interrupted_realm_config);
auto table = realm->read_group().get_table("class_TopLevel");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work:

auto table = realm->read_group().get_table("class_TopLevel");
realm->get_latest_subscription_set().get_state_change_notification(sync::SubscriptionSet::State::Complete).get();
realm->refresh();
REQUIRE(table->size() == obj_ids_at_end.size());
for (auto& id : obj_ids_at_end) {
    REQUIRE(table->find_primary_key(Mixed{id}));
}

Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess as to why I saw a failure here was that the server marks the subscription here as complete even with no objects, because the server hasn't processed the object insertions server-side so the translator doesn't know about them yet. Isn't that possible?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm..I thought the issue is that the realm has the data but it's pinned to an older version. I think we wait until the server integrates the changes, but maybe they are not in mongo at the point bootstrap starts and a snapshot is needed (so we could poll on that)

Copy link
Member

@tgoyne tgoyne Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does sound possible, but it's something the test can check for (by querying the mongo collection and waiting for it to be populated). I think the current changes here are making the test incorrectly accept some invalid behaviors (e.g. it would pass if an interrupted bootstrap always resulted in the subscription set immediately being marked as complete). I think all of the introductions of timed_wait_for() here are incorrect and are just making the tests pass even when something went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair points. I will add some waits on the mongo server collection state and go back to relying on the subscription notifications and see what happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If polling on the mongo collection doesn't work, another idea would be to poll on a different Realm than the one being tested, so that you can make strict expectations about how the realm under test should behave given the observed state.

Copy link
Collaborator

@danieltabacaru danieltabacaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Copy link
Contributor

@michael-wb michael-wb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good - are there going to be additional changes to the tests per the comments?

@ironage
Copy link
Contributor Author

ironage commented Nov 2, 2023

are there going to be additional changes to the tests per the comments?

No, I've removed the test polling code but I think polling the server backend is unnecessary after Daniel's fixes which resolved a few test hangs. We will just have to wait and see if there are any further test hangs and address them as the come up.
Just waiting on final confirmation from the server team that 8db0179 is not going to cause unwanted side effects.

@danieltabacaru danieltabacaru merged commit 7556b53 into master Nov 7, 2023
2 checks passed
@danieltabacaru danieltabacaru deleted the js/test-geo-flx branch November 7, 2023 10:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sync tests for geo queries
4 participants