-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test FLX sync geospatial queries #7042
Conversation
Some tests fail because the server sends a different error message unrelated to geospatial. You can either fix them, or use an older baas version and I'll fix the tests. I can provide the baas version you can use. |
Can we update baas to an even newer version and bump go version to 1.21 (required by latest baas)? Actually, the upgrade happened on October 5th (65d3d091584f47fe8c84d184ae753f7015f59e95), so how is baas still working? |
Pull Request Test Coverage Report for Build james.stone_419
💛 - Coveralls |
@@ -655,7 +652,7 @@ LocalVersionIDs perform_client_reset_diff(DBRef db_local, DBRef db_remote, sync: | |||
// partially recovered, but interrupted may continue sync the next time it is | |||
// opened with only partially recovered state while having lost the history of any | |||
// offline modifications. | |||
history_local->set_client_file_ident_in_wt(wt_local->get_version(), client_file_ident); | |||
history_local->set_client_file_ident_in_wt(wt_local->get_version(), remote_ident); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the server complain that we BIND needing a file ident which the server provides, and then we IDENT with a different file ident? I was wondering if we need to restart the session 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One curiosity, why can't we set the file ident at line 632? (would've helped if the comment at line 631 explained why)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't set the ident on 632 because if the recovery is interrupted at some point before finishing, we still want the server to send a client reset due to having the old ident. In FLX recovery there is a series of commits and we only want to use a new ident once everything has been successfully committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server doesn't seem to complain about using the other IDENT. But that change isn't necessary here, and I think I'll restore the original just in case it does in fact matter to the server somehow.
35238cb
to
95f10f2
Compare
src/realm/sync/subscriptions.cpp
Outdated
@@ -352,6 +352,18 @@ void MutableSubscriptionSet::insert_sub(const Subscription& sub) | |||
m_subs.push_back(sub); | |||
} | |||
|
|||
void MutableSubscriptionSet::on_boostrap_cleared() | |||
{ | |||
update_state(State::Pending); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is setting the subscription state back to pending helping?
Fair point, setting back to the pending state isn't actually necessary. I can revert this. The main thing is that we want this subscription set to be picked up to send to the server again. This is done by get_next_pending_version()
which queries for "(pending OR boostrapping) and snapshot_version > client_upload_version" so the only relevant change is to update the snapshot version at this point so that we restart the bootstrap process of this subscription with the server.
@ironage My fixes are in, so you should be able to continue the work on this. |
download/upload/advance may have already happened by the time the test code gets to the wait_for_upload() line which results in the test hanging forever because there are no further changes. A fix is to do a timed wait for the expected state which will succeed immediately if the same race occurs.
95f10f2
to
3caa441
Compare
--configFile=etc/configs/test_config.json --configFile="${BASE_PATH}/config_overrides.json" > "${BAAS_SERVER_LOG}" 2>&1 & | ||
--configFile=etc/configs/test_config.json --configFile=etc/configs/test_rcore_config.json > "${BAAS_SERVER_LOG}" 2>&1 & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tyler requested that the BAAS team maintain the config overrides for our tests so that they can keep them up to date with server features. I've removed our local overrides file evergreen/config_overrides.json
and now this points to a config in their repo https://github.com/10gen/baas/blob/master/etc/configs/test_rcore_config.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I will eventually move our dependency definitions (e.g. STITCH_SUPPORT_LIB_URL
) to use the baas definitions as well, but they need to add the definitions for Ubuntu before this happens.
@@ -441,8 +441,12 @@ TEST_CASE("Test client migration and rollback with recovery", "[sync][flx][flx m | |||
// Migrate back to FLX - and keep the realm session open | |||
trigger_server_migration(session.app_session(), MigrateToFLX, logger_ptr); | |||
|
|||
REQUIRE(!wait_for_upload(*outer_realm)); | |||
REQUIRE(!wait_for_download(*outer_realm)); | |||
// wait for the subscription store to initialize after downloading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of polling, one other way to fix this (I think) is to commit a change and keep the completion callbacks.
@@ -2047,17 +2094,18 @@ TEST_CASE("flx: interrupted bootstrap restarts/recovers on reconnect", "[sync][f | |||
} | |||
|
|||
auto realm = Realm::get_shared_realm(interrupted_realm_config); | |||
auto table = realm->read_group().get_table("class_TopLevel"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work:
auto table = realm->read_group().get_table("class_TopLevel");
realm->get_latest_subscription_set().get_state_change_notification(sync::SubscriptionSet::State::Complete).get();
realm->refresh();
REQUIRE(table->size() == obj_ids_at_end.size());
for (auto& id : obj_ids_at_end) {
REQUIRE(table->find_primary_key(Mixed{id}));
}
Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess as to why I saw a failure here was that the server marks the subscription here as complete even with no objects, because the server hasn't processed the object insertions server-side so the translator doesn't know about them yet. Isn't that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm..I thought the issue is that the realm has the data but it's pinned to an older version. I think we wait until the server integrates the changes, but maybe they are not in mongo at the point bootstrap starts and a snapshot is needed (so we could poll on that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does sound possible, but it's something the test can check for (by querying the mongo collection and waiting for it to be populated). I think the current changes here are making the test incorrectly accept some invalid behaviors (e.g. it would pass if an interrupted bootstrap always resulted in the subscription set immediately being marked as complete). I think all of the introductions of timed_wait_for()
here are incorrect and are just making the tests pass even when something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair points. I will add some waits on the mongo server collection state and go back to relying on the subscription notifications and see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If polling on the mongo collection doesn't work, another idea would be to poll on a different Realm than the one being tested, so that you can make strict expectations about how the realm under test should behave given the observed state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good - are there going to be additional changes to the tests per the comments?
No, I've removed the test polling code but I think polling the server backend is unnecessary after Daniel's fixes which resolved a few test hangs. We will just have to wait and see if there are any further test hangs and address them as the come up. |
Fixes #7007
Updates the BAAS version we test with to today's version.