Skip to content

RCORE-2192 RCORE-2193 Fix FLX download progress reporting #7870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 15, 2024

Conversation

tgoyne
Copy link
Member

@tgoyne tgoyne commented Jul 9, 2024

We need to store the download progress for each batch of a bootstrap and not just at the end for it to be useful in any way. Whoops.

The server will sometimes send us DOWNLOAD messages with a non-one estimate followed by a one estimate where the byte-level information is the same (as the final message is empty). When this happens we need to report the download completion to the user, so add the estimate to the fields checked for changes.

A subscription change which doesn't actually change what set of objects is in view can result in an empty DOWNLOAD message with no changes other than the query version, and we should report that too.

Fixes #7869. Fixes #7867.

@tgoyne tgoyne self-assigned this Jul 9, 2024
@cla-bot cla-bot bot added the cla: yes label Jul 9, 2024
@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from 39d7751 to e387595 Compare July 10, 2024 16:15
@tgoyne tgoyne marked this pull request as ready for review July 10, 2024 16:35
@tgoyne tgoyne requested review from jbreams and kiburtse July 10, 2024 16:36
Copy link

coveralls-official bot commented Jul 10, 2024

Pull Request Test Coverage Report for Build thomas.goyne_459

Details

  • 247 of 257 (96.11%) changed or added relevant lines in 10 files are covered.
  • 76 unchanged lines in 16 files lost coverage.
  • Overall coverage increased (+0.03%) to 91.005%

Changes Missing Coverage Covered Lines Changed/Added Lines %
test/object-store/sync/session/progress_notifications.cpp 181 191 94.76%
Files with Coverage Reduction New Missed Lines %
src/realm/list.cpp 1 87.52%
src/realm/mixed.cpp 1 86.46%
src/realm/query_expression.hpp 1 93.88%
src/realm/sort_descriptor.cpp 1 94.06%
test/test_table.cpp 1 99.51%
src/realm/array_blobs_big.cpp 2 98.58%
src/realm/sync/client.cpp 2 91.09%
test/object-store/sync/client_reset.cpp 2 99.1%
test/test_thread.cpp 3 66.14%
src/realm/table.cpp 4 90.55%
Totals Coverage Status
Change from base Build 2487: 0.03%
Covered Lines: 215445
Relevant Lines: 236741

💛 - Coveralls

@@ -65,6 +65,25 @@ void AsyncOpenTask::start(AsyncOpenCallback callback)
session->revive_if_needed();
}

util::Future<ThreadSafeReference> AsyncOpenTask::start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -801,7 +804,8 @@ void SessionImpl::update_subscription_version_info()
}

bool SessionImpl::process_flx_bootstrap_message(const SyncProgress& progress, DownloadBatchState batch_state,
int64_t query_version, const ReceivedChangesets& received_changesets)
int64_t query_version, DownloadableProgress download_progress,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: at this point should we just pass a ref to the DownloadMessage rather than picking out its members as separate arguments here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that's a bit better. Should probably propagate it further, really.

});
task->start().get();

// Since we set the soft byte limit to one byte, we should have received
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the soft byte limit getting set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment from when I was trying to use qbs_download_changeset_soft_max_byte_size rather than making the objects big.

@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from e387595 to 0834fda Compare July 11, 2024 17:55
Copy link
Contributor

@jbreams jbreams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm, but i'd like @kiburtse's review as well.

@@ -916,8 +922,8 @@ void SessionImpl::process_pending_flx_bootstrap()
batch_state, pending_batch.changesets.size());

history.integrate_server_changesets(
*pending_batch.progress, downloadable_bytes, pending_batch.changesets, new_version, batch_state, logger,
transact, [&](const TransactionRef& tr, util::Span<Changeset> changesets_applied) {
*pending_batch.progress, 1.0, pending_batch.changesets, new_version, batch_state, logger, transact,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this hardcoded to 1.0 so the progress doesn't change (e.g., go back to 0) while the bootstrap messages are being integrated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it wouldn't change regardless. This was always a hardcoded value, but it used to be a hardcoded value declared sufficiently far from the point of use that it wasn't obviously such.

Copy link
Contributor

@kiburtse kiburtse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is not fully resolved the issue - general case for reconnect and catch up still reports only 1.0 for progress

Since we got this wrong so many times, i think, we have to check in tests actual values for the progress. Precise value of the estimate is not under our control here, but may be we can at least pass raw value from the download message through debug hook and verify that this value propagates through the implementation and is seen in objstore callback in right order?

@@ -1701,28 +1708,28 @@ void SessionWrapper::report_progress(ReportedProgress& p, DownloadableProgress d
upload_estimate = calculate_progress(p.uploaded, p.uploadable, m_final_uploaded);

bool download_completed = p.downloaded == 0;
double download_estimate = 1.00;
p.download_estimate = 1.00;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that check below should be also fixed to take into account initial history scan case where changes are applied directly without hitting bootstrap store first

i think it should just check that this is the flx connection and set estimate unconditionally since it's always provided

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I wrote up a quick test case that demonstrates the issue:

    SECTION("history scan") {
        config.sync_config->subscription_initializer = [](const std::shared_ptr<Realm>& realm) {
            subscribe_to_all(*realm);
        };

        // Open a realm and bootstrap it so that we are getting progress notifications for history
        // scans rather than bootstraps.
        {
            successfully_async_open_realm(config);
        }

        // In another realm, add some more objects to the server that will need to be downloaded
        // in a bunch of changesets.
        harness->do_with_new_realm([](const std::shared_ptr<Realm>& realm) {
            const size_t padding_size = 1024 * 1024;
            auto buffer = std::make_unique<char[]>(padding_size);
            auto table = realm->read_group().get_table("class_object");
            subscribe_to_all(*realm);
            for (int i = 5; i < 10; ++i) {
                realm->begin_transaction();
                auto obj = table->create_object_with_primary_key(ObjectId::gen());
                obj.set("int", i);
                // ensure that each object is large enough that it'll be sent in
                // a separate DOWNLOAD message
                obj.set("padding", BinaryData(buffer.get(), padding_size));
                realm->commit_transaction();
            }
            wait_for_upload(*realm);
        });

        // Re-open the original realm and wait for it to download everything. We should see
        // a new batch of progress esimates that are also monotonically increasing.
        auto task = Realm::get_synchronized_realm(config);
        std::vector<double> estimates;
        task->register_download_progress_notifier([&](uint64_t, uint64_t, double estimate) {
            // Note that no locking is needed here despite this being called on
            // a background thread as the test provides the required synchronization.
            // If tsan complains about this, it indicates that the notifier is
            // being called at a time that it shouldn't be and there's a bug.
            estimates.push_back(estimate);
        });
        task->start().get();
        REQUIRE(estimates.size() >= 5);
        REQUIRE_THAT(estimates, EstimatesAreValid());
    }

if (m_flx_pending_bootstrap_store) {
if (m_flx_pending_bootstrap_store->has_pending()) {
download_estimate = downloadable.as_estimate();
p.download_estimate = downloadable.as_estimate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you moved this down to line 1716, right before download_completed gets set, that would solve the issue @kiburtse is calling out above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, obviously we can get download progress estimates while in the steady state and not just when bootstrapping.

@@ -1701,28 +1708,28 @@ void SessionWrapper::report_progress(ReportedProgress& p, DownloadableProgress d
upload_estimate = calculate_progress(p.uploaded, p.uploadable, m_final_uploaded);

bool download_completed = p.downloaded == 0;
double download_estimate = 1.00;
p.download_estimate = 1.00;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I wrote up a quick test case that demonstrates the issue:

    SECTION("history scan") {
        config.sync_config->subscription_initializer = [](const std::shared_ptr<Realm>& realm) {
            subscribe_to_all(*realm);
        };

        // Open a realm and bootstrap it so that we are getting progress notifications for history
        // scans rather than bootstraps.
        {
            successfully_async_open_realm(config);
        }

        // In another realm, add some more objects to the server that will need to be downloaded
        // in a bunch of changesets.
        harness->do_with_new_realm([](const std::shared_ptr<Realm>& realm) {
            const size_t padding_size = 1024 * 1024;
            auto buffer = std::make_unique<char[]>(padding_size);
            auto table = realm->read_group().get_table("class_object");
            subscribe_to_all(*realm);
            for (int i = 5; i < 10; ++i) {
                realm->begin_transaction();
                auto obj = table->create_object_with_primary_key(ObjectId::gen());
                obj.set("int", i);
                // ensure that each object is large enough that it'll be sent in
                // a separate DOWNLOAD message
                obj.set("padding", BinaryData(buffer.get(), padding_size));
                realm->commit_transaction();
            }
            wait_for_upload(*realm);
        });

        // Re-open the original realm and wait for it to download everything. We should see
        // a new batch of progress esimates that are also monotonically increasing.
        auto task = Realm::get_synchronized_realm(config);
        std::vector<double> estimates;
        task->register_download_progress_notifier([&](uint64_t, uint64_t, double estimate) {
            // Note that no locking is needed here despite this being called on
            // a background thread as the test provides the required synchronization.
            // If tsan complains about this, it indicates that the notifier is
            // being called at a time that it shouldn't be and there's a bug.
            estimates.push_back(estimate);
        });
        task->start().get();
        REQUIRE(estimates.size() >= 5);
        REQUIRE_THAT(estimates, EstimatesAreValid());
    }

tgoyne added 3 commits July 12, 2024 11:16
We need to store the download progress for each batch of a bootstrap and not
just at the end for it to be useful in any way.

The server will sometimes send us DOWNLOAD messages with a non-one estimate
followed by a one estimate where the byte-level information is the same (as the
final message is empty). When this happens we need to report the download
completion to the user, so add the estimate to the fields checked for changes.

A subscription change which doesn't actually change what set of objects is in
view can result in an empty DOWNLOAD message with no changes other than the
query version, and we should report that too.
@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from 0834fda to 3b3daac Compare July 12, 2024 18:17
@tgoyne
Copy link
Member Author

tgoyne commented Jul 12, 2024

I've added a test for steady-state progress reporting and made it actually work.

I don't like the idea of using a debug hook to get the raw value which was reported by the server because the existing debug hooks have already been making it much more difficult to modify the code and they really need to be a last resort for when there's no other way to test something. I did make the consistency checks for the estimates quite a bit more precise, though. It now verifies that the progress goes up by about the same amount each time, which should always be the case since the objects being downloaded are all the same size.

@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from 3b3daac to 2f37381 Compare July 12, 2024 20:03
@kiburtse
Copy link
Contributor

I've added a test for steady-state progress reporting and made it actually work.

I don't like the idea of using a debug hook to get the raw value which was reported by the server because the existing debug hooks have already been making it much more difficult to modify the code and they really need to be a last resort for when there's no other way to test something. I did make the consistency checks for the estimates quite a bit more precise, though. It now verifies that the progress goes up by about the same amount each time, which should always be the case since the objects being downloaded are all the same size.

@@ -1710,7 +1710,7 @@ void SessionWrapper::report_progress(ReportedProgress& p, DownloadableProgress d
     bool download_completed = p.downloaded == 0;
     p.download_estimate = 1.00;
     if (m_flx_pending_bootstrap_store) {
-        p.download_estimate = downloadable.as_estimate();
+        // p.download_estimate = downloadable.as_estimate();
         if (m_flx_pending_bootstrap_store->has_pending()) {
             p.downloaded += m_flx_pending_bootstrap_store->pending_stats().pending_changeset_bytes;
         }
$ >> ./test/object-store/realm-object-store-tests.app/Contents/MacOS/realm-object-store-tests -# [#progress_notifications]
Filters: [#progress_notifications]
Randomness seeded to: 4134489157
===============================================================================
All tests passed (273 assertions in 6 test cases)

that was essentially the problem, we really need better checks, no?

@tgoyne
Copy link
Member Author

tgoyne commented Jul 12, 2024

Oh, I removed the monotonic check while cleaning up the test and that wasn't actually redundant. We also can put an upper bound on what progress the first estimate can have.

@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from 2f37381 to 180752e Compare July 12, 2024 21:52
@tgoyne tgoyne force-pushed the tg/flx-download-progress branch from 180752e to 4a90688 Compare July 12, 2024 21:52
@tgoyne
Copy link
Member Author

tgoyne commented Jul 12, 2024

I have updated the test to actually work:

realm-object-store-tests(8668,0x1ff690c00) malloc: nano zone abandoned due to inability to reserve vm space.
Filters: "sync progress: flx download progress"
Randomness seeded to: 1511695017

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
realm-object-store-tests is a Catch2 v3.5.3 host application.
Run with -? for options

-------------------------------------------------------------------------------
sync progress: flx download progress
  async open with initial subscriptions
-------------------------------------------------------------------------------
/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1411
...............................................................................

/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1428: FAILED:
  REQUIRE_THAT( estimates, EstimatesAreValid() )
with expansion:
  { 1.0, 1.0, 1.0, 1.0, 1.0 } estimated progress must progress from non-1 to 1
  in fixed-size non-zero steps

-------------------------------------------------------------------------------
sync progress: flx download progress
  multiple subscription updates which each trigger some downloads
-------------------------------------------------------------------------------
/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1431
...............................................................................

/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1451: FAILED:
  REQUIRE_THAT( estimates, EstimatesAreValid() )
with expansion:
  { 1.0, 1.0 } estimated progress must progress from non-1 to 1 in fixed-size
  non-zero steps

-------------------------------------------------------------------------------
sync progress: flx download progress
  add new objects while in the steady state
-------------------------------------------------------------------------------
/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1491
...............................................................................

/opt/s/core-3/test/object-store/sync/session/progress_notifications.cpp:1550: FAILED:
  REQUIRE_THAT( suspended_estimates, EstimatesAreValid(5) )
with expansion:
  { 1.0, 1.0, 1.0, 1.0, 1.0 } estimated progress must progress from non-1 to 1
  in fixed-size non-zero steps

===============================================================================
test cases:  1 |  0 passed | 1 failed
assertions: 32 | 29 passed | 3 failed

@tgoyne tgoyne merged commit 4f83c59 into master Jul 15, 2024
41 checks passed
@tgoyne tgoyne deleted the tg/flx-download-progress branch July 15, 2024 18:38
@github-actions github-actions bot mentioned this pull request Jul 19, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

download progress estimate is always 1.0 No-op subscription changes do not report download progress
4 participants