Specify the expected semantics of clear APIs in tests #3287

teh-cmc · 2023-09-12T09:16:21Z

It has been clear from earlier discussions that not everybody is aligned on what should be the semantics of our clear APIs.
We need to agree on those and materialize them as unit tests both as a mean of specification and to prevent regressions.

Related: the implementation of #3023 revealed some flakiness issues that should have been caught by these tests to-be.

Some examples

Assuming that we're querying for LatestAt("frame", 2): what are the expected semantics for this?

rr.set_time_sequence("frame", 1)
rr2.log("points", rr2.Points2D([[0, 0], [1, 1]]))
rr2.log("points", rr2.Clear(True))

Or this?

rr.set_time_sequence("frame", 1)
rr2.log("points", rr2.Clear(True))
rr2.log("points", rr2.Points2D([[0, 0], [1, 1]]))

What about this?

rr.set_time_sequence("frame", 1)
rr2.log("points", rr2.Points2D([[0, 0], [1, 1]]))
rr2.log("points", rr2.Clear(True))
rr2.log("points", rr2.Points2D([[0.5, 0.5], [1.5, 1.5]]))

I.e. do clear APIs follow the usual no-global-ordering/per-thread-row-id-ordering model, or do they break away from that in order to make distributed clears easier to work with?

Update: also, what about the very weird case where a row comes in that contains both ClearSettings component and a bunch of other components? For now, #3023 will make it so the ClearSettings component "wins" over everything else.

The text was updated successfully, but these errors were encountered:

teh-cmc · 2023-09-12T11:26:18Z

Quoting @jleibs:

The existence of pending clear is to handle cases where data is received after the clear event, but still has a time stamp <= the clear event.

So then the most important thing to specify and test is what happens to rects in e.g.

rr.set_time_sequence("frame", 1)
rr2.log("points", rr2.Points2D([[0, 0], [1, 1]]))
rr2.log("points", rr2.Clear(True))

rr.set_time_sequence("frame", 0) # !!
rr2.log("points", rr2.Rect2D([...]))

The clear APIs insert multiple rows worth of data (1 per component) that all re-use the same `RowId` (the `RowId` that was used to do the log_clear() call itself). This leads to a flaky ordering of these rows internally, as there is no way to tie-break between them. It is also less performant than it should be as all of this data could and should live on the same row. The problem is accentuated when doing recursive clears, since in that case the row ID is not only shared between multiple rows of the same entity, but also multiple rows across multiple entities: even more ordering flakiness. This flaky ordering then leaks through the public APIs (e.g. range queries) and, importantly, through roundtrip tests. This is how I first encoutered this issue: #3023 is blocked by this because the roundtrip tests for recursive clears are flaky. This PR fixes that. Unfortunately we don't have a test suite in place for clear APIs (AFAIK), and I'm not even sure what the expected behavior is in all cases, so I cannot write that test suite for now. I've opened an issue for this: - #3287 In the meantime, the roundtrip tests introduced in #3023 will act as a regression test at the very least. ### What ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3288) (if applicable) - [PR Build Summary](https://build.rerun.io/pr/3288) - [Docs preview](https://rerun.io/preview/7faa657db390c43112be6366b8c125f1a30a512c/docs)  - [Examples preview](https://rerun.io/preview/7faa657db390c43112be6366b8c125f1a30a512c/examples)  - [Recent benchmark results](https://ref.rerun.io/dev/bench/) - [Wasm size tracking](https://ref.rerun.io/dev/sizes/)

teh-cmc · 2023-11-28T13:56:57Z

The behavior of (pending) clears has been fully formalized in #4144.

The only thing that hasn't been formalized is the expected behavior of intra-timestamp tie-breaks for clears (mostly useful when logging to a single entity is distributed across multiple processes and clears are involved).

Today, this implicitly follow the same behavior as any other data: RowId order defines the "winner".
I'm going to make that behavior explicit in a test, and we can decide to change it in the future if needed.

teh-cmc · 2023-11-28T16:03:15Z

Today, this implicitly follow the same behavior as any other data: RowId order defines the "winner". I'm going to make that behavior explicit in a test, and we can decide to change it in the future if needed.

Except the RowId in question will be the RowId generated by the cascaded clear mechanism, which as far as the end user is concerned might as well be unspecified behavior.

So no, we have no way to formalize that at the moment (other than documenting that it is in fact undefined behavior).

teh-cmc added 🔨 testing testing and benchmarks ⛃ re_datastore affects the datastore itself labels Sep 12, 2023

teh-cmc mentioned this issue Sep 12, 2023

Fix row ordering flakiness when using clear APIs #3288

Merged

3 tasks

teh-cmc self-assigned this Nov 28, 2023

teh-cmc added the 🔩 data model label Nov 28, 2023

teh-cmc removed their assignment Nov 28, 2023

teh-cmc mentioned this issue Jun 18, 2024

Query-time clears (latest-at only) #6586

Merged

5 tasks

teh-cmc closed this as completed in #6586 Jul 4, 2024

teh-cmc closed this as completed in 176b937 Jul 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify the expected semantics of clear APIs in tests #3287

Specify the expected semantics of clear APIs in tests #3287

teh-cmc commented Sep 12, 2023 •

edited

Loading

teh-cmc commented Sep 12, 2023

teh-cmc commented Nov 28, 2023

teh-cmc commented Nov 28, 2023

Specify the expected semantics of clear APIs in tests #3287

Specify the expected semantics of clear APIs in tests #3287

Comments

teh-cmc commented Sep 12, 2023 • edited Loading

Some examples

teh-cmc commented Sep 12, 2023

teh-cmc commented Nov 28, 2023

teh-cmc commented Nov 28, 2023

teh-cmc commented Sep 12, 2023 •

edited

Loading