
latest_at very slow (O(N)?) #1545

Closed
emilk opened this issue Mar 9, 2023 · 5 comments · Fixed by #1558

Labels
🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself

Comments

@emilk
Member

emilk commented Mar 9, 2023

latest_at gets very slow when there are a lot of data points. It looks like there is O(N) behavior.

Easiest repro is with:

just py-build
examples/python/clock/main.py --steps 50000 --save clock_50k.rrd
cargo r -p rerun -- clock_50k.rrd --profile

[Screenshot: profiler output]

We see that each latest_at call goes through 5050 buckets.

@emilk emilk added ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 9, 2023
@teh-cmc
Member

teh-cmc commented Mar 9, 2023

I believe this is a particularly exacerbated manifestation of #453.

Every frame, the viewer asks for the latest (i.e. time=+∞!) Transform component for each entity in the scene, but there aren't any to be found, since the clock example doesn't log any.
To answer these queries we look for the bucket that corresponds to time=+∞ in O(log(n)) (which of course always turns out to be the last one) and then start an O(n) backward walk from there, all the way back to the first bucket, only to realize there never was anything there.

It's very pronounced in this particular case because clock @ 50k creates thousands of buckets for every entity.
There's an early check when picking a wrong bucket, but it's behind a lock, which makes things even worse when running in debug.
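
To make the shape of that walk concrete, here is a minimal sketch (made-up types and names, not the actual re_arrow_store API):

use std::collections::BTreeMap;

// Hypothetical, heavily simplified sketch of the lookup described above;
// the names and types are illustrative, not the actual store internals.
type RowIndex = u64;

struct IndexBucket {
    /// Latest row per component within this bucket (grossly simplified).
    latest_per_component: BTreeMap<String, RowIndex>,
}

struct IndexTable {
    /// Buckets keyed by the minimum timestamp they cover.
    buckets: BTreeMap<i64, IndexBucket>,
}

impl IndexTable {
    fn latest_at(&self, query_time: i64, component: &str) -> Option<RowIndex> {
        // O(log n): jump to the bucket covering `query_time` (for time = +∞
        // that is always the last bucket)...
        // ...then O(n): walk backwards bucket by bucket until the component
        // shows up. If the entity never logged this component, every single
        // bucket gets visited before we give up empty-handed.
        self.buckets
            .range(..=query_time)
            .rev()
            .find_map(|(_, bucket)| bucket.latest_per_component.get(component).copied())
    }
}

With thousands of buckets per entity and a component that was never logged, that backward scan runs to completion on every single query, every frame.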

@emilk
Member Author

emilk commented Mar 10, 2023

So a quick-fix for this case would be a top-level early-out for "this entity doesn't even have this component".

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

@teh-cmc
Member

teh-cmc commented Mar 10, 2023

So a quick-fix for this case would be a top-level early-out for "this entity doesn't even have this component".

Yep, was about to open a PR for that; we even already have a benchmark for it.
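
For illustration, a minimal sketch of what such a top-level early-out could look like, using made-up types rather than the actual store code:

use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified types -- not the actual re_arrow_store internals.
struct EntityIndex {
    /// Every component that was ever logged for this entity.
    components: BTreeSet<String>,
    // ...the per-timeline index buckets would live here...
}

struct DataStore {
    entities: BTreeMap<String, EntityIndex>,
}

impl DataStore {
    fn latest_at(&self, entity_path: &str, component: &str) -> Option<u64> {
        let entity = self.entities.get(entity_path)?;

        // Top-level early-out: if this entity never logged the component at
        // all (e.g. Transform for the clock entities), bail out immediately
        // instead of walking back through every bucket.
        if !entity.components.contains(component) {
            return None;
        }

        // ...otherwise fall back to the usual bucket binary search +
        // backward walk described above.
        todo!()
    }
}

Keeping a per-entity set of ever-logged components is one way to make that check O(1) per query.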

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

Default config for indices is:

index_bucket_size_bytes: 32 * 1024, // 32kiB
index_bucket_nb_rows: 1024,

so this should actually create 1 bucket for every 1k entries... Not sure what's going on there yet, but hopefully I'm finally going to dig into the GC issues today, and it wouldn't surprise me if the two are related...

@emilk
Member Author

emilk commented Mar 10, 2023

A UI inspector for the data store would also be very useful in order to investigate issues like these.

@teh-cmc
Member

teh-cmc commented Mar 10, 2023

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

Investigating this further with the help of #1555, there doesn't actually seem to be a bug here, but rather a misconfiguration.

[Screenshot: datastore stats]

clock @ 50k logs data for 6 entities, on 2 timelines, 50k times: that's 2 * 6 * 50k = 600k index rows, which matches the stats from the store shown in the screenshot above.
That's approximately 30MiB worth of index data.

In LogDb, the DataStore is instantiated with:

data_store: re_arrow_store::DataStore::new(
    InstanceKey::name(),
    DataStoreConfig {
        component_bucket_size_bytes: 1024 * 1024, // 1 MiB
        index_bucket_size_bytes: 1024,            // 1 KiB
        ..Default::default()
    },
),

I.e. we split in half any bucket with more than 1024 index rows (default) or more than 1KiB worth of index data.
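
In other words (a simplified sketch of that splitting rule, not the actual implementation; the real config type has more knobs than shown here):

// Simplified sketch of the splitting rule described above.
struct IndexConfig {
    index_bucket_size_bytes: u64, // 1 KiB in LogDb, 32 KiB by default
    index_bucket_nb_rows: u64,    // 1024 by default
}

/// A bucket gets split in half as soon as it exceeds either limit.
fn should_split(config: &IndexConfig, bucket_rows: u64, bucket_size_bytes: u64) -> bool {
    bucket_rows > config.index_bucket_nb_rows
        || bucket_size_bytes > config.index_bucket_size_bytes
}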

So, accounting for the row limit, we have: 600k / 1024 * 2 ≈ 1.2k buckets (the extra * 2 is because we split in half and end up with 2 already half-full buckets), so clearly that's not the limit we're hitting in this case.

Accounting for the size limit OTOH: 30MiB / 1KiB * 2 = 60k buckets, which matches the current situation.
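
Spelled out as a quick back-of-the-envelope calculation (just the arithmetic above, not store code):

// Back-of-the-envelope check of the numbers above.
fn main() {
    let entities = 6_u64;
    let timelines = 2_u64;
    let steps = 50_000_u64;
    let rows = entities * timelines * steps; // 600_000 index rows

    // Row limit: split at 1024 rows, each split leaves 2 half-full buckets.
    let buckets_by_rows = rows / 1024 * 2; // ~1.2k buckets

    // Size limit: ~30 MiB of index data, split at 1 KiB, again half-full.
    let index_bytes = 30_u64 * 1024 * 1024;
    let buckets_by_size = index_bytes / 1024 * 2; // ~60k buckets -- what we observe

    println!("{rows} rows -> {buckets_by_rows} (by rows) vs {buckets_by_size} (by size)");
}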


Now, when it comes to index buckets, row limits are what actually matters as they put an upper bound on the cost of sorting the bucket.

Size limits don't matter at all OTOH, since we don't even GC index buckets anymore at the moment (because of the MsgId mismatch problem, which is the exact same issue described in #1535 and is generally the root of all our issues of that nature).

The fix here should be to remove the index data size limit.
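
One possible shape of that change, sketched against the LogDb snippet above (whether the knob ends up effectively unbounded as below, or is removed from DataStoreConfig entirely, is left to the actual fix in #1558):

data_store: re_arrow_store::DataStore::new(
    InstanceKey::name(),
    DataStoreConfig {
        component_bucket_size_bytes: 1024 * 1024, // 1 MiB
        // Effectively disable size-based splitting for index buckets, so
        // that only the row limit (1024 rows by default) applies.
        index_bucket_size_bytes: u64::MAX,
        ..Default::default()
    },
),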
