
latest_at very slow (O(N)?) #1545

Closed
emilk opened this issue Mar 9, 2023 · 5 comments · Fixed by #1558

Labels
🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself

Comments

@emilk
Member

emilk commented Mar 9, 2023

latest_at gets very slow when there are a lot of data points. It looks like there is O(N) behavior.

Easiest repro is with:

just py-build
examples/python/clock/main.py --steps 50000 --save clock_50k.rrd
cargo r -p rerun -- clock_50k.rrd --profile

[Screenshot: profiler output]

We see that each latest_at call goes through 5050 buckets.

@emilk emilk added ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 9, 2023
@teh-cmc
Member

teh-cmc commented Mar 9, 2023

I believe this is a particularly exacerbated manifestation of #453.

Every frame, the viewer asks for the latest (i.e. time=+∞!) Transform component for each entity in the scene, but there aren't any to be found, since the clock example doesn't log any.
To answer these queries we look for the bucket that corresponds to time=+∞ in O(log(n)) (which of course always turns out to be the last one) and then start an O(n) backward walk from there, all the way back to the first bucket, only to realize there never was anything there.

It's very pronounced in this particular case because clock @ 50k creates thousands of buckets for every entity.
There's an early check when picking a wrong bucket, but it's behind a lock, which makes things even worse when running in debug.
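
To make the shape of that walk concrete, here is a minimal sketch (made-up types and names, not the actual re_arrow_store API):

use std::collections::BTreeMap;

// Hypothetical, heavily simplified sketch of the lookup described above;
// the names and types are illustrative, not the actual store internals.
type RowIndex = u64;

struct IndexBucket {
    /// Latest row per component within this bucket (grossly simplified).
    latest_per_component: BTreeMap<String, RowIndex>,
}

struct IndexTable {
    /// Buckets keyed by the minimum timestamp they cover.
    buckets: BTreeMap<i64, IndexBucket>,
}

impl IndexTable {
    fn latest_at(&self, query_time: i64, component: &str) -> Option<RowIndex> {
        // O(log n): jump to the bucket covering `query_time` (for time = +∞
        // that is always the last bucket)...
        // ...then O(n): walk backwards bucket by bucket until the component
        // shows up. If the entity never logged this component, every single
        // bucket gets visited before we give up empty-handed.
        self.buckets
            .range(..=query_time)
            .rev()
            .find_map(|(_, bucket)| bucket.latest_per_component.get(component).copied())
    }
}

With thousands of buckets per entity and a component that was never logged, that backward scan runs to completion on every single query, every frame.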

@emilk
Member Author

emilk commented Mar 10, 2023

So a quick-fix for this case would be a top-level early-out for "this entity doesn't even have this component".

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

@teh-cmc
Member

teh-cmc commented Mar 10, 2023

So a quick-fix for this case would be a top-level early-out for "this entity doesn't even have this component".

Yep, was about to open a PR for that; we even already have a benchmark for it.
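
For illustration, a minimal sketch of what such a top-level early-out could look like, using made-up types rather than the actual store code:

use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified types -- not the actual re_arrow_store internals.
struct EntityIndex {
    /// Every component that was ever logged for this entity.
    components: BTreeSet<String>,
    // ...the per-timeline index buckets would live here...
}

struct DataStore {
    entities: BTreeMap<String, EntityIndex>,
}

impl DataStore {
    fn latest_at(&self, entity_path: &str, component: &str) -> Option<u64> {
        let entity = self.entities.get(entity_path)?;

        // Top-level early-out: if this entity never logged the component at
        // all (e.g. Transform for the clock entities), bail out immediately
        // instead of walking back through every bucket.
        if !entity.components.contains(component) {
            return None;
        }

        // ...otherwise fall back to the usual bucket binary search +
        // backward walk described above.
        todo!()
    }
}

Keeping a per-entity set of ever-logged components is one way to make that check O(1) per query.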

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

Default config for indices is:

index_bucket_size_bytes: 32 * 1024, // 32kiB
index_bucket_nb_rows: 1024,

so this should actually create 1 bucket for every 1k entries... Not sure what's going on there yet, but hopefully I'm finally going to dig into the GC issues today, and it wouldn't surprise me if the two are related...

@emilk
Member Author

emilk commented Mar 10, 2023

A UI inspector for the data store would also be very useful in order to investigate issues like these.

@teh-cmc
Member

teh-cmc commented Mar 10, 2023

Another question is why we get so many buckets. A new bucket gets created roughly every ten steps, which seems very wrong to me. Each step of the clock only logs three points and three arrows.

Investigating this further with the help of #1555, there doesn't actually seem to be a bug here, but rather a misconfiguration.

[Screenshot: datastore stats]

clock @ 50k logs data for 6 entities, on 2 timelines, 50k times: that's 2 * 6 * 50k = 600k index rows, which matches the stats from the store shown in the screenshot above.
That's approximately 30MiB worth of index data.

In LogDb, the DataStore is instantiated with:

data_store: re_arrow_store::DataStore::new(
    InstanceKey::name(),
    DataStoreConfig {
        component_bucket_size_bytes: 1024 * 1024, // 1 MiB
        index_bucket_size_bytes: 1024,            // 1 KiB
        ..Default::default()
    },
),

I.e. we split in half any bucket with more than 1024 index rows (default) or more than 1KiB worth of index data.
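
In other words (a simplified sketch of that splitting rule, not the actual implementation; the real config type has more knobs than shown here):

// Simplified sketch of the splitting rule described above.
struct IndexConfig {
    index_bucket_size_bytes: u64, // 1 KiB in LogDb, 32 KiB by default
    index_bucket_nb_rows: u64,    // 1024 by default
}

/// A bucket gets split in half as soon as it exceeds either limit.
fn should_split(config: &IndexConfig, bucket_rows: u64, bucket_size_bytes: u64) -> bool {
    bucket_rows > config.index_bucket_nb_rows
        || bucket_size_bytes > config.index_bucket_size_bytes
}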

So, accounting for the row limit, we have: 600k / 1024 * 2 ≈ 1.2k buckets (the extra * 2 is because we split in half and end up with 2 already half-full buckets), so clearly that's not the limit we're hitting in this case.

Accounting for the size limit OTOH: 30MiB / 1KiB * 2 = 60k buckets, which matches the current situation.
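
Spelled out as a quick back-of-the-envelope calculation (just the arithmetic above, not store code):

// Back-of-the-envelope check of the numbers above.
fn main() {
    let entities = 6_u64;
    let timelines = 2_u64;
    let steps = 50_000_u64;
    let rows = entities * timelines * steps; // 600_000 index rows

    // Row limit: split at 1024 rows, each split leaves 2 half-full buckets.
    let buckets_by_rows = rows / 1024 * 2; // ~1.2k buckets

    // Size limit: ~30 MiB of index data, split at 1 KiB, again half-full.
    let index_bytes = 30_u64 * 1024 * 1024;
    let buckets_by_size = index_bytes / 1024 * 2; // ~60k buckets -- what we observe

    println!("{rows} rows -> {buckets_by_rows} (by rows) vs {buckets_by_size} (by size)");
}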


Now, when it comes to index buckets, row limits are what actually matters as they put an upper bound on the cost of sorting the bucket.

Size limits don't matter at all OTOH, since we don't even GC index buckets anymore at the moment (because of the MsgId mismatch problem, which is the exact same issue described in #1535 and is generally the root of all our issues of that nature).

The fix here should be to remove the index data size limit.
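
One possible shape of that change, sketched against the LogDb snippet above (whether the knob ends up effectively unbounded as below, or is removed from DataStoreConfig entirely, is left to the actual fix in #1558):

data_store: re_arrow_store::DataStore::new(
    InstanceKey::name(),
    DataStoreConfig {
        component_bucket_size_bytes: 1024 * 1024, // 1 MiB
        // Effectively disable size-based splitting for index buckets, so
        // that only the row limit (1024 rows by default) applies.
        index_bucket_size_bytes: u64::MAX,
        ..Default::default()
    },
),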
