`latest_at` very slow (O(N)?) #1545
Comments
I believe this is a particularly exacerbated manifestation of #453. Every frame, the viewer asks for the latest (i.e. …). It's very pronounced in this particular case because …
So a quick-fix for this case would be a top-level early-out for "this entity doesn't even have this component". Another question is why we get so many buckets: a new bucket gets created every ~ten steps, which seems very wrong to me. Each step of the clock is only logging three points and three arrows.
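A minimal sketch of what such a top-level early-out could look like. None of the types or methods below are the real `re_arrow_store` API; they only illustrate skipping the bucket walk entirely when the entity has never logged the requested component:

```rust
use std::collections::HashSet;

/// Placeholder row type, for illustration only.
struct Row;

/// Hypothetical store; not the actual re_arrow_store data structures.
struct DataStore {
    /// Set of (entity path, component name) pairs that have ever been written.
    known_components: HashSet<(String, String)>,
}

impl DataStore {
    /// Sketch of a `latest_at`-style query with a top-level early-out.
    fn latest_at(&self, entity: &str, component: &str) -> Option<Row> {
        // Early-out: skip the (expensive) bucket walk entirely if this entity
        // has never logged this component.
        if !self
            .known_components
            .contains(&(entity.to_owned(), component.to_owned()))
        {
            return None;
        }
        self.walk_buckets_for_latest(entity, component)
    }

    fn walk_buckets_for_latest(&self, _entity: &str, _component: &str) -> Option<Row> {
        // ...the existing per-bucket search would go here.
        None
    }
}

fn main() {
    let store = DataStore { known_components: HashSet::new() };
    // Nothing logged for this pair yet, so the early-out returns immediately.
    assert!(store.latest_at("my_entity", "my_component").is_none());
}
```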
Yep, was about to open a PR for that; we even already have a benchmark for it.
Default config for indices is:

```rust
index_bucket_size_bytes: 32 * 1024, // 32kiB
index_bucket_nb_rows: 1024,
```

so this should actually create 1 bucket for every 1k entries... Not sure what's going on there yet, but hopefully I'm finally going to dig into the GC issues today, and it wouldn't surprise me if the two are related...
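Just to sanity-check what those defaults imply, a back-of-the-envelope sketch; the step count and rows-per-step below are made-up numbers for illustration, not measurements from the actual repro:

```rust
// Expected bucket count under the default index config. Only the 1024-row
// limit comes from the comment above; everything else is assumed.
fn main() {
    let index_bucket_nb_rows: u64 = 1024; // default row limit per bucket
    let rows_per_clock_step: u64 = 6; // assumed: three points + three arrows
    let steps: u64 = 10_000; // assumed length of the repro run

    let total_rows = steps * rows_per_clock_step;
    // With only the row limit in play, we'd expect roughly:
    let expected_buckets = (total_rows + index_bucket_nb_rows - 1) / index_bucket_nb_rows;
    println!("~{expected_buckets} buckets for {total_rows} rows"); // ~59 buckets
    // ...far fewer than "a new bucket every ~ten steps" (which would be ~1000
    // buckets over the same run).
}
```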
A UI inspector for the data store would also be very useful in order to investigate issues like these.
Investigating this further with the help of #1555, there doesn't actually seem to be a bug here, but rather a misconfiguration.
In `data_store`:

```rust
re_arrow_store::DataStore::new(
    InstanceKey::name(),
    DataStoreConfig {
        component_bucket_size_bytes: 1024 * 1024, // 1 MiB
        index_bucket_size_bytes: 1024, // 1 KiB
        ..Default::default()
    },
),
```

I.e. we split in half any bucket with more than … So, accounting for row limits, we have: …

Accounting for the size limit OTOH: …

Now, when it comes to index buckets, row limits are what actually matter, as they put an upper bound on the cost of sorting the bucket. Size limits don't matter at all OTOH, since we don't even GC index buckets anymore at the moment (because of the …).

The fix here should be to remove the index data size limit.
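To make the effect of that 1 KiB override concrete, a rough sketch; the split rule and the per-row size are assumptions for illustration, not the actual store logic:

```rust
// Why the viewer's 1 KiB index_bucket_size_bytes dominates: with any
// plausible row size, the size limit triggers long before the 1024-row limit.
fn main() {
    let index_bucket_size_bytes: u64 = 1024; // the viewer's override (1 KiB)
    let index_bucket_nb_rows: u64 = 1024; // the default row limit
    let assumed_row_size_bytes: u64 = 16; // made-up average index row size

    // Rows a bucket can hold before the *size* limit forces a split:
    let rows_before_size_split = index_bucket_size_bytes / assumed_row_size_bytes;
    println!("size limit splits after ~{rows_before_size_split} rows"); // 64
    println!("row limit would only split after {index_bucket_nb_rows} rows");

    // Buckets split after a few dozen rows and never get close to the 1k-row
    // limit -- consistent with "a new bucket every ~ten steps" at roughly six
    // rows per step. Removing the size limit lets the row limit govern instead.
    assert!(rows_before_size_split < index_bucket_nb_rows);
}
```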
`latest_at` gets very slow when there are a lot of data points. It looks like there is O(N) behavior.

Easiest repro is with: …

We see that each `latest_at` call goes through 5050 buckets.
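For intuition about why this is O(N): if each query walks every index bucket to find the most recent row at or before the query time, the cost grows linearly with the number of buckets. A minimal sketch under that assumption (this is not the real `re_arrow_store` implementation):

```rust
// Illustrative only: a latest_at that scans all buckets linearly is O(N) in
// the number of buckets, so 5050 buckets means 5050 bucket visits per query.
struct IndexBucket {
    /// (time, row id) pairs, kept sorted by time.
    rows: Vec<(i64, u64)>,
}

fn latest_at_linear(buckets: &[IndexBucket], query_time: i64) -> Option<u64> {
    let mut best: Option<(i64, u64)> = None;
    for bucket in buckets {
        // Every bucket is visited, even ones that cannot possibly contain the answer.
        for &(t, row) in &bucket.rows {
            if t <= query_time && best.map_or(true, |(bt, _)| t >= bt) {
                best = Some((t, row));
            }
        }
    }
    best.map(|(_, row)| row)
}

fn main() {
    let buckets: Vec<IndexBucket> = (0..5050i64)
        .map(|i| IndexBucket { rows: vec![(i, i as u64)] })
        .collect();
    assert_eq!(latest_at_linear(&buckets, 2_500), Some(2_500));
}
```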