Parallelize data ingestion #4298
We really need to move ingestion off of the UI thread -- this makes using Rerun hard on some OSes / window managers (including mine). Reminder: somehow all of this needs to work on the web too.
An alternative (at least in the short term) is to ensure data ingestion still happens when Rerun is hidden: this may be an easier fix, though I agree proper parallel data ingestion is more desirable.
Not blocked anymore: we have the chunks, and we have the storage handles. That doesn't mean it'll be easy though, far from it.
(Random thoughts as I'm struggling with slow ingestion times)

Speaking of multi-threading: further down the line, the bottleneck for real-time use cases is probably gonna be the speed of ingestion when using `connect()`, which is currently bounded by the time it takes to render the UI, since they both run serially on the same thread (right?). There are two main fronts that I can think of to improve things, though they are both far more challenging than parallelizing space-views:
Run insertion and UI rendering in parallel.
This requires a consistent view of the datastore for the entire duration of a frame, so results are consistent across views. The standard approach would be to implement some kind of snapshotting/MVCC for the store and its views (which would actually be possible thanks to our `RowId`s), but this is thankfully unnecessary in our case if we follow our plan of handling all query work in one centralized place (just lock everything, get the data out, unlock, then render the UI as usual). This asynchronicity between the store and the UI rendering is something we need to move towards in any case in order to ultimately get the database out into its own process (the "hub").
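The "lock everything, get the data out, unlock, then render" idea can be sketched with plain standard-library primitives. This is only an illustration of the scheme, not Rerun's actual datastore: the `Store` type, row shape, and `run` function here are hypothetical placeholders.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical minimal "store": just a list of (entity path, value) rows.
type Store = Vec<(String, f64)>;

/// Ingest `n` rows on a background thread while the "UI" loop repeatedly
/// snapshots the store under a short-lived read lock; returns the final
/// row count.
fn run(n: usize) -> usize {
    let store = Arc::new(RwLock::new(Store::new()));

    // Ingestion thread: the write lock is held only for one insert at a time.
    let ingest_store = Arc::clone(&store);
    let ingester = thread::spawn(move || {
        for i in 0..n {
            ingest_store
                .write()
                .unwrap()
                .push((format!("entity/{}", i), i as f64));
        }
    });

    // "UI" loop: lock everything, copy the data out, unlock...
    for _frame in 0..10 {
        let snapshot: Store = store.read().unwrap().clone();
        // ...then render from the consistent snapshot while ingestion
        // keeps running concurrently.
        let _rows_this_frame = snapshot.len();
    }

    ingester.join().unwrap();
    let n_rows = store.read().unwrap().len();
    n_rows
}

fn main() {
    assert_eq!(run(100), 100);
}
```

Every frame sees one consistent snapshot, and the lock is only held for the copy, so ingestion is no longer bounded by how long the UI takes to render.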
Parallelize insertions into the store.
This should not be too hard as long as we partition on `(EntityPath, Timeline)` so that it matches the natural indexing of the store (though I'm sure we're going to discover some nasty race conditions on the way). Similarly, most of our built-in store subscribers should be capable of updating in parallel when following this partition scheme.
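A minimal sketch of that partitioning idea, assuming a toy store keyed by a `(EntityPath, Timeline)` pair (modeled here as two plain `String`s rather than Rerun's real types): rows are grouped by key first, then each partition is inserted on its own thread, so threads never contend on the same index.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical partition key mirroring the store's natural indexing.
type PartitionKey = (String /* EntityPath */, String /* Timeline */);

/// Group incoming rows by (entity path, timeline), then "insert" each
/// partition on its own thread; partitions never touch each other's data.
fn parallel_insert(rows: Vec<(PartitionKey, i64)>) -> HashMap<PartitionKey, Vec<i64>> {
    // 1. Partition the incoming batch by key.
    let mut partitions: HashMap<PartitionKey, Vec<i64>> = HashMap::new();
    for (key, value) in rows {
        partitions.entry(key).or_default().push(value);
    }

    // 2. One shared output map; each thread only writes its own partition.
    let store = Arc::new(Mutex::new(HashMap::new()));
    let mut handles = Vec::new();
    for (key, values) in partitions {
        let store = Arc::clone(&store);
        handles.push(thread::spawn(move || {
            // The per-partition work (here: sorting by time) runs fully in
            // parallel; the lock is only taken to publish the result.
            let mut sorted = values;
            sorted.sort();
            store.lock().unwrap().insert(key, sorted);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    Arc::try_unwrap(store).unwrap().into_inner().unwrap()
}

fn main() {
    let rows: Vec<(PartitionKey, i64)> = vec![
        (("cam/left".to_string(), "frame".to_string()), 2),
        (("cam/left".to_string(), "frame".to_string()), 1),
        (("cam/right".to_string(), "frame".to_string()), 3),
    ];
    let store = parallel_insert(rows);
    let key: PartitionKey = ("cam/left".to_string(), "frame".to_string());
    assert_eq!(store[&key], vec![1, 2]);
    assert_eq!(store.len(), 2);
}
```

The same grouping is what would let store subscribers update in parallel too: each subscriber only ever sees updates for the partition it is following.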
On the bright side, this also means that the upcoming work to multi-thread the UI rendering should actually improve ingestion speeds too as a side effect.