Parallelize data ingestion #4298
We really need to move ingestion off of the UI thread -- this makes using Rerun hard on some OSes / window managers (including mine). Reminder: somehow all of this needs to work on the web too.
An alternative (at least in the short term) is to ensure data ingestion still happens when Rerun is hidden: this may be an easier fix, though I agree proper parallel data ingestion is more desirable.
Not blocked anymore: we have the chunks, and we have the storage handles. That doesn't mean it'll be easy though, far from it.
(Random thoughts as I'm struggling with slow ingestion times)

Speaking of multi-threading: further down the line, the bottleneck for real-time use cases is probably gonna be the speed of ingestion when using `connect()`, which is currently bounded by the time it takes to render the UI, since they both run serially on the same thread (right?). There are two main fronts that I can think of to improve things, though they are both far more challenging than parallelizing space-views:
Run insertion and UI rendering in parallel.
This requires a consistent view of the datastore for the entire duration of a frame, so results are consistent across views. The standard approach would be to implement some kind of snapshotting/MVCC for the store and its views (which would actually be possible thanks to our `RowId`s), but this is thankfully unnecessary in our case if we follow our plan of handling all query work in one centralized place (just lock everything, get the data out, unlock, then render the UI as usual). This asynchronicity between the store and the UI rendering is something we need to move towards in any case in order to ultimately get the database out into its own process (the "hub").
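The "lock everything, get the data out, unlock, then render" idea can be sketched with plain standard-library primitives. This is only an illustration of the scheme, not Rerun's actual datastore: the `Store` type, row shape, and `run` function here are hypothetical placeholders.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical minimal "store": just a list of (entity path, value) rows.
type Store = Vec<(String, f64)>;

/// Ingest `n` rows on a background thread while the "UI" loop repeatedly
/// snapshots the store under a short-lived read lock; returns the final
/// row count.
fn run(n: usize) -> usize {
    let store = Arc::new(RwLock::new(Store::new()));

    // Ingestion thread: the write lock is held only for one insert at a time.
    let ingest_store = Arc::clone(&store);
    let ingester = thread::spawn(move || {
        for i in 0..n {
            ingest_store
                .write()
                .unwrap()
                .push((format!("entity/{}", i), i as f64));
        }
    });

    // "UI" loop: lock everything, copy the data out, unlock...
    for _frame in 0..10 {
        let snapshot: Store = store.read().unwrap().clone();
        // ...then render from the consistent snapshot while ingestion
        // keeps running concurrently.
        let _rows_this_frame = snapshot.len();
    }

    ingester.join().unwrap();
    let n_rows = store.read().unwrap().len();
    n_rows
}

fn main() {
    assert_eq!(run(100), 100);
}
```

Every frame sees one consistent snapshot, and the lock is only held for the copy, so ingestion is no longer bounded by how long the UI takes to render.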
Parallelize insertions into the store.
This should not be too hard as long as we partition on `(EntityPath, Timeline)` so that it matches the natural indexing of the store (though I'm sure we're going to discover some nasty race conditions on the way). Similarly, most of our built-in store subscribers should be capable of updating in parallel when following this partition scheme.
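A minimal sketch of that partitioning idea, assuming a toy store keyed by a `(EntityPath, Timeline)` pair (modeled here as two plain `String`s rather than Rerun's real types): rows are grouped by key first, then each partition is inserted on its own thread, so threads never contend on the same index.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical partition key mirroring the store's natural indexing.
type PartitionKey = (String /* EntityPath */, String /* Timeline */);

/// Group incoming rows by (entity path, timeline), then "insert" each
/// partition on its own thread; partitions never touch each other's data.
fn parallel_insert(rows: Vec<(PartitionKey, i64)>) -> HashMap<PartitionKey, Vec<i64>> {
    // 1. Partition the incoming batch by key.
    let mut partitions: HashMap<PartitionKey, Vec<i64>> = HashMap::new();
    for (key, value) in rows {
        partitions.entry(key).or_default().push(value);
    }

    // 2. One shared output map; each thread only writes its own partition.
    let store = Arc::new(Mutex::new(HashMap::new()));
    let mut handles = Vec::new();
    for (key, values) in partitions {
        let store = Arc::clone(&store);
        handles.push(thread::spawn(move || {
            // The per-partition work (here: sorting by time) runs fully in
            // parallel; the lock is only taken to publish the result.
            let mut sorted = values;
            sorted.sort();
            store.lock().unwrap().insert(key, sorted);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    Arc::try_unwrap(store).unwrap().into_inner().unwrap()
}

fn main() {
    let rows: Vec<(PartitionKey, i64)> = vec![
        (("cam/left".to_string(), "frame".to_string()), 2),
        (("cam/left".to_string(), "frame".to_string()), 1),
        (("cam/right".to_string(), "frame".to_string()), 3),
    ];
    let store = parallel_insert(rows);
    let key: PartitionKey = ("cam/left".to_string(), "frame".to_string());
    assert_eq!(store[&key], vec![1, 2]);
    assert_eq!(store.len(), 2);
}
```

The same grouping is what would let store subscribers update in parallel too: each subscriber only ever sees updates for the partition it is following.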
On the bright side, this also means that the upcoming work to multi-thread the UI rendering should actually improve ingestion speeds too as a side effect.