Logging lots of out-of-order data using the same timepoint on a timeline leads to extreme slowdown in datastore #4415

Closed
emilk opened this issue Dec 1, 2023 · 6 comments · Fixed by #4417 or #6570


emilk commented Dec 1, 2023

A user shared a puffin profile of very slow data ingestion

slow_ingestion.puffin.zip

[two screenshots of the puffin flamegraph]

What I see is this: the data ingestion (which is currently happening in the main rendering loop, see #4298) gets a message containing ~3800 rows of data. Processing that takes ~2 seconds, because each insertion takes a whopping ~9ms. The bulk of the insertion cost is re-sorting, which makes me suspect that the reason for this new slowdown is out-of-order insertions.

There are several layers of bad here:

  • As mentioned, we shouldn't be doing insertions on the main render loop (Parallelize data ingestion #4298)
  • If we do, we should make sure to not use more than a certain time budget
  • Each insertion should not be this slow, even with out-of-order insertion

Workarounds

  • Run the logging with RERUN_FLUSH_NUM_ROWS=1: this creates smaller chunks of work for the viewer, which should restore the viewer's frame-rate, though ingestion will still be slow (see the sketch after this list)

  • Stop logging data out-of-order. This could happen if you are using custom timelines that are not monotonically increasing. Rerun should support out-of-order logging, but it seems like we're hitting a bad corner case here.
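For illustration, a minimal sketch of the first workaround from a Rust logging program. The app id, entity path, and data are made up, and it assumes the SDK's batcher picks up `RERUN_FLUSH_NUM_ROWS` when the recording stream is created; setting the variable in the shell that launches the program works regardless.

```rust
// Minimal sketch of the RERUN_FLUSH_NUM_ROWS=1 workaround. Assumes the SDK's
// batcher reads the variable when the recording stream is created; setting it
// in the launching shell works regardless. App id, entity path, and data are
// placeholders.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Flush after every row: smaller chunks of work for the viewer.
    std::env::set_var("RERUN_FLUSH_NUM_ROWS", "1");

    let rec = rerun::RecordingStreamBuilder::new("slow_ingestion_repro").spawn()?;

    for i in 0..100 {
        rec.set_time_sequence("frame", i);
        rec.log("points", &rerun::Points3D::new([(i as f32, 0.0, 0.0)]))?;
    }
    Ok(())
}
```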

Other notes

It would be great if we could tell from a puffin profile whether or not the code that was running was a debug or release build.
If this was from a debug build, then this whole issue is a mistake.

@emilk emilk added 🪳 bug Something isn't working ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Dec 1, 2023

teh-cmc commented Dec 1, 2023

The first thing that immediately jumps out to me is: why are the sorting routines even running at all on the ingestion path?!

There is only one very specific case where we ever try to sort during ingestion: the bucket is full and has to be split.
Since I don't see any trace of IndexedBucket::split in this flamegraph, it means we're early exiting before the actual split, which means the bucket only contains a single timestamp.

```rust
fn split(&self) -> Option<(TimeInt, Self)> {
    let Self {
        timeline,
        cluster_key: _,
        inner,
    } = self;

    let mut inner1 = inner.write();
    inner1.sort();

    let IndexedBucketInner {
        is_sorted: _,
        time_range: time_range1,
        col_time: col_time1,
        col_insert_id: col_insert_id1,
        col_row_id: col_row_id1,
        col_num_instances: col_num_instances1,
        columns: columns1,
        size_bytes: _, // NOTE: recomputed below
    } = &mut *inner1;

    if col_time1.len() < 2 {
        return None; // early exit: can't split the unsplittable
    }

    if col_time1.first() == col_time1.last() {
        // The entire bucket contains only one timepoint, thus it's impossible to find
        // a split index to begin with.
        return None;
    }

    re_tracing::profile_function!();
```

But then why is the bucket sorted flag invalidated every frame?

This code looks correct:

```rust
*is_sorted &= (*last_time, *last_row_id) <= (time.as_i64(), row.row_id());
```

So the only explanation I can think of is that the user is somehow repeatedly ingesting data at a single timestamp and doing so with out-of-order RowIds (maybe multiple processes and/or threads are involved)?

In any case I can probably make a patch to move that sort after the early exit: the bucket's time-range bounds are up-to-date even when the bucket is unsorted.
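To make the proposed reordering concrete, here is a minimal, self-contained toy sketch (stand-in types, not the actual `re_arrow_store` code) where the cheap early-exit checks run before any sorting is paid for:

```rust
// Toy sketch of "move the sort after the early exits" (stand-in types, not the
// actual re_arrow_store code).
type TimeInt = i64;

struct Bucket {
    col_time: Vec<TimeInt>,
    is_sorted: bool,
}

impl Bucket {
    fn sort(&mut self) {
        self.col_time.sort_unstable();
        self.is_sorted = true;
    }

    /// Returns the time to split at, or `None` if the bucket is unsplittable.
    fn split_point(&mut self) -> Option<TimeInt> {
        if self.col_time.len() < 2 {
            return None; // can't split the unsplittable
        }
        if self.col_time.first() == self.col_time.last() {
            return None; // a single timepoint: there is no split index
        }
        // Only now do we actually need the rows to be in order:
        self.sort();
        Some(self.col_time[self.col_time.len() / 2])
    }
}

fn main() {
    // A bucket whose rows all share one timepoint early-exits without sorting:
    let mut bucket = Bucket { col_time: vec![3; 4], is_sorted: false };
    assert_eq!(bucket.split_point(), None);
}
```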


teh-cmc commented Dec 1, 2023

> It would be great if we could tell from a puffin profile whether or not the code that was running was a debug or release build.
> If this was from a debug build, then this whole issue is a mistake.

+1 -- ideally even the whole rerun version string

@teh-cmc teh-cmc self-assigned this Dec 1, 2023
teh-cmc added a commit that referenced this issue Dec 1, 2023
…ath (#4417)

This should fix #4415, although without knowing how exactly the user
ended up in that situation it's hard to say for certain.
@emilk emilk reopened this Dec 4, 2023

emilk commented Dec 4, 2023

#4417 only postponed the problem:

[screenshot, 2023-12-04 17:13]

The user was using rec.set_time_sequence temporarily to log a few things, then never called disable_timeline. The user then went on to log 800k+ other things, all of which ended up on the same timepoint of this "temporary" timeline.
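A hypothetical sketch of that pattern (Rust SDK; not the user's actual code, entity paths and values made up):

```rust
// Hypothetical sketch of the pattern described above (not the user's code).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rec = rerun::RecordingStreamBuilder::new("keyframes_example").spawn()?;

    // A "temporary" timeline, used for a handful of rows:
    rec.set_time_sequence("keyframes", 35);
    rec.log("keyframe/marker", &rerun::Points3D::new([(0.0, 0.0, 0.0)]))?;

    // The missing call: without it, the "keyframes"=35 timepoint sticks to
    // every subsequent row as well.
    // rec.disable_timeline("keyframes");

    // 800k+ further rows now all land on "keyframes"=35, i.e. in one huge bucket:
    for i in 0..800_000 {
        rec.set_time_sequence("frame", i);
        rec.log("points", &rerun::Points3D::new([(i as f32, 0.0, 0.0)]))?;
    }
    Ok(())
}
```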

EDIT: I have been given a reproduction in a private repository. It logs everything on the main thread, yet the RowIds reach the batcher out-of-order for some unknown reason.

EDIT2: The cause was this: #4658

@emilk emilk assigned emilk and unassigned teh-cmc Dec 5, 2023
@emilk emilk changed the title Extremely slow data ingestion Logging lots of out-of-order data using the same timepoint on a timeline leads to extreme slowdown in datastore Dec 5, 2023
@emilk emilk added this to the Triage milestone Dec 5, 2023

emilk commented Dec 5, 2023

We do log a warning:

```
[2023-12-05T13:00:56Z WARN re_arrow_store::store_write] Found over 512 rows with the same timepoint "keyframes"=#35 - perhaps you forgot to update or remove the timeline "keyframes"?
```

I think the only solution is to actually split these buckets so they don't grow.

@emilk emilk removed their assignment Dec 11, 2023
emilk added a commit that referenced this issue Jan 3, 2024
### What
Rerun is designed to be able to handle out-of-order ingestion, but with
a performance hit.

Usually that performance hit is very small, but when logging to the
exact same timestamp many times, we hit a corner-case of the data store
where we get a huge bucket (see
#4415). If the data arrives
out-of-order, this means an expensive resort of the huge bucket on each
ingestion.

Usually such out-of-order log rows should only happen in multi-threaded
applications, but due to a bug it happened almost always. This PR fixes
that bug and adds a regression test for it. It thus alleviates the
problem in #4415, but does not fix it for applications that genuinely
log out-of-order from multiple threads.

~I introduced the bug in #4502 after the 0.11 release.~ **EDIT:** no, it
was pre-existing!

<img
src="https://github.com/rerun-io/rerun/assets/1148717/fde1f973-deec-46be-b6cc-b671968bf633"
width="250">

@emilk emilk self-assigned this Jan 4, 2024
emilk added a commit that referenced this issue Jan 5, 2024
### What
In order to investigate #4415 I
wanted to reproduce it.

This PR adds a benchmark that inserts many rows using the same
timepoint, either in-order or shuffled.

I also fixed a "bug" in the existing insertion benchmarks, which were
accidentally _also_ measuring the time to call `DataTable::to_rows`
(and collect the result); that accounted for about 10% of the measured time.

---

5950X pinned to core #7 / Arch:
```
datastore/num_rows=1000/num_instances=1/shuffled=false/insert_same_time_point/insert      1.00  1641.2±28.89µs 595.0 KElem/sec
datastore/num_rows=1000/num_instances=1/shuffled=true/insert_same_time_point/insert       1.00      3.0±0.04ms 328.4 KElem/sec
datastore/num_rows=10000/num_instances=1/shuffled=false/insert_same_time_point/insert     1.00     27.1±1.59ms 361.0 KElem/sec
datastore/num_rows=10000/num_instances=1/shuffled=true/insert_same_time_point/insert      1.00    162.5±1.23ms 60.1 KElem/sec
```
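For intuition about why the shuffled case is so much slower, here is a self-contained toy model of the corner case this issue describes (it is not the actual datastore): an out-of-order row marks the single huge same-timepoint bucket as unsorted, and the whole, ever-growing bucket then gets re-sorted.

```rust
use std::time::Instant;

// Toy model of the corner case in this issue (not the actual datastore): a
// single bucket that re-sorts itself whenever an out-of-order insertion made
// it dirty.
struct ToyBucket {
    rows: Vec<u64>, // stand-in for (time, RowId) keys
    is_sorted: bool,
}

impl ToyBucket {
    fn insert(&mut self, key: u64) {
        if let Some(&last) = self.rows.last() {
            self.is_sorted &= last <= key;
        }
        self.rows.push(key);
        if !self.is_sorted {
            // The expensive path: re-sort the whole (growing) bucket.
            self.rows.sort_unstable();
            self.is_sorted = true;
        }
    }
}

fn time_insertions(keys: impl Iterator<Item = u64>) -> std::time::Duration {
    let mut bucket = ToyBucket { rows: Vec::new(), is_sorted: true };
    let start = Instant::now();
    for key in keys {
        bucket.insert(key);
    }
    start.elapsed()
}

fn main() {
    const NUM_ROWS: u64 = 10_000;
    // In-order keys never trigger a re-sort; reversed keys trigger one per insert.
    let in_order = time_insertions(0..NUM_ROWS);
    let out_of_order = time_insertions((0..NUM_ROWS).rev());
    println!("in-order: {in_order:?}, out-of-order: {out_of_order:?}");
}
```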


Co-authored-by: Clement Rey <[email protected]>

emilk commented Jan 5, 2024

Summary: when this issue was opened, we hit this extreme slowdown every time we logged a lot of data using the same time point. Since then, two PRs have landed:

  • The first defers the sorting so that it only happens at query time. The viewer can still get really slow, but it can ingest a lot more data between each slow frame, perhaps reaching the end of the recording (if any) fast enough.

  • The second ensures we only hit the corner case if data is logged out-of-order, meaning the log rows arrive at the batcher in a different order than they were produced (swapped RowIds). For that to happen there has to be a race: two threads logging to the same timeline, using the same time point.

In other words, the risk of this bug producing extreme slowdowns is a lot smaller now. It can still happen, but I think it is safe to lower its priority until we get more reports of it.


emilk commented Jan 5, 2024

For future adventurers:

The problem with splitting the buckets further even when they share the same timepoint is the definition of IndexTable::buckets:

```rust
pub buckets: BTreeMap<TimeInt, IndexedBucket>
```

Since BTreeMap doesn't allow duplicate keys, we would either need to change the key there, or add "sub-buckets" split on RowId inside each IndexedBucket. Either way, it is an annoying change.
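For future reference, a hypothetical sketch of the first option (stand-in types, not the actual store): widen the key to a `(time, first RowId)` pair, so several buckets can share a timepoint and a lookup for one time becomes a range scan.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of the "change the key" option (stand-in types, not the
// actual store).
type TimeInt = i64;
type RowId = u64; // stand-in for rerun's RowId

struct IndexedBucket; // columns, time range, ... elided

struct IndexTable {
    // Key = (start time, first RowId in the bucket), so buckets that share a
    // timepoint can still be split on RowId.
    buckets: BTreeMap<(TimeInt, RowId), IndexedBucket>,
}

impl IndexTable {
    /// All buckets whose first row is at time `t`.
    fn buckets_starting_at<'a>(&'a self, t: TimeInt) -> impl Iterator<Item = &'a IndexedBucket> + 'a {
        self.buckets
            .range((t, RowId::MIN)..=(t, RowId::MAX))
            .map(|(_, bucket)| bucket)
    }
}

fn main() {
    let mut table = IndexTable { buckets: BTreeMap::new() };
    // Two buckets at the same timepoint, split on RowId:
    table.buckets.insert((3, 0), IndexedBucket);
    table.buckets.insert((3, 512), IndexedBucket);
    assert_eq!(table.buckets_starting_at(3).count(), 2);
}
```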

@emilk emilk removed their assignment Jan 5, 2024
@teh-cmc teh-cmc self-assigned this May 31, 2024
@teh-cmc teh-cmc closed this as completed in ed20cba Jul 4, 2024