Garbage collection should be aware of app_id/recording_id semantics #1904

Closed
Tracked by #1898
teh-cmc opened this issue Apr 18, 2023 · 7 comments · Fixed by #4183
Assignees: teh-cmc
Labels: 🪳 bug (Something isn't working), ⛃ re_datastore (affects the datastore itself), 📺 re_viewer (affects re_viewer itself)

Comments

@teh-cmc (Member) commented Apr 18, 2023

We've seen plenty of reports of users who start a Rerun instance and then run their algorithm a bunch of times as they go through their iterative improvement cycle.

It ends up looking a little like this:
[image]

Now, obviously, at some point these users run out of memory, at which point they learn about `--memory-limit`.
That's all fine, except that garbage collection is completely unaware of app_id/recording_id semantics: it only purges the currently active datastore (which likely contains the only data the user still cares about at this point), while the old recordings are left untouched forever.

[image]

teh-cmc added the 🪳 bug, ⛃ re_datastore, and 📺 re_viewer labels on Apr 18, 2023
@emilk (Member) commented Apr 18, 2023

There are many ways to solve this:

  • GC the oldest recording first
  • GC all at the same rate

Different use cases have different requirements. What is obvious is that the current behavior sucks.

I suggest we just GC every open recording, OR drop the oldest recording, whatever is easier.

@teh-cmc (Member, Author) commented Nov 7, 2023

As far as I'm aware, we now distribute the GC pass evenly across all recordings:

pub fn purge_fraction_of_ram(&mut self, fraction_to_purge: f32) {
    re_tracing::profile_function!();

    for store_db in self.store_dbs.values_mut() {
        store_db.purge_fraction_of_ram(fraction_to_purge);
    }
}

which is overall an improvement, but there probably should be a blueprint setting to configure whether you want to distribute evenly, or prioritize cleaning up previous recordings of the same app_id first.
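
A rough sketch of what the app_id-aware variant could look like (hypothetical types and field names, not the actual Rerun code): purge older recordings of each app_id before touching the most recent one, which is presumably the recording the user still cares about.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real per-recording store; the field names are made up.
struct StoreDb {
    app_id: String,
    last_modified_ns: i64,
}

impl StoreDb {
    fn purge_fraction_of_ram(&mut self, _fraction_to_purge: f32) {
        // ... the per-store GC shown above ...
    }
}

fn purge_prioritizing_old_recordings(
    store_dbs: &mut BTreeMap<String, StoreDb>, // recording_id -> store
    fraction_to_purge: f32,
) {
    // Find, per app_id, the most recently modified recording.
    let mut newest_per_app: BTreeMap<String, i64> = BTreeMap::new();
    for db in store_dbs.values() {
        let newest = newest_per_app.entry(db.app_id.clone()).or_insert(i64::MIN);
        *newest = (*newest).max(db.last_modified_ns);
    }

    // Purge every recording that is *not* the newest of its app_id. A real
    // implementation would re-check the memory budget afterwards and only
    // fall back to purging the active recordings if still over the limit.
    for db in store_dbs.values_mut() {
        if db.last_modified_ns < newest_per_app[&db.app_id] {
            db.purge_fraction_of_ram(fraction_to_purge);
        }
    }
}
```

Whether to default to this or to the even split above is exactly the kind of thing that blueprint setting could control.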

@nikolausWest (Member) commented

I think the default behavior should be to drop old data such that, if you are running serial experiments, old recordings get dropped first, and if you are running parallel experiments, data is dropped from all recordings evenly. That is, we should drop based on time.
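
A minimal sketch of that time-based policy (hypothetical `StoreDb` methods, not the actual Rerun API): pick a single global cutoff over the insertion times of all rows, then let every recording drop whatever falls before it.

```rust
/// Stand-in for the real per-recording store; the methods are made up.
struct StoreDb;

impl StoreDb {
    fn row_insertion_times(&self) -> Vec<i64> {
        Vec::new() // would return the insertion time of every row
    }

    fn purge_everything_before(&mut self, _cutoff_ns: i64) {
        // would drop all rows inserted before the cutoff
    }
}

/// Drop roughly `fraction_to_purge` of all rows, oldest first, across every
/// open recording, by picking a single global cutoff time.
fn purge_fraction_by_time(store_dbs: &mut [StoreDb], fraction_to_purge: f32) {
    let mut times: Vec<i64> = store_dbs
        .iter()
        .flat_map(|db| db.row_insertion_times())
        .collect();
    if times.is_empty() {
        return;
    }
    times.sort_unstable();

    // The cutoff is the insertion time below which `fraction_to_purge` of all
    // rows (across all recordings) fall.
    let idx = ((times.len() as f32) * fraction_to_purge) as usize;
    let cutoff_ns = times[idx.min(times.len() - 1)];

    for db in store_dbs.iter_mut() {
        db.purge_everything_before(cutoff_ns);
    }
}
```

With serial experiments, old recordings fall entirely below the cutoff and are emptied first; with parallel experiments, every recording gets trimmed at roughly the same rate.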

@nikolausWest (Member) commented

As it stands, I think this is a blocker for the Hugging Face Spaces demo.

@emilk (Member) commented Nov 7, 2023

A good starting strategy: only run GC on the oldest recording. When it is empty, close it.

Wrinkle: row-protection.
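
A minimal sketch of that starting strategy, again with hypothetical types rather than the actual viewer code, and ignoring row-protection:

```rust
/// Stand-in for the real per-recording store.
struct StoreDb {
    num_rows: u64,
}

impl StoreDb {
    fn purge_fraction_of_ram(&mut self, _fraction_to_purge: f32) {
        // ... per-store GC ...
    }

    fn is_empty(&self) -> bool {
        self.num_rows == 0
    }
}

/// `store_dbs` is assumed to be kept sorted from oldest-modified to newest.
fn gc_oldest_recording(store_dbs: &mut Vec<StoreDb>, fraction_to_purge: f32) {
    let oldest_is_now_empty = match store_dbs.first_mut() {
        Some(oldest) => {
            oldest.purge_fraction_of_ram(fraction_to_purge);
            oldest.is_empty()
        }
        None => return,
    };

    if oldest_is_now_empty {
        // Nothing left in the oldest recording: close it entirely.
        store_dbs.remove(0);
    }
}
```

The wrinkle is that rows protected from GC can keep the oldest recording from ever becoming empty, so a real implementation needs a rule for when to move on to the next recording anyway.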

teh-cmc self-assigned this on Nov 8, 2023
teh-cmc added a commit that referenced this issue on Nov 9, 2023:
**Commit by commit, there's renaming involved!**

GC will now focus on the oldest-modified recording first.
I tried a lot of fancy things, but stress testing has shown that nothing
worked as well as doing it the dumb way.

Speaking of stress testing, the scripts I've used are now committed to
the repository. Make sure to try them out when modifying the GC code
:grimacing:.

In general, the GC holds up under stress much better than I thought/hoped:
- `many_medium_sized_single_row_recordings.py`,
`many_medium_sized_many_rows_recordings.py` &
`many_large_many_rows_recordings.py` all behave pretty nicely, something
like this:


https://github.com/rerun-io/rerun/assets/2910679/26f67d69-de0e-4002-8936-2ac32c451cc3


- `many_large_single_row_recordings.py`, on the other hand, is _still_ a
disaster (watch till the end; this slowly devolves into a black hole):


https://github.com/rerun-io/rerun/assets/2910679/673ee10c-2eca-4e3e-b285-77714e5c3d61



This is not a new problem (not to me at least 😬): large
recordings with very few rows have always been a nightmare for the GC
(not just the DataStore GC, but the GC as a whole throughout the entire
app).
I've never had time to investigate why, but now at least we have an issue
for it:
- #4185

---

- Fixes #1904