Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC the blueprints before saving while preserving the current state #3148

Merged
merged 21 commits into from
Aug 30, 2023

Conversation

jleibs
Copy link
Member

@jleibs jleibs commented Aug 29, 2023

What

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but doesn't change the default behavior yet.

This PR:

  • Introduces a new GarbageCollectionOptions instead of just providing a target. This allows you to configure whether you want to gc the timeless data, and additionally how many latest_at values you want to preserve.
  • Introduces a new gc target: Everything.
  • Calculates a set of protected rows for every component based on the last relevant row across every timeline (including timeless).
  • Modifies both gc_drop_at_least_num_bytes and the new gc_everything to respect the protected rows during gc.
  • Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with --no-frames is another "worst-case" for blueprint because every image is a space-view, so you can easily create a huge blueprint history by repeatedly resetting the blueprint.
image

Checklist

@jleibs jleibs added 🟦 blueprint The data that defines our UI ⛃ re_datastore affects the datastore itself labels Aug 29, 2023
@jleibs jleibs marked this pull request as ready for review August 29, 2023 21:00
@jleibs jleibs force-pushed the jleibs/flatten_blueprint branch from 6eead83 to 314bc83 Compare August 29, 2023 21:15
@jleibs jleibs force-pushed the jleibs/flatten_blueprint branch from 7c9ae50 to 2c4a3ec Compare August 30, 2023 00:37
@emilk emilk self-requested a review August 30, 2023 06:27
Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks really good, but there was one piece of logic I failed to follow.

There is also, I believe, a chance to reduce code duplication by recognizing that "Everything" can be expressed numerically

@@ -13,6 +16,31 @@ pub enum GarbageCollectionTarget {
///
/// The fraction must be a float in the range [0.0 : 1.0].
DropAtLeastFraction(f64),

/// GC Everything that isn't protected
Everything,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from DropAtLeastFraction(1.0) ?

Copy link
Member Author

@jleibs jleibs Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two differences:

  • DropAtLeastFraction does size book-keeping and potentially decides to stop early. I was running into edge cases where it would conclude it had met the 1.0 threshold, but in fact still had stuff it could GC.
  • DropAtLeastFraction needs to do an oldest-first incremental traversal. Everything has lots more room for optimizations that require less shuffling and re-sorting.

Special-casing the numerical value of 1.0 felt weirder to me than special-casing on an explicit enum but I can make that switch if you think it's clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that makes sense 👍

crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/src/store_gc.rs Outdated Show resolved Hide resolved
crates/re_arrow_store/tests/data_store.rs Outdated Show resolved Hide resolved
crates/re_data_store/src/store_db.rs Show resolved Hide resolved
crates/re_viewer/src/store_hub.rs Outdated Show resolved Hide resolved
@jleibs jleibs added this to the 0.8.2 milestone Aug 30, 2023
@jleibs jleibs merged commit 620ea8e into main Aug 30, 2023
@jleibs jleibs deleted the jleibs/flatten_blueprint branch August 30, 2023 19:03
jleibs added a commit that referenced this pull request Aug 31, 2023
…3148)

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
a target. This allows you to configure whether you want to gc the
timeless data, and additionally how many latest_at values you want to
preserve.
 - Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
to respect the protected rows during gc.
 - Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
jleibs added a commit that referenced this pull request Aug 31, 2023
…3148)

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
a target. This allows you to configure whether you want to gc the
timeless data, and additionally how many latest_at values you want to
preserve.
 - Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
to respect the protected rows during gc.
 - Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
jleibs added a commit that referenced this pull request Aug 31, 2023
…3148)

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
a target. This allows you to configure whether you want to gc the
timeless data, and additionally how many latest_at values you want to
preserve.
 - Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
to respect the protected rows during gc.
 - Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
@jleibs jleibs mentioned this pull request Aug 31, 2023
3 tasks
jleibs added a commit that referenced this pull request Aug 31, 2023
…3148)

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
a target. This allows you to configure whether you want to gc the
timeless data, and additionally how many latest_at values you want to
preserve.
 - Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
to respect the protected rows during gc.
 - Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
jleibs added a commit that referenced this pull request Aug 31, 2023
…3148)

Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
a target. This allows you to configure whether you want to gc the
timeless data, and additionally how many latest_at values you want to
preserve.
 - Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
to respect the protected rows during gc.
 - Modifies the store_hub to gc the blueprint before saving it.

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🟦 blueprint The data that defines our UI ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blueprints file should be flattened during save
2 participants