datastore: bake latest-at semantics into the garbage collector #1803

Closed
teh-cmc opened this issue Apr 10, 2023 · 2 comments · Fixed by #3357
Labels
🏎️ Quick Issue Can be fixed in a few hours or less ⛃ re_datastore affects the datastore itself

Comments

@teh-cmc
Member

teh-cmc commented Apr 10, 2023

Consider the following log calls:

log_color("some/entity", frame_nr=0, [{255, 0, 0, 255}])
log_point("some/entity", frame_nr=1, [{1.0, 1.0}])
log_point("some/entity", frame_nr=2, [{2.0, 2.0}])
log_point("some/entity", frame_nr=3, [{3.0, 3.0}])
log_point("some/entity", frame_nr=4, [{4.0, 4.0}])
log_point("some/entity", frame_nr=5, [{5.0, 5.0}])

Querying for LatestAt("some/entity", ("frame_nr", 5)) will unsurprisingly yield a red point at (5.0, 5.0).
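To make the latest-at behaviour concrete, here is a minimal sketch in Python. It is a toy model, not the datastore's actual implementation: the row layout and the `latest_at` helper are assumptions made for illustration only.

```python
# Toy model (assumption): each row is (frame_nr, component, value) for "some/entity".
rows = [
    (0, "color", (255, 0, 0, 255)),
    (1, "point", (1.0, 1.0)),
    (2, "point", (2.0, 2.0)),
    (3, "point", (3.0, 3.0)),
    (4, "point", (4.0, 4.0)),
    (5, "point", (5.0, 5.0)),
]


def latest_at(rows, component, frame_nr):
    """Return the most recent value of `component` logged at or before `frame_nr`."""
    candidates = [(t, v) for (t, c, v) in rows if c == component and t <= frame_nr]
    if not candidates:
        return None
    # The row with the highest frame number wins.
    return max(candidates, key=lambda tv: tv[0])[1]


point = latest_at(rows, "point", 5)
color = latest_at(rows, "color", 5)
print(point, color)  # (5.0, 5.0) (255, 0, 0, 255)
```

Each component is resolved independently, which is why the color logged at frame 0 still applies at frame 5.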

Now, consider what happens after running a GC that drops 50% of the data, leaving us with:

log_point("some/entity", frame_nr=3, [{3.0, 3.0}])
log_point("some/entity", frame_nr=4, [{4.0, 4.0}])
log_point("some/entity", frame_nr=5, [{5.0, 5.0}])

Querying for LatestAt("some/entity", ("frame_nr", 5)) will now yield a point at (5.0, 5.0) with whatever is currently defined as the default color, rather than red. This is just plain wrong.
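The failure can be reproduced with a small Python sketch. The `naive_gc` function below is a hypothetical stand-in for a GC that drops the oldest rows without looking at them; it is not the real collector.

```python
# Toy model (assumption): each row is (frame_nr, component, value).
rows = [
    (0, "color", (255, 0, 0, 255)),
    (1, "point", (1.0, 1.0)),
    (2, "point", (2.0, 2.0)),
    (3, "point", (3.0, 3.0)),
    (4, "point", (4.0, 4.0)),
    (5, "point", (5.0, 5.0)),
]


def latest_at(rows, component, frame_nr):
    """Most recent value of `component` at or before `frame_nr`, or None."""
    candidates = [(t, v) for (t, c, v) in rows if c == component and t <= frame_nr]
    return max(candidates, key=lambda tv: tv[0])[1] if candidates else None


def naive_gc(rows, fraction):
    """Hypothetical GC: drop the oldest `fraction` of rows, whatever they hold."""
    ordered = sorted(rows, key=lambda r: r[0])
    num_dropped = int(len(ordered) * fraction)
    return ordered[num_dropped:]


after_gc = naive_gc(rows, 0.5)
print(latest_at(after_gc, "point", 5))  # (5.0, 5.0)
print(latest_at(after_gc, "color", 5))  # None: the color row was dropped
```

A `None` here is what forces the viewer to fall back to the default color.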

This happens because the GC blindly drops data rather than doing the correct thing: compacting what gets dropped into a latest-at kind of state and keeping that around for future queries.
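One way to picture the compaction described above is the following Python sketch. It assumes the same toy row layout of (frame_nr, component, value); `gc_with_compaction` is a hypothetical name, and the real fix may well work differently (for example by protecting rows rather than rewriting them).

```python
def latest_at(rows, component, frame_nr):
    """Most recent value of `component` at or before `frame_nr`, or None."""
    candidates = [(t, v) for (t, c, v) in rows if c == component and t <= frame_nr]
    return max(candidates, key=lambda tv: tv[0])[1] if candidates else None


def gc_with_compaction(rows, fraction):
    """Hypothetical GC: drop the oldest rows, but keep the last row of every
    component found among them so later latest-at queries still resolve."""
    ordered = sorted(rows, key=lambda r: r[0])
    num_dropped = int(len(ordered) * fraction)
    dropped, kept = ordered[:num_dropped], ordered[num_dropped:]

    compacted = {}
    for row in dropped:
        # Rows are visited oldest first, so a later row overwrites an earlier one.
        compacted[row[1]] = row

    survivors = sorted(compacted.values(), key=lambda r: r[0])
    return survivors + kept


rows = [
    (0, "color", (255, 0, 0, 255)),
    (1, "point", (1.0, 1.0)),
    (2, "point", (2.0, 2.0)),
    (3, "point", (3.0, 3.0)),
    (4, "point", (4.0, 4.0)),
    (5, "point", (5.0, 5.0)),
]

after_gc = gc_with_compaction(rows, 0.5)
print(latest_at(after_gc, "color", 5))  # (255, 0, 0, 255)
```

The trade-off is that the collector frees less than it was asked to: in this example only the point at frame 1 is actually removed, because the color at frame 0 and the point at frame 2 are the latest values of their components within the dropped range.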

@teh-cmc teh-cmc added the ⛃ re_datastore affects the datastore itself label Apr 10, 2023
@emilk
Member

emilk commented Apr 17, 2023

Also known as "flattening", this will be useful for our plan of storing entity properties in the store. Each edit will be added, but on save we only want the latest value of every property.
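As a rough illustration of flattening on save, the Python sketch below keeps only the last edit per property. The edit tuples and the property names (`visible`, `color`) are made up for the example and do not reflect the actual entity property schema.

```python
def flatten(edits):
    """Keep only the most recent value of each (entity, property) pair.

    `edits` is assumed to be ordered oldest first.
    """
    latest = {}
    for entity, prop, value in edits:
        latest[(entity, prop)] = value  # later edits overwrite earlier ones
    return [(entity, prop, value) for (entity, prop), value in latest.items()]


# Hypothetical edit history: every change is appended as it happens.
edits = [
    ("some/entity", "visible", True),
    ("some/entity", "color", "red"),
    ("some/entity", "visible", False),
    ("other/entity", "visible", True),
]

saved = flatten(edits)
for row in saved:
    print(row)
```

Four edits collapse to three saved rows, since `visible` on `some/entity` was edited twice and only its last value survives.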

@teh-cmc teh-cmc self-assigned this Apr 18, 2023
@emilk emilk added this to the 0.8.2 milestone Aug 25, 2023
jleibs added a commit that referenced this issue Aug 30, 2023
…3148)

### What
Resolves: #3098
Related to: #1803

Because blueprints used timeless data and timeless data wasn't GC'd, we
previously had no great way to clean up blueprints.

This PR paves the way for better overall GC behavior in the future but
doesn't change the default behavior yet.

This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing
  a target. This allows you to configure whether you want to gc the
  timeless data, and additionally how many latest_at values you want to
  preserve.
- Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the
  last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything`
  to respect the protected rows during gc.
- Modifies the store_hub to gc the blueprint before saving it.
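The protected-rows idea in the list above can be sketched in Python as follows. This is a toy model under stated assumptions, not the code from the PR: the `GcOptions` field names, the row layout, and both function bodies are hypothetical, and only the concepts (an option to gc timeless data, a number of latest-at values to preserve, a protected set computed per component across every timeline) come from the description.

```python
from dataclasses import dataclass

TIMELESS = None  # assumption: timeless data is modelled as a "timeline" named None


@dataclass
class GcOptions:
    """Hypothetical stand-in for `GarbageCollectionOptions`; field names are made up."""
    gc_timeless: bool = False
    protect_latest: int = 1


def protected_rows(rows, protect_latest):
    """Row ids holding the last `protect_latest` values of each component
    on each timeline, with timeless treated as one more timeline."""
    per_key = {}
    for row_id, times, component in rows:
        timelines = times.items() if times else [(TIMELESS, 0)]
        for timeline, time in timelines:
            per_key.setdefault((timeline, component), []).append((time, row_id))

    protected = set()
    if protect_latest <= 0:
        return protected
    for entries in per_key.values():
        entries.sort()  # by time, then by row id (insertion order)
        protected.update(row_id for _, row_id in entries[-protect_latest:])
    return protected


def gc_everything(rows, options):
    """Drop every row that is neither protected nor exempt timeless data."""
    protected = protected_rows(rows, options.protect_latest)
    kept = []
    for row in rows:
        row_id, times, _ = row
        is_timeless = not times
        if row_id in protected or (is_timeless and not options.gc_timeless):
            kept.append(row)
    return kept


# Each row: (row_id, {timeline: time}, component). An empty dict means timeless.
rows = [
    (0, {}, "color"),
    (1, {}, "color"),
    (2, {"frame_nr": 1, "log_time": 100}, "point"),
    (3, {"frame_nr": 2, "log_time": 90}, "point"),
    (4, {"frame_nr": 3, "log_time": 80}, "point"),
]

kept = gc_everything(rows, GcOptions(gc_timeless=True, protect_latest=1))
print([row[0] for row in kept])  # [1, 2, 4]
```

Row 2 survives even though it is the oldest on `frame_nr`, because it holds the latest point on `log_time`; a row is protected if it is the last relevant one on any timeline.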

Photogrammetry with `--no-frames` is another "worst-case" for blueprint
because every image is a space-view, so you can easily create a huge
blueprint history by repeatedly resetting the blueprint.

![image](https://github.com/rerun-io/rerun/assets/3312232/03df3d06-a780-47b3-b0d9-aaf564793230)

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples
preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
@emilk
Member

emilk commented Sep 11, 2023

The code is there; we just need to turn it on for normal recordings.

@emilk emilk added the 🏎️ Quick Issue Can be fixed in a few hours or less label Sep 11, 2023
@emilk emilk unassigned teh-cmc and jleibs Sep 18, 2023
jleibs added a commit that referenced this issue Sep 19, 2023
)

### What
Now that GC has the ability to protect data, turn the feature on for
our normal `purge_fraction_of_ram` operations.

Resolves: #1803

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3357) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3357)
- [Docs
preview](https://rerun.io/preview/678cf75238c49f71ab338a09cc99790de0626efa/docs)
- [Examples
preview](https://rerun.io/preview/678cf75238c49f71ab338a09cc99790de0626efa/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)