Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EntityTree: only check for entity deletions when necessary #8103

Merged
merged 4 commits into from
Nov 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions crates/store/re_entity_db/src/entity_db.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ use parking_lot::Mutex;

use re_chunk::{Chunk, ChunkResult, RowId, TimeInt};
use re_chunk_store::{
ChunkStore, ChunkStoreChunkStats, ChunkStoreConfig, ChunkStoreEvent, ChunkStoreHandle,
ChunkStoreSubscriber, GarbageCollectionOptions, GarbageCollectionTarget,
ChunkStore, ChunkStoreChunkStats, ChunkStoreConfig, ChunkStoreDiffKind, ChunkStoreEvent,
ChunkStoreHandle, ChunkStoreSubscriber, GarbageCollectionOptions, GarbageCollectionTarget,
};
use re_log_types::{
ApplicationId, EntityPath, EntityPathHash, LogMsg, ResolvedTimeRange, ResolvedTimeRangeF,
Expand Down Expand Up @@ -385,7 +385,13 @@ impl EntityDb {

// It is possible for writes to trigger deletions: specifically in the case of
// overwritten static data leading to dangling chunks.
self.tree.on_store_deletions(&engine, &store_events);
let entity_paths_with_deletions = store_events
.iter()
.filter(|event| event.kind == ChunkStoreDiffKind::Deletion)
.map(|event| event.chunk.entity_path().clone())
.collect();
self.tree
.on_store_deletions(&engine, &entity_paths_with_deletions, &store_events);

// We inform the stats last, since it measures e2e latency.
self.stats.on_events(&store_events);
Expand Down Expand Up @@ -444,7 +450,7 @@ impl EntityDb {
store_events
}

fn gc(&mut self, gc_options: &GarbageCollectionOptions) -> Vec<ChunkStoreEvent> {
pub(crate) fn gc(&mut self, gc_options: &GarbageCollectionOptions) -> Vec<ChunkStoreEvent> {
re_tracing::profile_function!();

let mut engine = self.storage_engine.write();
Expand Down Expand Up @@ -544,7 +550,12 @@ impl EntityDb {
time_histogram_per_timeline.on_events(store_events);

let engine = engine.downgrade();
tree.on_store_deletions(&engine, store_events);
let entity_paths_with_deletions = store_events
.iter()
.filter(|event| event.kind == ChunkStoreDiffKind::Deletion)
.map(|event| event.chunk.entity_path().clone())
.collect();
tree.on_store_deletions(&engine, &entity_paths_with_deletions, store_events);
}

/// Key used for sorting recordings in the UI.
Expand Down
109 changes: 105 additions & 4 deletions crates/store/re_entity_db/src/entity_tree.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::collections::BTreeMap;

use ahash::HashSet;
use nohash_hasher::IntMap;
use nohash_hasher::{IntMap, IntSet};

use re_chunk::RowId;
use re_chunk_store::{ChunkStoreDiffKind, ChunkStoreEvent, ChunkStoreSubscriber};
Expand All @@ -14,6 +14,7 @@ use re_types_core::ComponentName;
/// A recursive, manually updated [`ChunkStoreSubscriber`] that maintains the entity hierarchy.
///
/// The tree contains a list of subtrees, and so on recursively.
#[derive(Debug)]
pub struct EntityTree {
/// Full path prefix to the root of this (sub)tree.
pub path: EntityPath,
Expand Down Expand Up @@ -123,7 +124,12 @@ impl EntityTree {
}

/// Returns `true` if this entity has no children and no data.
pub fn is_empty(&self, engine: &StorageEngineReadGuard<'_>) -> bool {
///
/// Checking for the absence of data is neither costly nor totally free: do it a few hundreds or
/// thousands times a frame and it will absolutely kill framerate.
/// Don't blindly call this on every existing entity every frame: use [`ChunkStoreEvent`]s to make
/// sure anything changed at all first.
pub fn check_is_empty(&self, engine: &StorageEngineReadGuard<'_>) -> bool {
self.children.is_empty() && !engine.store().entity_has_data(&self.path)
}

Expand Down Expand Up @@ -160,6 +166,7 @@ impl EntityTree {
pub fn on_store_deletions(
&mut self,
engine: &StorageEngineReadGuard<'_>,
entity_paths_with_deletions: &IntSet<EntityPath>,
events: &[ChunkStoreEvent],
) {
re_tracing::profile_function!();
Expand All @@ -171,9 +178,22 @@ impl EntityTree {

self.children.retain(|_, entity| {
// this is placed first, because we'll only know if the child entity is empty after telling it to clear itself.
entity.on_store_deletions(engine, events);
entity.on_store_deletions(engine, entity_paths_with_deletions, events);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not totally following how this improves performance.

The inclusion of entity_paths_with_deletions doesn't change the fact that on_store_deletions does an entire tree-walk.

Is the whole point of this optimization to bypass the overhead of the is_empty() call in cases where we know that the intermediate child couldn't have been deleted?

That said, this seems like this will no longer successfully delete intermediate children that only existed as containers for other entities but don't have their own data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inclusion of entity_paths_with_deletions doesn't change the fact that on_store_deletions does an entire tree-walk.

Is the whole point of this optimization to bypass the overhead of the is_empty() call in cases where we know that the intermediate child couldn't have been deleted?

Yeah, the tree walk in itself is imperceptible (it's just a few thousands recursions in the worst case, it's barely measurable) -- a few thousands is_empty() on the other hand is extremely costly.

This PR basically brings the latency down from several seconds (ever increasing) to a constant 10ms (using the benchmark script in the issue).

That said, this seems like this will no longer successfully delete intermediate children that only existed as containers for other entities but don't have their own data.

Haa! I didn't even know that that was the point of this thing. We need to tweak this slightly then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few thousands is_empty() on the other hand is extremely costly

Got it -- that wasn't obvious at all and definitely warrants a comment then. I wonder if is_empty() should be renamed to something like check_if_empty() to better imply there's an active cost to be paid and it's not just accessing some pre-computed state.


let has_children = || !entity.children.is_empty();
// Checking for lack of data is not free, so make sure there is any reason to believe
// that any relevant data has changed first.
let has_recursive_deletion_events = || {
entity_paths_with_deletions
.iter()
.any(|removed_entity_path| removed_entity_path.starts_with(&entity.path))
};
let has_data = || engine.store().entity_has_data(&entity.path);

let should_be_removed =
!has_children() && (has_recursive_deletion_events() && !has_data());

!entity.is_empty(engine)
!should_be_removed
});
}

Expand Down Expand Up @@ -237,3 +257,84 @@ impl EntityTree {
visit(self, &mut predicate)
}
}

#[cfg(test)]
mod tests {
use std::sync::Arc;

use re_chunk::{Chunk, RowId};
use re_log_types::{example_components::MyPoint, EntityPath, StoreId, TimePoint, Timeline};

use crate::EntityDb;

#[test]
fn deleting_descendants() -> anyhow::Result<()> {
re_log::setup_logging();

let mut db = EntityDb::new(StoreId::random(re_log_types::StoreKind::Recording));

let timeline_frame = Timeline::new_sequence("frame");

let entity_path_parent: EntityPath = "parent".into();
let entity_path_child: EntityPath = "parent/child1".into();
let entity_path_grandchild: EntityPath = "parent/child1/grandchild".into();

assert!(db.tree().check_is_empty(&db.storage_engine()));

{
let row_id = RowId::new();
let timepoint = TimePoint::from_iter([(timeline_frame, 10)]);
let point = MyPoint::new(1.0, 2.0);
let chunk = Chunk::builder(entity_path_grandchild.clone())
.with_component_batches(row_id, timepoint, [&[point] as _])
.build()?;

db.add_chunk(&Arc::new(chunk))?;
}

{
let parent = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_parent)
.unwrap();
let child = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_child)
.unwrap();
let grandchild = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_grandchild)
.unwrap();

assert_eq!(1, parent.children.len());
assert_eq!(1, child.children.len());
assert_eq!(0, grandchild.children.len());

assert!(!db.tree().check_is_empty(&db.storage_engine()));
assert!(!parent.check_is_empty(&db.storage_engine()));
assert!(!child.check_is_empty(&db.storage_engine()));
assert!(!grandchild.check_is_empty(&db.storage_engine()));
}

let _store_events = db.gc(&re_chunk_store::GarbageCollectionOptions::gc_everything());

{
let parent = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_parent);
let child = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_child);
let grandchild = db
.tree()
.find_first_child_recursive(|entity_path| *entity_path == entity_path_grandchild);

assert!(db.tree().check_is_empty(&db.storage_engine()));
assert!(parent.is_none());
assert!(child.is_none());
assert!(grandchild.is_none());
}

Ok(())
}
}
Loading