Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown with 1'000 entities #2285

Closed
emilk opened this issue May 31, 2023 · 10 comments · Fixed by #3078
Closed

Slowdown with 1'000 entities #2285

emilk opened this issue May 31, 2023 · 10 comments · Fixed by #3078
Assignees
Labels
🚀 performance Optimization, memory use, etc

Comments

@emilk
Copy link
Member

emilk commented May 31, 2023

When we have 1k entities the viewer starts slowing down due to a score of places where we do store lookups for each entity each frame.

Some of these store lookups is due to scene ingestion: every entity is checked against every archetype ("is this a point? is this a mesh? is this an arrow?"). This O(N*K) complexity needs to be solved.

Furthermore, these queries are taking up way too much time, they could be a lot faster.

We also need a good example/test of this to work from.

@emilk emilk added the 🚀 performance Optimization, memory use, etc label May 31, 2023
@emilk
Copy link
Member Author

emilk commented May 31, 2023

Here is a screenshot of puffin profiler with 1k entities (in a release build). App::update takes around 50ms.

image

About half (!) of this time is overhead from puffin scopes (there are just too many of them), but even with them off this takes around 25 ms/frame.

@teh-cmc
Copy link
Member

teh-cmc commented May 31, 2023

Some of these store lookups is due to scene ingestion: every entity is checked against every archetype ("is this a point? is this a mesh? is this an arrow?"). This O(N*K) complexity needs to be solved.

Furthermore, these queries are taking up way too much time, they could be a lot faster.

I believe this is the result of us addressing the problem backwards at the moment.
I.e. we should be asking the store "hey, i have this archetype: which entities might possibly be concerned at all?", rather than that O(nk) query we're doing right now.

In fact the store might actually already be capable of answering that question efficiently, iirc.

@emilk
Copy link
Member Author

emilk commented Aug 21, 2023

@jleibs brought up the idea of entity tagging, which is very relevant for this. Each archetype logged to an entity adds a tag for that archetype (e.g. its name). This means each entity has a set of archetype tags (usually only one tag). This means archetypes are no longer duck-typed.

We also have the question of wether to categorize entities every frame, or once when they are added to a space view.

@emilk
Copy link
Member Author

emilk commented Aug 21, 2023

A simple reproduce is running examples/python/open_photogrammetry_format/main.py --no-frames and then hiding the points (the world/pcl entity). There are 116 cameras = 348 entities.

image

Enhance:
image

@emilk
Copy link
Member Author

emilk commented Aug 21, 2023

Me, @Wumpf, @jleibs and @teh-cmc discussed this, with a bunch of conclusions.

The first low-hanging fruit we should do is just to speed up the mapping, without changing any behavior or ui.

This can be done in a few hours

For each space view we pre-calculate a lookup map of ComponentName -> ViewPartSystemId like so (pseudo-code):

let mut component_view_parts : HashMap<ComponentName, Set<ViewPartSystemId>> = {}
for view_part in all_view_part_categories {
    for required in view_part.required_components() {
        component_view_parts[required].insert(view_part.id());
    }
}

This will produce a lookup looking something like:

{
    "rerun.mesh": {"mesh"},
    "rerun.point3d": {"point3d", "arrow"},
    "rerun.transform": {"transform_arrows"},
     
}

Note that the cardinality of the Set<ViewPartSystemId> is usually very low, and we may want to optimize based on this.

Using this we then each frame categorize the entities of a Space View like so:

let mut view_part_entities: Map<ViewPartSystemId, Vec<Entity>> = {}

for entity in space_view.entities {
    let entity_comps = db.all_components(entity);

    let mut potential_parts: Set<ViewPartSystemId> = {}
    for component_name in entity_comps {
        potential_parts = potential_parts.union(potential_parts[component_name]);
    }

    for part_id in potential_parts {
        let view_part_system = all_view_part_systems[id];
        if view_part_system.required_components().all(|comp| db.entity_has_component(comp)) {
            // It is a hit!
            view_part_entities[part_id].push(entity);
        }
    }
}

And then we have what we wanted: for each view part system, we have a list of entities for that.
Will this be faster? Probably. It can also be trivially parallelized (e.g. using rayon).

We would need to add an id to the trait ViewPartSystem, or just use Any::type_id (ViewPartSystem already has as_any).


This same idea can later be reused to speed up the broader query of "What SpaceViewClasses does this Entity fit?" which we use as a heuristic for generating Space Views, and populating them with any new entity.

@emilk
Copy link
Member Author

emilk commented Aug 21, 2023

In the future we want to be able to expose to users a setting for "What ViewParts should be used to show this Entity?". For instance, given a mesh with a transform, should we show it as a mesh, a transform, or both? This would require a BTreeMap<EntityPath, EditableAutoValue<HashSet<ViewPartSystemId>>> stored in each SpaceView, and a UI for that. This would also replace the special checkbox for wether or not to show the transform-axis for an entity.

@emilk emilk added this to the 0.9 milestone Aug 21, 2023
@Wumpf Wumpf self-assigned this Aug 22, 2023
@Wumpf
Copy link
Member

Wumpf commented Aug 22, 2023

Got this quite a bit faster, however we figured out in the process that as long as auto-adding space views is enabled that takes a significant amount of time. @jleibs and me discussed how to improve the way we do space view heuristics in the future and came up with a new schema where each space view class would decide on its own heuristic (this has the added benefit of removing remaining knowledge of specific views from re_viewport. The heuristic method per class would look something like this:

    fn auto_spawn_heuristic(
        &self,
        ctx: &ViewerContext<'_>,
        ent_paths: &PerSystemEntities,
        max_num_candidates: usize,
    ) -> Vec<(AutoSpawnHeuristic, EntityPath)>;

meaning that candidate creation can be done in a much more sensible way. Some challenges there are dealing with disconnected-components.

@Wumpf
Copy link
Member

Wumpf commented Aug 22, 2023

to ilustrate the issue: this is what performance in examples/python/open_photogrammetry_format/main.py --no-frames without points looks like after my current batch of optimizations:
image
(once a space view is added or removed this becomes a lot faster since we no longer run auto space view heuristics)

@Wumpf
Copy link
Member

Wumpf commented Aug 22, 2023

separated this out into #3079

@emilk emilk modified the milestones: 0.9, 0.8.2 Aug 23, 2023
@emilk
Copy link
Member Author

emilk commented Aug 25, 2023

cargo r -p raw_mesh is also a pretty good benchmark

jleibs added a commit that referenced this issue Aug 29, 2023
### What

* Fixes #2285
(.. at least fixes it well enough for the moment I reckon)

Improves both the per space view rendering and the space view adding
heuristic. In particular on the later there's still significant gains to
be made.
Most importantly we should be scaling now much better with number of
added systems.

Went fairly early with the decision to have the primary datastructure to
pass around be a `Map<System, Vec<Entity>>` (conceptual) instead of the
other way round since this is what we need in the systems most of the
time.
This is another step towards a stronger contract of systems specifying
what components they will query ahead of time!

The way it is implemented on thr (rename from `BlueprintTree` 💥)
`SpaceViewContents` also paves the way for ui selection of systems for a
given entity path which complements the ongoing work on our new data
ingestion interfaces.

-----

Testcase: 
`examples/python/open_photogrammetry_format/main.py --no-frames`,
**fully reset viewer** and then hiding **NOT REMOVING** the points (the
world/pcl entity) and deselecting them.
(The bold marked pieces are very important as they have an influence on
what heuristics run - see #3077)

Before:

![image](https://github.com/rerun-io/rerun/assets/1220815/cda9d555-8839-4d78-8507-cf4ed34bad70)

After:

![image](https://github.com/rerun-io/rerun/assets/1220815/d34a00f8-05af-4160-85bf-b709a0d36ecb)


(release runs, averaged over a bunch of frames on my M1 Max)

Highlights:
* `AppState::show`: 24.0ms ➡️ 14.6ms 
   * `SpaceViewBlueprint::scene_ui`: 4.9ms ➡️ 2.6ms
      * this was the original primary optimization target!
   * `Viewport::on_frame_start`: 15.2ms ➡️ 7.4ms
       * `default_created_space_view`: 12.7ms ➡️ 5.1ms
* quite some time spent on this, but we want to do a paradigm shift
there: #3079
           
(slight number discrepancies from the screenshot are due to doing a
different run)

----
Draft todo: Code sanity check. Suspecting some heuristics might be
slightly broken, need to go through examples


### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3078) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/3078)
- [Docs
preview](https://rerun.io/preview/fc5bb0b8908e2e8c666a6d5daa1dee105fe2aae8/docs)
<!--DOCS-PREVIEW-->
- [Examples
preview](https://rerun.io/preview/fc5bb0b8908e2e8c666a6d5daa1dee105fe2aae8/examples)
<!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW--><!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)

---------

Co-authored-by: Jeremy Leibs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🚀 performance Optimization, memory use, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants