Update Access and FilteredAccess to use sorted vecs instead of FixedBitSet #14385

cBournhonesque · 2024-07-18T22:25:00Z

Objective

As described in the relations RFC: https://github.com/james-j-obrien/rfcs/blob/minimal-fragmenting-relationships/rfcs/79-minimal-fragmenting-relationships.md#access-bitsets-and-component-sparsesets

The reason the access bitsets are currently efficient is that the ComponentIds are dense: they are incremented from 0 and should remain small.
With relations, a ComponentId will have some of its upper bits set to 1, so the amount of memory allocated in the FixedBitSet to represent the value would be non-trivial.
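
To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes a dense bitset that stores one bit per possible id, rounded up to whole `usize` blocks; the real FixedBitSet block layout is an implementation detail and may differ. The function names are hypothetical and only serve the illustration.

```rust
use std::mem::size_of;

/// Bytes a dense bitset needs to hold every id up to and including `max_id`
/// (assumption: one bit per possible id, rounded up to whole `usize` blocks).
fn bitset_bytes(max_id: usize) -> usize {
    let bits_per_block = usize::BITS as usize;
    let blocks = max_id / bits_per_block + 1;
    blocks * size_of::<usize>()
}

/// Bytes a sorted vector of ids needs: one `usize` per stored id,
/// independent of how large the ids themselves are.
fn sorted_vec_bytes(len: usize) -> usize {
    len * size_of::<usize>()
}
```

Under these assumptions, a single id with bit 31 set (`1 << 31`) costs roughly 256 MiB in the bitset, while a sorted vector holding three ids costs three machine words regardless of their magnitude.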

Solution

One way to fix the issue would be to replace the FixedBitSets with sorted vectors.
The vectors should remain relatively small since queries usually don't involve hundreds of components.

These allow union, difference, and intersection operations in O(1)
(however, inserting a value is O(n)).
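
As an illustration of the operations mentioned above, here is a minimal sketch on plain sorted, deduplicated `Vec<usize>` values. It is not the PR's SortedSmallVec; the function names are hypothetical. Note that producing a full result visits each element once, so the materialized versions below are linear in the combined length.

```rust
use std::cmp::Ordering;

/// Inserts `value`, keeping the vector sorted and deduplicated.
/// Binary search is O(log n); shifting the tail makes the insert O(n).
fn insert_sorted(set: &mut Vec<usize>, value: usize) -> bool {
    match set.binary_search(&value) {
        Ok(_) => false, // already present
        Err(pos) => {
            set.insert(pos, value);
            true
        }
    }
}

/// Elements present in both inputs, found with one merge pass.
fn intersection_sorted(a: &[usize], b: &[usize]) -> Vec<usize> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Elements of `a` that are not in `b`, found with one merge pass.
fn difference_sorted(a: &[usize], b: &[usize]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut j = 0;
    for &x in a {
        // Skip everything in `b` that is smaller than the current element.
        while j < b.len() && b[j] < x {
            j += 1;
        }
        if j == b.len() || b[j] != x {
            out.push(x);
        }
    }
    out
}
```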

Testing

Ran the running_systems benchmarks

group                                   main                                   pr
-----                                   ----                                   --
busy_systems/01x_entities_03_systems    1.00     27.7±3.44µs        ? ?/sec    1.02     28.2±5.25µs        ? ?/sec
busy_systems/01x_entities_06_systems    1.00     42.3±1.46µs        ? ?/sec    1.11     47.0±2.95µs        ? ?/sec
busy_systems/01x_entities_09_systems    1.08    67.3±13.90µs        ? ?/sec    1.00     62.1±1.42µs        ? ?/sec
busy_systems/01x_entities_12_systems    1.00     77.9±1.76µs        ? ?/sec    1.08    83.8±14.63µs        ? ?/sec
busy_systems/01x_entities_15_systems    1.00     94.2±1.92µs        ? ?/sec    1.08    101.8±5.94µs        ? ?/sec
busy_systems/02x_entities_03_systems    1.00     42.4±0.86µs        ? ?/sec    1.28    54.1±15.42µs        ? ?/sec
busy_systems/02x_entities_06_systems    1.00     74.0±3.57µs        ? ?/sec    1.15     85.0±5.44µs        ? ?/sec
busy_systems/02x_entities_09_systems    1.00    109.8±1.81µs        ? ?/sec    1.05    114.9±5.82µs        ? ?/sec
busy_systems/02x_entities_12_systems    1.00    142.3±1.42µs        ? ?/sec    1.11   157.5±27.83µs        ? ?/sec
busy_systems/02x_entities_15_systems    1.01   184.3±47.23µs        ? ?/sec    1.00   182.3±10.41µs        ? ?/sec
busy_systems/03x_entities_03_systems    1.00     59.7±0.74µs        ? ?/sec    1.06     63.0±5.44µs        ? ?/sec
busy_systems/03x_entities_06_systems    1.00     98.2±0.59µs        ? ?/sec    1.12    110.0±5.41µs        ? ?/sec
busy_systems/03x_entities_09_systems    1.00   156.0±12.20µs        ? ?/sec    1.06   165.5±37.93µs        ? ?/sec
busy_systems/03x_entities_12_systems    1.00   207.1±11.09µs        ? ?/sec    1.05   217.1±33.47µs        ? ?/sec
busy_systems/03x_entities_15_systems    1.00    252.2±1.91µs        ? ?/sec    1.20  301.6±162.98µs        ? ?/sec
busy_systems/04x_entities_03_systems    1.00     75.2±3.92µs        ? ?/sec    1.03     77.2±5.84µs        ? ?/sec
busy_systems/04x_entities_06_systems    1.00    127.2±2.50µs        ? ?/sec    1.14   145.0±16.45µs        ? ?/sec
busy_systems/04x_entities_09_systems    1.00    200.8±1.37µs        ? ?/sec    1.12   224.2±69.54µs        ? ?/sec
busy_systems/04x_entities_12_systems    1.00   277.7±16.88µs        ? ?/sec    1.02   282.3±16.00µs        ? ?/sec
busy_systems/04x_entities_15_systems    1.02   332.8±10.41µs        ? ?/sec    1.00    326.3±7.38µs        ? ?/sec
busy_systems/05x_entities_03_systems    1.04     97.4±4.89µs        ? ?/sec    1.00     93.3±3.49µs        ? ?/sec
busy_systems/05x_entities_06_systems    1.00    159.0±4.15µs        ? ?/sec    1.09    173.1±3.51µs        ? ?/sec
busy_systems/05x_entities_09_systems    1.00    251.8±6.01µs        ? ?/sec    1.00   252.3±13.01µs        ? ?/sec
busy_systems/05x_entities_12_systems    1.06   352.4±25.20µs        ? ?/sec    1.00   332.7±37.06µs        ? ?/sec
busy_systems/05x_entities_15_systems    1.00   415.6±11.30µs        ? ?/sec    1.03   426.7±54.68µs        ? ?/sec
contrived/01x_entities_03_systems       1.00     16.0±0.36µs        ? ?/sec    1.04     16.8±1.12µs        ? ?/sec
contrived/01x_entities_06_systems       1.00     28.5±1.21µs        ? ?/sec    1.00     28.4±0.42µs        ? ?/sec
contrived/01x_entities_09_systems       1.00     40.9±9.31µs        ? ?/sec    1.03     42.0±5.06µs        ? ?/sec
contrived/01x_entities_12_systems       1.00     50.5±0.77µs        ? ?/sec    1.08     54.4±1.75µs        ? ?/sec
contrived/01x_entities_15_systems       1.00     65.1±3.88µs        ? ?/sec    1.01     65.6±1.89µs        ? ?/sec
contrived/02x_entities_03_systems       1.00     25.4±1.16µs        ? ?/sec    1.01     25.7±0.58µs        ? ?/sec
contrived/02x_entities_06_systems       1.00     44.8±1.77µs        ? ?/sec    1.04     46.5±4.55µs        ? ?/sec
contrived/02x_entities_09_systems       1.04     62.7±4.50µs        ? ?/sec    1.00     60.6±0.91µs        ? ?/sec
contrived/02x_entities_12_systems       1.00     76.0±2.91µs        ? ?/sec    1.03     78.1±7.32µs        ? ?/sec
contrived/02x_entities_15_systems       1.00     89.9±3.65µs        ? ?/sec    1.07     96.0±5.51µs        ? ?/sec
contrived/03x_entities_03_systems       1.00     33.7±1.93µs        ? ?/sec    1.10     37.0±3.53µs        ? ?/sec
contrived/03x_entities_06_systems       1.00     58.7±1.16µs        ? ?/sec    1.02     59.6±4.95µs        ? ?/sec
contrived/03x_entities_09_systems       1.08    89.1±55.77µs        ? ?/sec    1.00     82.2±6.06µs        ? ?/sec
contrived/03x_entities_12_systems       1.00    105.6±4.30µs        ? ?/sec    1.01    106.4±6.91µs        ? ?/sec
contrived/03x_entities_15_systems       1.00    125.3±6.35µs        ? ?/sec    1.01    126.9±7.29µs        ? ?/sec
contrived/04x_entities_03_systems       1.00    48.1±13.25µs        ? ?/sec    1.03    49.4±27.36µs        ? ?/sec
contrived/04x_entities_06_systems       1.00     70.3±5.20µs        ? ?/sec    1.06     74.6±5.89µs        ? ?/sec
contrived/04x_entities_09_systems       1.00   103.4±17.56µs        ? ?/sec    1.15   118.5±27.34µs        ? ?/sec
contrived/04x_entities_12_systems       1.00    128.5±2.82µs        ? ?/sec    1.02    131.4±3.46µs        ? ?/sec
contrived/04x_entities_15_systems       1.03    156.9±9.78µs        ? ?/sec    1.00    152.5±2.27µs        ? ?/sec
contrived/05x_entities_03_systems       1.01     52.0±2.36µs        ? ?/sec    1.00     51.4±0.86µs        ? ?/sec
contrived/05x_entities_06_systems       1.00     84.3±4.84µs        ? ?/sec    1.05    88.2±18.08µs        ? ?/sec
contrived/05x_entities_09_systems       1.00    118.9±7.57µs        ? ?/sec    1.02    120.8±3.71µs        ? ?/sec
contrived/05x_entities_12_systems       1.01   150.8±11.52µs        ? ?/sec    1.00    150.0±4.43µs        ? ?/sec
contrived/05x_entities_15_systems       1.00    183.7±4.40µs        ? ?/sec    1.02    186.6±3.80µs        ? ?/sec

crates/bevy_ecs/src/query/access.rs

hymm · 2024-07-25T16:18:13Z

I don't think the existing benchmarks are a good test of this PR, as they don't use a lot of archetype components. We need a benchmark that creates hundreds to thousands of them, or maybe more, depending on how many active ones we expect with relations.

cBournhonesque · 2024-07-27T21:27:17Z

> I don't think the existing benchmarks are a good test of this PR, as they don't use a lot of archetype components. We need a benchmark that creates hundreds to thousands of them, or maybe more, depending on how many active ones we expect with relations.

I'm not sure that to merge this we need a benchmark with tons of archetype components. My thought process is more like: "the current design won't be sustainable with relations, so we want to replace it with something that has equivalent performance for the current usage pattern of bevy". So we need to prove that the change is acceptable for the current kind of queries I think?

Also, even with relations, systems would probably have the same amount of access, even though the ComponentIds themselves can get larger.
You probably wouldn't have systems with tons of queries, but instead systems like Query<&Color, HasPlanet<Mars>>, which doesn't have a ton of archetype components, but instead has a high ComponentId for HasPlanet<Mars>.

That being said, more useful benchmarks are always welcome.

cart · 2024-08-26T22:48:35Z

> I'm not sure that to merge this we need a benchmark with tons of archetype components. My thought process is more like: "the current design won't be sustainable with relations, so we want to replace it with something that has equivalent performance for the current usage pattern of bevy". So we need to prove that the change is acceptable for the current kind of queries I think?

Given how "hot" access is, I think we absolutely need benchmarks here. The current multithreaded executor was implemented under the assumption that constructing and comparing access was very cheap.

Ex: the active_access field is rebuilt multiple (potentially many) times per frame. And that particular Access is potentially very large depending on the number of systems being run. I can see this being prohibitively expensive.

> These allow union, difference, and intersection operations in O(1).

Calling this O(1) is a bit of a stretch, as something like a union with fixedbitsets actually took O(1) within a block, whereas a union with the new approach is very clearly O(N). The O(1) is only in reference to the construction of the iterator. Resolving it is O(N).
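
The distinction between constructing the iterator and resolving it can be sketched as follows. This is a hypothetical lazy union over two sorted slices, not the code from the PR: building the `Union` value only stores two cursors, while draining it walks both inputs once.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;
use std::slice::Iter;

/// Lazy union of two sorted, deduplicated slices (hypothetical type).
struct Union<'a> {
    a: Peekable<Iter<'a, usize>>,
    b: Peekable<Iter<'a, usize>>,
}

/// Constructing the iterator is O(1): no element is inspected yet.
fn union<'a>(a: &'a [usize], b: &'a [usize]) -> Union<'a> {
    Union {
        a: a.iter().peekable(),
        b: b.iter().peekable(),
    }
}

impl<'a> Iterator for Union<'a> {
    type Item = usize;

    /// Each call advances at least one cursor, so fully resolving the
    /// iterator costs O(n + m) for inputs of length n and m.
    fn next(&mut self) -> Option<usize> {
        let ord = match (self.a.peek(), self.b.peek()) {
            (Some(x), Some(y)) => x.cmp(y),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => return None,
        };
        match ord {
            Ordering::Less => self.a.next().copied(),
            Ordering::Greater => self.b.next().copied(),
            Ordering::Equal => {
                // Same id on both sides: emit it once.
                self.b.next();
                self.a.next().copied()
            }
        }
    }
}
```

For example, `union(&[1, 3, 5], &[2, 3, 6]).collect::<Vec<_>>()` walks both inputs to produce the merged list; only the `collect` step does work proportional to the input sizes.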

Trashtalk217

This looks good.

The only thing I can recommend is to maybe extract the SortedSmallVec data structure into a separate file (maybe in bevy_utils).

cBournhonesque · 2024-08-26T23:28:01Z

> > I'm not sure that to merge this we need a benchmark with tons of archetype components. My thought process is more like: "the current design won't be sustainable with relations, so we want to replace it with something that has equivalent performance for the current usage pattern of bevy". So we need to prove that the change is acceptable for the current kind of queries I think?
>
> Given how "hot" access is, I think we absolutely need benchmarks here. The current multithreaded executor was implemented under the assumption that constructing and comparing access was very cheap.
>
> Ex: the active_access field is rebuilt multiple (potentially many) times per frame. And that particular Access is potentially very large depending on the number of systems being run. I can see this being prohibitively expensive.
>
> > These allow union, difference, and intersection operations in O(1).
>
> Calling this O(1) is a bit of a stretch, as something like a union with fixedbitsets actually took O(1) within a block, whereas a union with the new approach is very clearly O(N). The O(1) is only in reference to the construction of the iterator. Resolving it is O(N).

Would you like to see benchmarks focused on Access operations directly?

Trashtalk217 · 2024-08-26T23:29:27Z

Also, with regards to performance: More benchmarks are always nice, but is it also possible to see a slowdown in some of the examples? Maybe with regards to fps in complicated render scenes?

cart · 2024-08-26T23:38:09Z

> Would you like to see benchmarks focused on Access operations directly?

I think the highest priority is seeing benchmarks of the executor running many systems with many component accesses. I'd like both a small bevy_ecs-scoped executor benchmark that generates thousands of components used by hundreds of systems. The challenge here is ensuring rebuild_active_access is actually getting called in a way that is reflective of real apps.

I'd also like to see if this has measurable effects on frame time in full bevy apps. "Big scenes" matter less than "executes systems in parallel that in combination reference many components". So even something like 3d_scene should be sufficient, as that will run all of the built in bevy systems.
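
To show why the size of the combined access matters, here is a deliberately simplified sketch of the kind of work involved. It is not the executor's code: the function names are hypothetical, accesses are plain sorted id lists, and the read/write distinction that a real conflict check needs is ignored.

```rust
use std::cmp::Ordering;

/// True if two sorted, deduplicated id lists share no element: O(n + m).
fn is_disjoint(a: &[usize], b: &[usize]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => return false,
        }
    }
    true
}

/// Hypothetical stand-in for rebuilding the combined access of all running
/// systems: gather every id, then sort and deduplicate. The cost grows with
/// the total number of ids across the running systems.
fn rebuild_active(running: &[Vec<usize>]) -> Vec<usize> {
    let mut active: Vec<usize> = running.iter().flatten().copied().collect();
    active.sort_unstable();
    active.dedup();
    active
}
```

In this simplified model, a candidate system could start only if its access is disjoint from the rebuilt list, so both functions would sit on the scheduling hot path.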

cBournhonesque · 2024-08-27T03:03:11Z

> > Would you like to see benchmarks focused on Access operations directly?
>
> I think the highest priority is seeing benchmarks of the executor running many systems with many component accesses. I'd like both a small bevy_ecs-scoped executor benchmark that generates thousands of components used by hundreds of systems. The challenge here is ensuring rebuild_active_access is actually getting called in a way that is reflective of real apps.
>
> I'd also like to see if this has measurable effects on frame time in full bevy apps. "Big scenes" matter less than "executes systems in parallel that in combination reference many components". So even something like 3d_scene should be sufficient, as that will run all of the built in bevy systems.

Here are my results on 3d_scene (red is the PR, yellow is main):

Not much difference overall; the median time is very similar.

Comparing the multithreaded executor span:

There is a sizable difference: the PR version is about twice as slow. However, the overall difference adds up to less than 1µs.
On main, the executor accounts for 8% of total time:

so it's not insignificant by any means. We will still need to move away from the FixedBitSets if we want to have relations, though.

In the PR, the executor accounts for 9.87% of total time:

hymm · 2024-08-27T03:54:58Z

You might have done it, but when you benchmark 3d_scene, you should change the present mode to immediate. That won't really change the multithreaded span, but should change the frame time significantly

# Objective

We currently have no benchmarks for large worlds with many entities, components and systems. Having a benchmark for a world with many components is especially useful for the performance improvements needed for relations. This is also a response to this [comment from cart](#14385 (comment)).

> I'd like both a small bevy_ecs-scoped executor benchmark that generates thousands of components used by hundreds of systems.

## Solution

I use dynamic components and components to construct a benchmark with 2000 components, 4000 systems, and 10000 entities.

## Some notes

- ~I use a lot of random entities, which creates unpredictable performance, I should use a seeded PRNG.~
- Not entirely sure if everything is ran concurrently currently. And there are many conflicts, meaning there's probably a lot of first-come-first-serve going on. Not entirely sure if these benchmarks are very reproducible.
- Maybe add some more safety comments
- Also component_reads_and_writes() is about to be deprecated #16339, but there's no other way to currently do what I'm trying to do.

---------

Co-authored-by: Chris Russell <[email protected]>
Co-authored-by: BD103 <[email protected]>


cBournhonesque added 2 commits July 18, 2024 17:42

wip

bbb30d6

all tests pass

a9e83d3

alice-i-cecile added the A-ECS, C-Performance, and S-Waiting-on-Author labels Jul 18, 2024

try with lower small vec size

b763612

Victoronz reviewed Jul 19, 2024


update bench

d24db5c

hymm mentioned this pull request Jul 25, 2024

Split Resource and Component Access #14472

Closed

cBournhonesque mentioned this pull request Aug 1, 2024

Separate component and resource access #14561

Merged

cBournhonesque added 2 commits August 26, 2024 12:47

Merge branch 'main' into cb/remove-access-bitset

f0436f9

update

1734862

cBournhonesque marked this pull request as ready for review August 26, 2024 17:31

cBournhonesque added 4 commits August 26, 2024 13:34

clippy

ec85941

fmt

5d68eae

remove bench

43f42a3

clippy

938e062

cBournhonesque mentioned this pull request Aug 26, 2024

Remove sparse set in Table storage #14928

Open

format

5ca360d

Trashtalk217 approved these changes Aug 26, 2024


Trashtalk217 mentioned this pull request Dec 1, 2024

Added stress test for large ecs worlds #16591

Merged

Trashtalk217 mentioned this pull request Dec 12, 2024

Replace FixedBitSet in Access and FilteredAccess with sorted vectors #16784

Closed

therealbnut mentioned this pull request Apr 28, 2025

Replace FixedBitSet with SortedVecSet in Access #18955

Open

6 participants
