Update Access and FilteredAccess to use sorted vecs instead of FixedBitSet #14385
I don't think the existing benchmarks are a good test of this PR, as they don't use many archetype components. We need a benchmark that creates hundreds to thousands of them, or maybe more, depending on how many active ones we expect with relations.
I'm not sure that we need a benchmark with tons of archetype components to merge this. My thought process is more like: "the current design won't be sustainable with relations, so we want to replace it with something that has equivalent performance for the current usage pattern of bevy". So I think we need to prove that the change is acceptable for the current kind of queries. Also, even with relations, systems would probably have the same amount of access, even though the ComponentIds themselves can get larger. That being said, more useful benchmarks are always welcome.
Given how "hot" access is, I think we absolutely need benchmarks here. The current multithreaded executor was implemented under the assumption that constructing and comparing access was very cheap. Ex: the
Calling this O(1) is a bit of a stretch, as something like a union with FixedBitSets actually took O(1) within a block, whereas a union with the new approach is very clearly O(N). The O(1) only refers to the construction of the iterator. Resolving it is O(N).
This looks good.
The only thing I can recommend is to maybe extract the SortedSmallVec data structure into a separate file (maybe in bevy_utils).
Would you like to see benchmarks focused on Access operations directly?
Also, with regard to performance: more benchmarks are always nice, but is it also possible to see a slowdown in some of the examples? Maybe in terms of fps in complicated render scenes?
I think the highest priority is seeing benchmarks of the executor running many systems with many component accesses. I'd like both a small bevy_ecs-scoped executor benchmark that generates thousands of components used by hundreds of systems. The challenge here is ensuring I'd also like to see if this has measurable effects on frame time in full bevy apps. "Big scenes" matter less than "executes systems in parallel that in combination reference many components". So even something like |
You might have done it, but when you benchmark 3d_scene, you should change the present mode to immediate. That won't really change the multithreaded span, but should change the frame time significantly.
# Objective

We currently have no benchmarks for large worlds with many entities, components and systems. Having a benchmark for a world with many components is especially useful for the performance improvements needed for relations. This is also a response to this [comment from cart](#14385 (comment)).

> I'd like both a small bevy_ecs-scoped executor benchmark that generates thousands of components used by hundreds of systems.

## Solution

I use dynamic components and components to construct a benchmark with 2000 components, 4000 systems, and 10000 entities.

## Some notes

- ~I use a lot of random entities, which creates unpredictable performance, I should use a seeded PRNG.~
- Not entirely sure if everything is currently run concurrently. And there are many conflicts, meaning there's probably a lot of first-come-first-serve going on. Not entirely sure if these benchmarks are very reproducible.
- Maybe add some more safety comments
- Also component_reads_and_writes() is about to be deprecated #16339, but there's currently no other way to do what I'm trying to do.

---------

Co-authored-by: Chris Russell <[email protected]>
Co-authored-by: BD103 <[email protected]>
Objective
As described in the relations RFC: https://github.com/james-j-obrien/rfcs/blob/minimal-fragmenting-relationships/rfcs/79-minimal-fragmenting-relationships.md#access-bitsets-and-component-sparsesets
The reason the access bitsets are currently efficient is that the ComponentIds are dense: they are incremented from 0 and should remain small.
With relations, a ComponentId will have some of its upper bits set to 1, so the amount of memory allocated in the FixedBitSet to represent the value would be non-trivial.
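To make the memory argument concrete, here is a rough sketch. It assumes a dense bitset that stores one bit per possible index in 64-bit blocks; the helper names `dense_bitset_bytes` and `sorted_vec_bytes` are hypothetical and are not part of bevy or the fixedbitset crate.

```rust
/// Bytes needed by a dense bitset that must be able to hold `max_index`.
/// Assumption for illustration: one bit per possible index, 64-bit blocks.
fn dense_bitset_bytes(max_index: u64) -> u64 {
    // Indices 0..=max_index need (max_index / 64 + 1) blocks of 8 bytes each.
    (max_index / 64 + 1) * 8
}

/// Bytes of element storage for a sorted vector holding `len` ids of 8 bytes
/// each (ignores the small fixed header of the vector itself).
fn sorted_vec_bytes(len: usize) -> usize {
    len * std::mem::size_of::<u64>()
}
```

Under these assumptions, a single hypothetical id with bit 32 set, such as `(1 << 32) | 5`, gives `dense_bitset_bytes` of 536,870,920 bytes (about 512 MiB), while a sorted vector holding three such ids needs 24 bytes of element storage.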
Solution
One way to fix the issue would be to replace the FixedBitSets with sorted vectors.
The vectors should remain relatively small, since queries usually don't involve hundreds of components.
These support union, difference, and intersection operations in O(1)
(however, inserting a value is O(n)).
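As a minimal sketch of the idea (not the PR's actual implementation), sorted-vector set operations can be written as a two-pointer merge over plain slices. The function names below are hypothetical. Note that fully computing a union or a disjointness check walks both inputs, so the work is linear in their combined length, which is the point raised in the review discussion.

```rust
use std::cmp::Ordering;

/// Insert `id` while keeping `v` sorted and free of duplicates.
/// Finding the slot is O(log n), but shifting elements makes this O(n).
fn insert_sorted(v: &mut Vec<u64>, id: u64) {
    if let Err(pos) = v.binary_search(&id) {
        v.insert(pos, id);
    }
}

/// Union of two sorted, deduplicated slices via a two-pointer merge.
fn union_sorted(a: &[u64], b: &[u64]) -> Vec<u64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => { out.push(a[i]); i += 1; }
            Ordering::Greater => { out.push(b[j]); j += 1; }
            // Present in both inputs: emit once, advance both cursors.
            Ordering::Equal => { out.push(a[i]); i += 1; j += 1; }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// True if the two sorted slices share no element (no access conflict).
fn is_disjoint_sorted(a: &[u64], b: &[u64]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => return false,
        }
    }
    true
}
```

For example, `union_sorted(&[1, 4, 9], &[2, 4, 10])` yields `[1, 2, 4, 9, 10]`, and `is_disjoint_sorted(&[1, 3], &[3, 4])` is `false` because both slices contain 3.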
Testing
Ran the running_systems benchmarks