Never Flush Entities #18577
-
It may not be useful now, but I think querying for entities with no components won't be so strange once Bevy's editor workflow matures and defining queries at runtime becomes a common thing. (I can imagine users wanting to know if entities without components exist when they're not expected to.)
Hmm, I don't remember why "Too many intermediate archetype/table moves when applying commands" is a known problem. I think that problem is better solved by command "batching" (#10154), where we'd "preview" all commands queued on the same entity to first figure out its destination archetype and then move/place it only once.
All of the downsides you've listed sound quite alarming to me. More coupling between the entity metadata and archetype structures, special handling requirements for the empty archetype, and the inherent complexity of the proposed allocation process seem like a combo that'll have future contributors (and even maintainers) very anxious about messing with this code. So even if this is faster, I'm reluctant to support it because of that complexity. Bevy hasn't even presented …
-
If I understand this correctly, freeing an entity will write it to the `owned` list. What prevents an additional "free" from happening between the decrement and the read during the recycle? That would overwrite the entity in `owned`. I believe this is why the concurrent queue implementations use a "stamp" value, although I don't see a straightforward way to make that work with a stack instead of a queue. More generally, it looks like you're describing a custom concurrent queue implementation, but I'm still not clear how it differs from other queues or how you expect to do better than them.
This is clever! It reminds me a bit of the data structure used in …
-
@maniwani Your criticism of this design is really helpful. I've been thinking more about it, and I think I now agree with more of what you've been saying. Here are some improvements that should make this less complex and easier to justify:
The only downside is that we would be releasing entities to public APIs while they still have an invalid archetype. But that doesn't worry me very much. We already do this now (it's just that `flush` acts as a catch-all that applies soon after). In practice, as long as we queue a move from the invalid archetype to the empty archetype with each command that just spawns an empty entity, I think this should be a non-issue. Do you have any thoughts here? This fixes a lot of the valid issues you've brought up, but it could have problems of its own.

-
Background
This is heavily motivated by remote entity reservation, but much of it can be applied without that context.
How `Entities` works now

Right now, there are 2 parts of `Entities`. First, we keep a `Vec<EntityMeta>` that is the source of truth for where an entity is, who freed it, etc. Second, we keep a `pending` list that tracks which entities no longer exist and can be reused.

Allocating and freeing entities are simple. To allocate, try to pull from the `pending` list, and if it's empty, extend the `meta` list and return a brand new entity. Once allocated, `set` the `EntityLocation` of the entity before releasing it through any public API. To free, make sure the entity exists, increase its generation, and add it to the `pending` list.

We also need to reserve entities, currently via `&Entities` but soon via an `Arc`-like structure that can be shared between threads and operate fully concurrently with `&mut Entities`. Right now (with `&Entities`), we keep a `free_cursor: AtomicIsize` that tracks the target length of `pending`. When we reserve, we decrement the `free_cursor` as if we were popping from the list (just like an allocation). If `free_cursor` goes negative, we just need to extend the `meta` list by its absolute value. In other words, if the decrement leaves `free_cursor` at `n`: if `n >= 0`, we reserved the `pending[n]` entity; otherwise, we reserved the brand new entity at index `meta.len() - n - 1`. Now the reservation has given us an `Entity`, but right now it's invalid; it has no `EntityLocation`. That's where `flush` comes in: `flush` sets the `EntityLocation` of newly reserved entities to point to the empty archetype.
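For anyone who wants that arithmetic spelled out, here is a minimal sketch of the current reservation scheme. The types are simplified stand-ins (bare `u32` indices, hypothetical field names), not Bevy's actual `Entities`:

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Simplified stand-ins for illustration; not Bevy's actual types.
struct Entities {
    meta_len: usize,          // length of the `meta` list
    pending: Vec<u32>,        // freed entity indices available for reuse
    free_cursor: AtomicIsize, // target length of `pending`
}

impl Entities {
    /// Reserve an entity index with only `&self`: decrement the cursor as if
    /// popping from `pending`, and go "negative" to mint brand new indices.
    fn reserve(&self) -> u32 {
        // `fetch_sub` returns the old value, so `n` is the cursor *after* the decrement.
        let n = self.free_cursor.fetch_sub(1, Ordering::Relaxed) - 1;
        if n >= 0 {
            // Reuse a pending entity; `flush` fixes up its `EntityLocation` later.
            self.pending[n as usize]
        } else {
            // A brand new index past the end of `meta`; `flush` will create its metadata.
            (self.meta_len as isize - n - 1) as u32
        }
    }
}

fn main() {
    let entities = Entities {
        meta_len: 3,
        pending: vec![7],
        free_cursor: AtomicIsize::new(1),
    };
    assert_eq!(entities.reserve(), 7); // reuses the pending entity
    assert_eq!(entities.reserve(), 3); // first brand new index == meta.len()
}
```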
What's wrong with this

Even aside from remote entity reservation, a lot could be improved. First, let's clarify exactly what `flush` does. It removes reserved entities from the `pending` list, creates new, reserved entities on the `meta` list, and performs an `init` call on all of them. Technically, `init` is kept generic, but in practice it always `set`s the entity location to the empty archetype. So what does that look like? We push the entity onto `Archetype::entities: Vec<ArchetypeEntity>`; an `ArchetypeEntity` is just an `Entity` and its `TableRow`.

So let me get this straight. Let's say we use commands to spawn 100 entities with `MyComponent`, and let's say there are some `pending` entities to reuse (which is expected after the app has been running for a bit). For each of those 100 entities: 1) we remove them from the `pending` list, 2) we re-arrange them into an `ArchetypeEntity` by adding `TableRow::INVALID` (the empty archetype has no table), and 3) we `set` their `EntityLocation`. Then, now that `flush` is over, we 4) swap-remove them from the empty archetype, 5) insert them into the `MyComponent` archetype, and 6) `set` their new `EntityLocation`.

What the heck? Steps 1-3 are completely irrelevant after steps 4-6. It's wasted work! And it's not just "oh, we need one more mem-copy call", since we have to convert `Entity` to `ArchetypeEntity` at both steps. Plus, we're swap-removing front to back, so that takes even longer too! Again, what the heck?
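To make the double move concrete, here is a toy model of steps 1-6 under hand-rolled stand-in types (nothing here is Bevy's real code):

```rust
// Toy model of the redundant work described above.
struct Archetype {
    entities: Vec<u32>, // entity indices only, for illustration
}

struct World {
    empty: Archetype,               // archetype 0: the empty archetype
    my_component: Archetype,        // archetype 1: entities with `MyComponent`
    locations: Vec<(usize, usize)>, // (archetype id, row) per entity index
}

impl World {
    // Steps 1-3: what `flush` does today — park the entity in the empty
    // archetype and set a location that is about to be thrown away.
    fn flush_one(&mut self, entity: u32) {
        self.empty.entities.push(entity);
        self.locations[entity as usize] = (0, self.empty.entities.len() - 1);
    }

    // Steps 4-6: applying the actual command immediately moves it again.
    fn insert_my_component(&mut self, entity: u32) {
        let (_, row) = self.locations[entity as usize];
        // Swap-remove from the empty archetype and patch whichever entity got swapped in.
        self.empty.entities.swap_remove(row);
        if let Some(&swapped) = self.empty.entities.get(row) {
            self.locations[swapped as usize].1 = row;
        }
        // Push into the real archetype and set the location a second time.
        self.my_component.entities.push(entity);
        self.locations[entity as usize] = (1, self.my_component.entities.len() - 1);
    }
}

fn main() {
    let mut world = World {
        empty: Archetype { entities: Vec::new() },
        my_component: Archetype { entities: Vec::new() },
        locations: vec![(0, 0); 100],
    };
    for entity in 0..100 {
        world.flush_one(entity); // work that steps 4-6 immediately undo
        world.insert_my_component(entity);
    }
    assert_eq!(world.my_component.entities.len(), 100);
    assert!(world.empty.entities.is_empty());
}
```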
Solution (WIP that needs discussion)

I'll cut to the chase: let's remove the empty archetype and consider "invalid" (not yet flushed) entities as valid but empty entities. In other words, `ArchetypeId::INVALID` becomes `ArchetypeId::EMPTY`. As soon as an entity is reserved or allocated, it is fully valid and usable; it's just empty.

That means we never need to `flush`. We save 50% of the work for most commands. We ditch expensive `flush` calls that really just existed to let us `alloc` or `free` an entity. Etc. This could be a really big win for performance!

There are some downsides too:

- `Entities` needs to keep archetype `Edges` and has some extra moving parts.
- `Query<Entity>` can't just loop through every `Archetype` anymore. But in practice, who does a completely unqualified `Query<Entity>`? The minute you add any component to the query at all, this goes away (see the snippet below).
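For concreteness, this is the only query shape that would be affected. It's ordinary `bevy_ecs` code, not something new from this proposal; `Player` is just a stand-in component:

```rust
use bevy_ecs::prelude::*;

#[derive(Component)]
struct Player;

// The one query shape the proposal affects: a completely unqualified `Query<Entity>`.
// Today it is answered by walking every archetype, including the empty one.
fn all_entities(query: Query<Entity>) {
    for entity in &query {
        println!("{entity:?}");
    }
}

// Add any component or filter and empty entities can't match anyway,
// so archetype iteration keeps working exactly as it does today.
fn players(query: Query<Entity, With<Player>>) {
    for entity in &query {
        println!("{entity:?}");
    }
}
```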
Details

How does this work with remote entity reservation, and what does the new `Entities` look like in practice?

First, let's get some obvious things out of the way: 1) the `meta` list needs to continue to be a normal `Vec` stored directly on `Entities` for performance. 2) `reserve_entity` and `reserve_entities` become mechanically identical to `alloc`, so we can get rid of them. 3) As a result, `alloc` and a new `alloc_many` need to be created that take only `&Entities`. 4) Because of remote reservation, ideally `alloc` (at least) only needs some shared, `Arc`'d state, not a full `&Entities`. This prevents remote reservation from being blocking or waiting in async. This is bonus points, but it ends up not costing anything extra.

With that out of the way, the next big question is "how does reservation work?" The answer is complex, so let me break it down. At the end of the day, we need something like a `Vec<Entity>` that stores a slice of empty-archetype entities followed by a slice of pending/free entities. Let's call it `owned`, since these entities are owned by `Entities`. We also need an atomic int to separate them, `free_cursor`, which holds the index into `owned` of the first free/pending entity. And we need an atomic int to track the length of the `meta` list; let's call it `meta_len`.

Let's explore that model a little bit:

- To `get` the `EntityLocation` of an `Entity`, we need a bit of work. If the index is valid in `meta`, we can just get it like normal. If the index is greater than the atomic `meta_len`, or the entity's generation is higher than that of a brand new entity, the entity is invalid/from a different world, so we return `None`. If neither of those conditions is met, the entity must be brand new, so it is not in `owned` and its `ArchetypeRow` is `ArchetypeRow::INVALID`. (Originally I thought using `ArchetypeRow::INVALID` here was dangerous, but it ends up being completely safe.) This takes 1 relaxed atomic operation for brand new entities.
- To get the `EntityMeta` of an entity index (internally), we just access `meta` at the proper index. If that index is currently out of bounds, we just extend `meta` with the `EntityMeta` for a brand new entity. A brand new entity has `ArchetypeId::EMPTY` and `ArchetypeRow::INVALID`, so we just fill that in. (No atomic operations.)
- To `set` the `EntityLocation` of an `Entity`, there's a bit more work too. First, we update its `EntityMeta` in `meta`. If its previous location was `ArchetypeId::EMPTY` and its row was not `ArchetypeRow::INVALID`, then we need to remove it from our `owned` list. This is the hardest part, since a remote thread might be doing an `alloc` at the same time as us. First, if the `free_cursor` index is not valid in `owned`, we have the `owned` list all to ourselves, so we can just swap-remove and be done. Otherwise, we start by incrementing `free_cursor` to get an index that we own. Then we swap the entity to remove with the entity at that index, and swap their `ArchetypeRow`s accordingly. Now the entity leaving the empty archetype sits right before the current `free_cursor`, so we do an atomic compare-exchange on `free_cursor` to try to decrement it. If that works, we're done. Otherwise, another entity must have been reserved in the meantime, so we do the swap again to put the leaving entity right before `free_cursor` and retry until the compare-exchange works. Note that the compare-exchange will pretty much always work the first time, and that none of this happens unless we're moving a freshly allocated, reused entity. In practice, when the current archetype is the empty one, this takes between 1 and 3 atomic operations, all of which can be relaxed, I think.
- To free an entity, we push it onto the end of `owned` and update its `EntityMeta` in `meta`. We also need to `set` its new location. Importantly, it is now in the empty archetype (though not yet spawned), so we set its `EntityLocation::archetype_id` to `ArchetypeId::EMPTY` and its `EntityLocation::archetype_row` to its index in `owned`. If `free_cursor` is greater than this index, that means this entity is now the only pending/free entity, and we need to set `free_cursor` to its index in `owned`. This takes 2 relaxed atomic operations.
- To `alloc`/`reserve` an entity, we just increment the `free_cursor`. If the previous `free_cursor` was a valid index in `owned`, we look it up in `owned` and return it. Otherwise, we just return a generation-0 entity at index `meta_len.fetch_add(1)`. This takes either 1 or 2 relaxed atomic operations, which we can amortize some in the future. (A rough sketch of this path follows the list.)
- To iterate the empty entities (the fully unqualified `Query<Entity>` case), we iterate `owned[0..free_cursor]`, the new entities with indices `meta.len()..meta_len`, and any new index ranges we missed because of recent `meta` resizes. We'll have to do a tiny bit of bookkeeping for this, but not much. (Or we can just not iterate these, because what would be the point?)

If these atomic operations are bothering you, remember that all of them are relaxed, which is pretty much free.
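Here is a rough sketch of the `alloc` path as described above. It assumes a flat array of atomics for `owned` and bare `u32` indices instead of full `Entity` values, just to keep the sketch short; the real structure is the chunked list described next, and the names are placeholders:

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

// Rough stand-in for the shared, `Arc`'d state described above.
struct SharedEntities {
    owned: Vec<AtomicU32>,    // empty-archetype entities, then pending/free entities
    owned_len: AtomicUsize,   // how many slots of `owned` are initialized
    free_cursor: AtomicUsize, // index of the first pending/free entity in `owned`
    meta_len: AtomicUsize,    // tracked length of the `meta` list
}

impl SharedEntities {
    /// `alloc`/`reserve` with only shared access: bump the cursor, then either
    /// reuse a pending entity or mint a brand new index at `meta_len`.
    fn alloc(&self) -> u32 {
        let previous = self.free_cursor.fetch_add(1, Ordering::Relaxed);
        if previous < self.owned_len.load(Ordering::Relaxed) {
            // The old cursor pointed at a pending entity: reuse it.
            self.owned[previous].load(Ordering::Relaxed)
        } else {
            // No pending entities left: hand out a generation-0 entity at a fresh index.
            self.meta_len.fetch_add(1, Ordering::Relaxed) as u32
        }
    }
}

fn main() {
    let shared = SharedEntities {
        owned: vec![AtomicU32::new(4), AtomicU32::new(9)], // entity 9 is pending
        owned_len: AtomicUsize::new(2),
        free_cursor: AtomicUsize::new(1), // owned[0] is in the empty archetype
        meta_len: AtomicUsize::new(10),
    };
    assert_eq!(shared.alloc(), 9);  // reuses the pending entity
    assert_eq!(shared.alloc(), 10); // then mints a brand new index
}
```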
To do this model, what we need is a `Vec<Entity>`-like data structure for `owned` that we can `push`, `pop`, and index without mutable access. Thankfully, this is a much simpler problem that's already solved. We need some form of linked list, but to make things more compact, we use a linked list of `Box<[UnsafeCell<Entity>]>`s where the length of each slice doubles. We can also skip the first few powers of 2, and we can flatten the linked list into a `Vec<OwnedChunk>`, where each chunk is the `Box<[UnsafeCell<Entity>]>` described and the first one has length 256. We could even store a pointer to the biggest chunk inline, since that's the one most likely to be used. We also need an atomic integer for the length, and in reality those `Box`es will need to be `AtomicPtr`s, but we can still use `Relaxed` ordering, so there's no practical difference.
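Because the chunk lengths are powers of two, mapping a global index to a (chunk, offset) pair takes a couple of bit tricks and no searching. A small sketch, assuming the 256-entry first chunk from above (names are illustrative):

```rust
// Chunk sizes are 256, 512, 1024, ... so the cumulative capacity before
// chunk `k` is 256 * (2^k - 1).
const FIRST_CHUNK_LEN: u32 = 256; // must be a power of two

fn chunk_and_offset(index: u32) -> (u32, u32) {
    // Shift the index so chunk boundaries land exactly on powers of two.
    let shifted = index + FIRST_CHUNK_LEN;
    // floor(log2(shifted)) tells us which chunk we're in.
    let chunk = (31 - shifted.leading_zeros()) - FIRST_CHUNK_LEN.trailing_zeros();
    // Total capacity of all earlier chunks: FIRST_CHUNK_LEN * (2^chunk - 1).
    let chunk_start = (FIRST_CHUNK_LEN << chunk) - FIRST_CHUNK_LEN;
    (chunk, index - chunk_start)
}

fn main() {
    assert_eq!(chunk_and_offset(0), (0, 0));
    assert_eq!(chunk_and_offset(255), (0, 255));
    assert_eq!(chunk_and_offset(256), (1, 0)); // second chunk holds 512 entries
    assert_eq!(chunk_and_offset(768), (2, 0)); // third chunk holds 1024 entries
}
```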
Vec, has the same growing strategy as a normalVec, etc. The only downside here is that it uses more cache lanes if the chunks are not adjacent in memory. But remember that unless we're finallyseting theEntityLocationof anEntityweallocated ages ago, we're only going to be using the biggest chunk (1 cache lane). And worst case, we just need 2 for swaping between chunks. Just like the currentflushing system uses 2 (thependinglist and theentitiesin the empty archetype).Final notes
This could maybe be simplified a little bit without the remote-reservation requirement. For example, we could maybe re-introduce `flush` for fulfilling remote reservations that are waiting in async. But this doesn't give us many wins for performance over this implementation.

Part of me thinks I've missed something (otherwise we'd be doing this already), but I can't think of anything. Note that this would use a few atomic operations even during `alloc` calls, etc. But they could all be `Relaxed` (no more costly than a normal operation, just not as optimizable by the compiler), so I don't think this is a big deal.

In any case, I think this would be much, much faster than the current `flush` system, since that's doing double the work it needs to! Plus, this solves a lot of design problems with remote reservation, etc.