Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Faster queue free #62444

Merged
merged 1 commit into from
Jan 10, 2023
Merged

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Jun 27, 2022

Calling queue_free() for large numbers of siblings could previously be very slow, with the time taken rising exponentially with number of children. This looked partly due to ordered_remove from the child list and notifications.

This PR identifies objects that are nodes, and sorts the deletion queue so that children are deleted in reverse child order. This minimizes the costs of reordering.

Fixes #61929
Supercedes #61932

Performance testing, number of child nodes deleted

50000 nodes before 137017 ms, after 422 ms seconds (325x faster)
20000 nodes before 19425 ms, after 250 ms seconds (78x faster)
5000 nodes before 1228 ms, after 169 ms seconds (7.3x faster)
1000 nodes before 194 ms, after 150 ms seconds (1.3x faster)

(at lower node numbers the measurement from the MRP in #61929 becomes less useful as there are some fixed delays, however the pattern is that the benefit really starts to show with a decent number of children)

Notes

  • On further research, it turned out the child id is already stored in nodes, in data->pos, and sorting by child id is a superior approach. It achieves the same upside of deleting in reverse order, but eliminates the possible downside of deleting small numbers of children from a parent that has a large number of children.
  • Compared to the previous PR, the technique can now be employed in all cases (rather than having an arbitrary cutoff), and will always yield some benefit (though exponentially more benefit as the number of children rises), and effectively should have negligible cost.
  • Uses an additional bit twiddling technique to combine the child id with a "group", to ensure that in most cases, the children of the same parent will be sorted together in the list. This could make things more cache friendly, and is super cheap to do, so why not.
  • I also evaluated using a dirty flag for NOTIFICATION_MOVED_IN_PARENT. While this increased performance over baseline, it wasn't nearly as effective as the reverse ordering in this PR.
  • Same principle can be applied in 4.x as the code is similar.

@lawnjelly lawnjelly requested a review from a team as a code owner June 27, 2022 08:32
@lawnjelly lawnjelly mentioned this pull request Jun 27, 2022
@lawnjelly lawnjelly force-pushed the faster_queue_free3 branch 2 times, most recently from caf404c to aea2d1c Compare June 27, 2022 09:13
@lawnjelly lawnjelly changed the title Faster queue free [3.x] Faster queue free Jun 27, 2022
@lawnjelly lawnjelly added this to the 3.x milestone Jun 27, 2022
@lawnjelly lawnjelly force-pushed the faster_queue_free3 branch 3 times, most recently from 98705d4 to b4a0351 Compare June 27, 2022 10:09
scene/main/scene_tree.cpp Outdated Show resolved Hide resolved
scene/main/scene_tree.cpp Outdated Show resolved Hide resolved
scene/main/scene_tree.cpp Show resolved Hide resolved
scene/main/scene_tree.cpp Outdated Show resolved Hide resolved
@lawnjelly lawnjelly force-pushed the faster_queue_free3 branch 2 times, most recently from 329e5fe to 2195681 Compare November 23, 2022 08:51
Copy link
Member

@timothyqiu timothyqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I believe the overflow when calculating child_list_id is destined to happen as long as the numbers are large enough. The bit operation removed quite a lot possibilities and it requires a huge number of sibling nodes to overflow now. It's relatively safe even when overflows (not undefined behavior), so I think this is okay :)

@lawnjelly
Copy link
Member Author

Made me wonder what the maximum number of children might be! Not sure this is correct but 😀

The group can be a maximum of 536,870,911 .. (29 bits)
We have 31 bits to play with before overflow to negative value (maximum value 2147483647 which is INT32_MAX) ..

so I think we'd need 2147483647 - 536870911 to overflow:
1,610,612,736 children?

So I think that's enough for 1 and a half billion children of a node. (It could happen!)

@RandomShaper
Copy link
Member

I'm wondering if we should consider this a compat breaking change since it can affect observable behavior (e.g., user scripts handling NOTIFICATION_PREDELETE that for some reason realy on the current order of deletion). In that case it may be good that in 3.x there's a project setting to opt-in sorting (I think if the "queue" is not sorted, the behavior is as formerly, right?). Such a setting could be disabled by default but enabled for new projects, as we've done with other compat breaking but expected to be universally wanted changes.

@RandomShaper
Copy link
Member

Also, I'm wondering if it may be better that, instead of sorting just when about to mass-delete, the delete queue is backed by an already sorted container, such as hash set.

@lawnjelly
Copy link
Member Author

I'm wondering if we should consider this a compat breaking change since it can affect observable behavior (e.g., user scripts handling NOTIFICATION_PREDELETE that for some reason realy on the current order of deletion). In that case it may be good that in 3.x there's a project setting to opt-in sorting (I think if the "queue" is not sorted, the behavior is as formerly, right?). Such a setting could be disabled by default but enabled for new projects, as we've done with other compat breaking but expected to be universally wanted changes.

Sure I can put this in. I would strongly suggest that any sort of guarantee of order should be discouraged - I'll add this in the classref for queue_free(). Without a written guarantee I wouldn't expect wise users to rely on this, but you are correct it is at least possible. Although if there are no reports we could consider removing the option after a couple of betas.

Also, I'm wondering if it may be better that, instead of sorting just when about to mass-delete, the delete queue is backed by an already sorted container, such as hash set.

I'm not super familiar with hash set - what advantage are you thinking in terms of hash set over sorting? I'm assuming the standard Godot sort is something like qsort (i.e. in place and not using dynamic allocation) .. I haven't looked in detail.

@akien-mga
Copy link
Member

akien-mga commented Nov 25, 2022

I'm wondering if we should consider this a compat breaking change since it can affect observable behavior (e.g., user scripts handling NOTIFICATION_PREDELETE that for some reason realy on the current order of deletion). In that case it may be good that in 3.x there's a project setting to opt-in sorting (I think if the "queue" is not sorted, the behavior is as formerly, right?). Such a setting could be disabled by default but enabled for new projects, as we've done with other compat breaking but expected to be universally wanted changes.

Sure I can put this in. I would strongly suggest that any sort of guarantee of order should be discouraged - I'll add this in the classref for queue_free(). Without a written guarantee I wouldn't expect wise users to rely on this, but you are correct it is at least possible. Although if there are no reports we could consider removing the option after a couple of betas.

Since we're speculating here, I prefer going with the "ask for forgiveness" approach. Change the behavior without opt-out, see if anyone complains, and consider adding an option for those users if they exist. Otherwise we're just adding configuration options for things which may not be needed.

Betas don't have to be production ready, so if this behavior change is a showstopper for some users, they should report it after testing the beta and stick to using 3.5.x until their feedback is addressed in later 3.6 betas.

@lawnjelly lawnjelly requested a review from a team as a code owner November 25, 2022 09:41
@lawnjelly
Copy link
Member Author

lawnjelly commented Nov 25, 2022

Since we're speculating here, I prefer going with the "ask for forgiveness" approach. Change the behavior without opt-out, see if anyone complains, and consider adding an option for those users if they exist. Otherwise we're just adding configuration options for things which may not be needed.

Betas don't have to be production ready, so if this behavior change is a showstopper for some users, they should report it after testing the beta and stick to using 3.5.x until their feedback is addressed in later 3.6 betas.

I actually agree here... and I'd just added the project setting lol. I'll change it back, but keep a comment in the queue_free classref to mention that the order is not guaranteed. 👍

EDIT: Done.

Calling queue_free() for large numbers of siblings could previously be very slow, with the time taken rising exponentially with number of children. This looked partly due to ordered_remove from the child list and notifications.

This PR identifies objects that are nodes, and sorts the deletion queue so that children are deleted in reverse child order. This minimizes the costs of reordering.
@RandomShaper
Copy link
Member

Fair enough.

@RandomShaper
Copy link
Member

I'm not super familiar with hash set - what advantage are you thinking in terms of hash set over sorting? I'm assuming the standard Godot sort is something like qsort (i.e. in place and not using dynamic allocation) .. I haven't looked in detail.

To be honest, it's hard to tell for sure without benchmarking, but my idea with a sorted container (it'd be RBSet actually) there may be some code complexity or maybe performance advantages. Just an idea if you wish to try and compare.

@eimfach
Copy link

eimfach commented Jan 9, 2023

+1 for this. I am developing a hex turn-based strategy game with thousands of thousands of tiles and need a fast queue_free and free for releasing tiles out of view. Would be very happy to see this fix soon ! :)

@akien-mga
Copy link
Member

We discussed this yesterday with @reduz and @lawnjelly.
@reduz wants to explore an alternative solution for 4.x, so we'll hold off merging the master PR for now, but for 3.x we agreed to merge this approach.

@akien-mga akien-mga merged commit 1ee7c5a into godotengine:3.x Jan 10, 2023
@akien-mga
Copy link
Member

Thanks!

@akien-mga akien-mga modified the milestones: 3.x, 3.6 Jan 10, 2023
@lawnjelly lawnjelly deleted the faster_queue_free3 branch January 10, 2023 10:01
@shafnaz
Copy link

shafnaz commented Apr 13, 2023

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants