
Optimize Node children management #75627

Merged (1 commit) on Apr 8, 2023

Conversation

@reduz (Member) commented Apr 3, 2023

  • Adding and removing child nodes is now constant time (including name validation); the speedup should be huge.
  • Searching for node paths (as in "path/to/node") should be far faster too.
  • The rest of the operations will likely be slightly slower and use a bit more memory, which is acceptable given that the goal of this PR is to provide more "all-rounder" performance.

This changes children management to use a hashmap, optimizing most StringName-based operations. Most operations should see a severe speedup without breaking compatibility.

This should fix many issues regarding node access performance, and may also speed up editor start/end, but benchmarks are needed. So if you want to test, please make some benchmarks!

Further improvements in performance will be achieved with the removal of NOTIFICATION_MOVED_IN_PARENT, but this is left for a future PR. Done #75701.
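
For context, a minimal GDScript sketch (not part of the PR; node names are made up) of the kinds of StringName-based operations this change targets:

extends Node

func _ready() -> void:
    # Adding many named children: per the PR description, name validation no longer
    # has to scan existing siblings, so this no longer degrades with child count.
    for i in 10_000:
        var child := Node.new()
        child.name = "Item%d" % i
        add_child(child)

    # Looking up a child by name/path is now a hash lookup rather than a linear scan.
    var item := get_node("Item123")

    # Removing children (in any order) is also claimed to be constant time now.
    remove_child(item)
    item.free()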

@reduz reduz requested a review from a team as a code owner April 3, 2023 20:36
@KoBeWi KoBeWi added this to the 4.x milestone Apr 3, 2023
@Calinou
Copy link
Member

Calinou commented Apr 3, 2023

This may help in resolving #71182 and/or #75369, though the bottlenecks may be elsewhere.

Comment on lines +388 to +389
data.children_cache.remove_at(child_index);
data.children_cache.insert(p_index, p_child);
A reviewer (Member) commented:

This should probably be made a single operation by adding a relevant method to LocalVector (as currently it will needlessly shift elements after MAX(p_index, child_index) back and forth).

reduz (Member Author) replied:

Possibly a good idea, but beyond the scope of this PR; the original code did the same thing anyway.

scene/main/node.h (outdated review thread, resolved)
scene/main/node.cpp (outdated review thread, resolved)
@lawnjelly (Member) commented:

> The only compatibility breaking change is that NOTIFICATION_MOVED_IN_PARENT is no longer sent on node removal. I think sending this notification there was not correct anyway.

I haven't tested the PR, but an immediate problem that might crop up: afaik this notification was used to update the draw order in VisualServer (at least in 3.x, but I'm guessing the same in 4.x):

If you have 4 children, order 0, 1, 2, 3.

Delete the middle two; if the draw order is not updated in the visual server, the order there is now 0, 3 (on the client it is 0, 1). This works because, even though the numbers are out of sync, the sorting is the same.

Add another child, and the order on the client is 0, 1, 2 (2 being the new child).
On the visual server, if only the newly added node's draw order is updated, the order there becomes 0, 3, 2 (incorrect).

@reduz (Member Author) commented Apr 4, 2023

@lawnjelly Makes sense, but then it sounds like this is just a matter of CanvasItem hooking into the removed-child notification on its own to fix this precise issue.

@reduz force-pushed the faster-node-child-management branch 2 times, most recently from c4c6d63 to f676397 on April 4, 2023 13:41
@bruvzg bruvzg self-requested a review April 4, 2023 13:43
@TokageItLab (Member) commented:

Ah indeed, these are exactly the things that need to be cached... As an aside, I think the AnimationTree system needs an optimization similar to this one, so I'll keep it in mind.

@KoBeWi (Member) left a review:

I did some testing and there doesn't seem to be any obvious regression.
The code looks fine.

@RedworkDE (Member) left a review:

This makes deleting nodes go from linear in the number of children to quadratic in the number of children:

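# Benchmark snippet; assumed to be run from inside a script function such as _ready().
# It times creating and then freeing a parent with `count` children.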
var hash = Engine.get_version_info()["hash"]
for count in range(1000, 10001, 1000):
    var start = Time.get_ticks_msec()
    var root = Node.new()
    for i in count:
        root.add_child(Node.new())
    root.free()
    var end = Time.get_ticks_msec()
    print("%s\t%s\t%d" % [hash, count, end - start])
Long table of benchmark numbers
commit count time in ms
5fbbe3b 1000 4
5fbbe3b 2000 7
5fbbe3b 3000 10
5fbbe3b 4000 15
5fbbe3b 5000 18
5fbbe3b 6000 23
5fbbe3b 7000 27
5fbbe3b 8000 33
5fbbe3b 9000 35
5fbbe3b 10000 40
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 1000 7
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 2000 21
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 3000 40
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 4000 71
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 5000 135
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 6000 228
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 7000 413
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 8000 562
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 9000 719
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 888
5fbbe3b 1000 3
5fbbe3b 2000 6
5fbbe3b 3000 11
5fbbe3b 4000 16
5fbbe3b 5000 19
5fbbe3b 6000 24
5fbbe3b 7000 29
5fbbe3b 8000 32
5fbbe3b 9000 35
5fbbe3b 10000 40
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 1000 8
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 2000 20
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 3000 60
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 4000 90
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 5000 144
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 6000 229
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 7000 370
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 8000 527
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 9000 699
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 892
5fbbe3b 1000 4
5fbbe3b 2000 7
5fbbe3b 3000 11
5fbbe3b 4000 14
5fbbe3b 5000 17
5fbbe3b 6000 22
5fbbe3b 7000 27
5fbbe3b 8000 30
5fbbe3b 9000 43
5fbbe3b 10000 39
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 1000 7
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 2000 20
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 3000 40
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 4000 72
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 5000 134
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 6000 240
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 7000 360
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 8000 519
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 9000 697
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 884

[image: plot of the benchmark numbers above (time to free vs. child count, both commits)]

Without the free(), this PR is about a 10% slowdown:

Details
commit count time in ms (without free)
5fbbe3b 10000 35
5fbbe3b 20000 70
5fbbe3b 30000 121
5fbbe3b 40000 149
5fbbe3b 50000 224
5fbbe3b 60000 222
5fbbe3b 70000 278
5fbbe3b 80000 341
5fbbe3b 90000 393
5fbbe3b 100000 473
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 42
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 20000 75
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 30000 116
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 40000 164
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 50000 218
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 60000 275
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 70000 305
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 80000 371
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 90000 432
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 100000 520
5fbbe3b 10000 35
5fbbe3b 20000 68
5fbbe3b 30000 115
5fbbe3b 40000 155
5fbbe3b 50000 189
5fbbe3b 60000 276
5fbbe3b 70000 306
5fbbe3b 80000 351
5fbbe3b 90000 393
5fbbe3b 100000 490
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 38
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 20000 76
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 30000 118
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 40000 163
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 50000 220
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 60000 252
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 70000 299
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 80000 374
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 90000 430
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 100000 539
5fbbe3b 10000 40
5fbbe3b 20000 86
5fbbe3b 30000 129
5fbbe3b 40000 171
5fbbe3b 50000 261
5fbbe3b 60000 265
5fbbe3b 70000 291
5fbbe3b 80000 354
5fbbe3b 90000 424
5fbbe3b 100000 479
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 10000 39
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 20000 108
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 30000 120
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 40000 179
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 50000 210
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 60000 266
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 70000 305
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 80000 361
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 90000 420
f6763970aaed5f9f1bfc4b9890af81fe2307a75c 100000 513

[image: plot of the benchmark numbers above, without free()]

(Note that there are 10x more nodes in this example to make the times larger.)
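
The exact no-free variant isn't shown above; presumably it is the same loop with root.free() dropped and the counts scaled up 10x, roughly:

var hash = Engine.get_version_info()["hash"]
for count in range(10000, 100001, 10000):
    var start = Time.get_ticks_msec()
    var root = Node.new()
    for i in count:
        root.add_child(Node.new())
    # root is intentionally not freed here, since free() is what is being excluded.
    var end = Time.get_ticks_msec()
    print("%s\t%s\t%d" % [hash, count, end - start])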

@reduz (Member Author) commented Apr 4, 2023

@RedworkDE ah I just realized that my latest commit re-adding NOTIFICATION_MOVED_IN_PARENT in remove_child really messes things up. I am not sure I will be able to get the optimization done properly until I remove this notification.

So I guess ultimately, this PR has to be merged first, then I can remove that notification and the optimization will work as intended :(

@reduz (Member Author) commented Apr 5, 2023

Opened #75701 which needs to be merged before this one can be fixed.

@arkology (Contributor) commented Apr 5, 2023

@RedworkDE Could you please also check the memory usage difference? I saw somewhere that the new hashmap-based approach will consume much more RAM.

@RedworkDE (Member) commented:

Memory usage of a node with lots of children goes up about 10%:

For each commit, the first row is bytes and the second row is the increase over master.

| commit | 10000 | 20000 | 30000 | 40000 | 50000 | 60000 | 70000 | 80000 | 90000 | 100000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5fbbe3b | 11401336 | 22605304 | 33907448 | 45733944 | 59422456 | 68685304 | 84763896 | 91929592 | 103289592 | 123038200 |
| (vs master) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 6e78200 | 12605452 | 25013116 | 37355260 | 50549532 | 65278044 | 75580892 | 92175196 | 101560444 | 113960444 | 134749052 |
| (vs master) | 1.11 | 1.11 | 1.10 | 1.11 | 1.10 | 1.10 | 1.09 | 1.10 | 1.10 | 1.10 |
| c4c6d63a51a3af094115121228bfc30ef02e9a9d | 12605452 | 25013116 | 37355260 | 50549532 | 65278044 | 75580892 | 92175196 | 101560444 | 113960444 | 134749052 |
| (vs master) | 1.11 | 1.11 | 1.10 | 1.11 | 1.10 | 1.10 | 1.09 | 1.10 | 1.10 | 1.10 |
| f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 12605452 | 25013116 | 37355260 | 50549532 | 65278044 | 75580892 | 92175196 | 101560444 | 113960444 | 134749052 |
| (vs master) | 1.11 | 1.11 | 1.10 | 1.11 | 1.10 | 1.10 | 1.09 | 1.10 | 1.10 | 1.10 |
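
(The measurement code is not shown here; one way such numbers could be gathered, as an assumption and not necessarily what was actually used, is to diff the engine's static memory usage around the allocation:)

# Hypothetical helper: extra static memory (in bytes) used by `count` children.
func measure_children_memory(count: int) -> int:
    var root := Node.new()
    var before := OS.get_static_memory_usage()
    for i in count:
        root.add_child(Node.new())
    var after := OS.get_static_memory_usage()
    root.free()
    return after - before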

And since I am posting here anyway, some performance numbers as well, with some more scenarios and a version with the other PR included:

For each commit, the first row is ms for the operations with 10,000 children (see code) and the second row is the ratio over master.

| commit | adding | deleting | insert_second | move_first | removing_random | removing_reverse | shuffle |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5fbbe3b | 35.00 | 52.00 | 1204.33 | 1185.33 | 361.33 | 738.00 | 596.33 |
| (vs master) | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 6e78200 | 41.67 | 61.00 | 1637.67 | 6969.00 | 1589.67 | 1375.33 | 205.33 |
| (vs master) | 1.19 | 1.17 | 1.36 | 5.88 | 4.40 | 1.86 | 0.34 |
| c4c6d63a51a3af094115121228bfc30ef02e9a9d | 44.33 | 57.33 | 2349.00 | 7404.00 | 1642.00 | 1380.33 | 413.67 |
| (vs master) | 1.27 | 1.10 | 1.95 | 6.25 | 4.54 | 1.87 | 0.69 |
| f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 38.67 | 789.33 | 2287.00 | 7681.33 | 2439.67 | 2055.00 | 396.33 |
| (vs master) | 1.10 | 15.18 | 1.90 | 6.48 | 6.75 | 2.78 | 0.66 |
Code for the operations
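	# Note: perf_check is a timing helper defined elsewhere in the benchmark script (not
	# shown here); presumably it runs the given Callable and records the elapsed time.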
	var adding = func(count : int):
		perf_check("adding\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				root.add_child(Node.new()))
	var deleting = func(count : int):
		perf_check("deleting\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				root.add_child(Node.new())
			root.free())
	var move_first = func(count : int):
		perf_check("move_first\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				var node = Node.new()
				root.add_child(node)
				root.move_child(node, 0))
	var insert_second = func(count : int):
		perf_check("insert_second\t%d" % [count], func():
			var root = Node.new()
			var first = Node.new()
			root.add_child(first)
			for i in count:
				first.add_sibling(Node.new()))
	var removing_reverse = func(count : int):
		perf_check("removing_reverse\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				root.add_child(Node.new())
			for i in count:
				root.remove_child(root.get_child(0)))
	var removing_random = func(count : int):
		perf_check("removing_random\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				root.add_child(Node.new())
			var rnd = RandomNumberGenerator.new()
			rnd.seed = 0
			for i in count:
				root.remove_child(root.get_child(rnd.randi_range(0, count - i - 1))))
	var shuffle = func(count : int):
		perf_check("shuffle\t%d" % [count], func():
			var root = Node.new()
			for i in count:
				root.add_child(Node.new())
			var rnd = RandomNumberGenerator.new()
			rnd.seed = 0
			for i in count:
				root.move_child(root.get_child(rnd.randi_range(0, count - i - 1)), rnd.randi_range(0, count - i - 1)))

No changes in algorithmic complexity, but everything other than shuffling the node's children seems to be a lot slower in this PR; there may just be some changes that still need to be made when integrating the notification PR.


Are there any scenarios I should be testing where this is expected to be good/bad or that someone needs in particular?

@reduz (Member Author) commented Apr 5, 2023

@RedworkDE Simple addition/removal cases are not going to see much of a difference, and the increase in memory usage is expected. That said, #75701 needs to be merged and I need to rebase this one on top of it; otherwise there is not much point in benchmarking for now.

@lawnjelly (Member) commented Apr 6, 2023

> Are there any scenarios I should be testing where this is expected to be good/bad or that someone needs in particular?

The hash table is for finding children by name faster. Operations that don't use names are likely the same or (hopefully only slightly) slower. How much the hash table helps will presumably depend on how often named operations are used, and whether they are bottlenecks.

For the other PR, #75701: the original issue where I noticed the problem is #61929; this was when I was writing mesh merging, which does a lot of node sorting. See also #65581 and #74672, which contain more explanations/info; these were fixes for the same problem in 3.x. Also, #62444 deals with the same problem in the specific case of queue_delete(). Note that this problem is most noticeable (becomes pathological) with large numbers of children; it tends to scale with the square of the number of children, so in a "normal" situation you may not notice it.

Things that are likely to see a lot of improvement, all when there is a large number of children in a parent node (say 10,000 or more):

  • Deleting large numbers of children (first to last) in the same parent (not just deleting the parent node)
  • Moving large numbers of children particularly at the start of the child list
  • Detaching / attaching children, same as above

The reason is that each deletion / move / attach / detach potentially invalidates the draw index of every child following the one changed. At present each time an operation is done, the entire remaining list of children is updated with a NOTIFICATION_MOVED_IN_PARENT, and a call to VisualServer (in the case of canvas items / layers). This can happen repeatedly for every child moved.

So if you have 10,000 children, deleting the first 10 children (1 by 1) can result in approx 100,000 notifications.

With the flush-once approach, these notifications are deferred and only occur once per frame or tick.
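
As a rough GDScript sketch of the pathological pattern described above (counts taken from the example; Node2D is used so canvas draw order is involved; not code from any of the PRs):

func delete_first_children_demo() -> void:
    var parent := Node2D.new()
    for i in 10_000:
        parent.add_child(Node2D.new())
    # With the old behaviour, every remove_child() re-notified all remaining siblings
    # (NOTIFICATION_MOVED_IN_PARENT), so removing 10 children from the front of a
    # 10,000-child parent caused on the order of 100,000 notifications.
    for i in 10:
        var first := parent.get_child(0)
        parent.remove_child(first)
        first.free()
    parent.free()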

@akien-mga (Member) commented:

Thanks to everyone involved!

@akien-mga akien-mga modified the milestones: 4.x, 4.1 Apr 8, 2023
@arkology (Contributor) commented Apr 8, 2023

> Adding and removing child nodes is now constant time (including name validation), speed up should be huge.

> and may also speed up editor start/end, but benchmarks are needed

Is it OK that node addition, which is one of the most common use cases, became 20% slower? The others, like node creation, get_parent(), free(), etc., also don't look promising.
Of course, I could be missing something in the comparison table data. It will be interesting to see how this influences real use cases like editor start/end and user projects.

(I'm not dissatisfied and don't want to be rude, just want Godot to get faster and more stable :) )

@reduz (Member Author) commented Apr 8, 2023

@akien-mga Ah, I was expecting more review/testing before merging it. Well YOLO.

@arkology It's a tradeoff. Keep in mind that this went in together with #75701, #75760 and #75797, which already hugely optimize child addition and other operations. Adding children is already very fast now, and far faster than in 3.x even with this PR.

Still, this PR does add a small penalty in base cost for performance and memory usage, but in exchange it optimizes many other common use cases that had abysmal performance: adding named children, using get_node(), or removing children in random order (again, all very common use cases) were very slow with large numbers of nodes, and this makes them fast.

@seppoday commented Apr 8, 2023

What is the difference between "adding" and "adding_named"? Why such a huge gap?

@arkology (Contributor) commented Apr 8, 2023

(Just some thoughts.)
Also, from the table it's interesting that adding one node and then using add_sibling() will be faster than before these optimization pull requests. But a plain add_child() (how often has anyone used named nodes? I don't think it was often) became 27% slower, which is really huge. As I see it, the most common operations I use 90% of the time got slower, and I would have to set names on nodes to avoid that.
Anyway, more thorough testing from users will have the final word. I think it's worth specifically highlighting these changes in the next post so users can give more feedback if things get worse. Can't wait to try it all in the new RC.

@KoBeWi (Member) commented Apr 8, 2023

> What is the difference between "adding" and "adding_named"

Named children need to be validated for name conflicts. Unnamed ones (those with a bunch of @s in the name) get an auto-generated name and are faster to insert.

> how often has anyone used named nodes? I don't think it was often

Any scene you instantiate has named nodes...

@reduz (Member Author) commented Apr 8, 2023

@seppoday Adding named means adding a child that already has a name; this is currently very slow. As an example, if you have something you instantiated via preload/PackedScene, it is most likely going to be named, hence if you add too many it's going to start slowing down. This PR fixes that.

@arkology Named nodes are very common; if you do add_child(preload("res://some_scene.tscn").instantiate()), that is a named node. As I also said, nothing is slower than before, because the other optimizations (merged these past days) had a far more significant impact on improving this situation than the drawbacks this PR introduces.

@seppoday commented Apr 8, 2023

> Named children need to be validated for name conflicts. Unnamed ones (those with a bunch of @s in the name) get an auto-generated name and are faster to insert.

Looking at the test values: adding named is now almost 2x faster than adding (unnamed)... It was waaaaaay slower before the PR.

(The PR is amazing overall, I'm just trying to understand why adding named gained such a boost and even became faster than adding unnamed.)

Adding was 83 ms. Now is 105 ms.
Adding Named was 17777 ms. Now is 65.8 ms (!)

@arkology (Contributor) commented Apr 8, 2023

> Any scene you instantiate has named nodes...

> Named nodes are very common; if you do add_child(preload("res://some_scene.tscn").instantiate()), that is a named node.

Oh, my fault, I completely forgot about scene instantiation... Sorry about that. I should stop creating most of my nodes at runtime via scripts 😅 Or set their names to get a speedup.
Anyway, I'm happy with any optimizations in Godot and am looking forward to test results from other users! (Based on what I tested in my project, the times are about the same, maybe a bit slower. But since I did not test properly (just runtime node creation in one corner case), the results cannot be called reliable.)

> Adding was 83 ms. Now is 105 ms.
> Adding Named was 17777 ms. Now is 65.8 ms (!)

Interesting, could something more be done with the addition of unnamed nodes (on top of the already existing PRs)...

@reduz (Member Author) commented Apr 8, 2023

> Interesting, could something be done with addition of unnamed nodes...

@arkology Why do I bother writing :( As I just mentioned, it was already heavily optimized; if you compare with 4.0, it is most likely much faster already, even after this PR was merged. It will not be faster than the approach it replaces, but it is far faster than 4.0 mainline or 3.5.

@seppoday commented Apr 8, 2023

I'll just cut the conversation short. Overall the PR is amazing in some cases. There were just some questions (probably unnecessary) from noobs (including me). I think we can just close the discussion and move on :P

Good job everyone!

@L4Vo5 (Contributor) commented Apr 8, 2023

Even if they're both faster than before due to other PRs, it would seem that if you create a node with .new(), it is now a valid optimization technique to give it a name before calling add_child()? That seems odd, so I'm not surprised they asked several times. It'd also be an important thing to note if people somehow still run into bottlenecks with adding nodes.

@akien-mga (Member) commented Apr 8, 2023

There are three scenarios for node naming (a sketch follows the list):

  • Give the node a name which will be unique and not cause conflict (fastest). This requires you to know all possible siblings and assumes that the logic you use in your script to set and create that unique name String/StringName is free/cheap.
  • Give the node no name, in which case its name is auto-generated not to conflict with siblings, and this is fast, but a bit less so than not having to do anything (first case).
  • Give the node a name which conflicts with an existing sibling, in which case the logic to de-duplicate the name will be a bit slower (but has been greatly optimized). This is the slowest case, and also a very common one (e.g. create 10000 instances of a packed scene with "Bullet" root node).
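
A rough GDScript sketch of the three cases (illustrative only; the exact auto-generated names vary by version and force_readable_name settings):

func naming_scenarios_demo() -> void:
    var root := Node.new()

    # 1. Caller-chosen unique name: no de-duplication work is needed (fastest).
    var a := Node.new()
    a.name = "Enemy_%d" % a.get_instance_id()
    root.add_child(a)

    # 2. No name set: a non-conflicting name is auto-generated (fast, a bit slower than 1).
    var b := Node.new()
    root.add_child(b)

    # 3. Conflicting name (e.g. many instances of a "Bullet" scene): the name has to be
    #    de-duplicated on add; the slowest but also a very common case.
    var c := Node.new()
    c.name = "Bullet"
    root.add_child(c)
    var d := Node.new()
    d.name = "Bullet"
    root.add_child(d)  # Conflicts with c, so d gets renamed to a unique variant.

    root.free()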

@reduz (Member Author) commented Apr 9, 2023

@L4Vo5 Giving it a name and then adding is not necessarily faster; what is faster is adding it once it already has a name. This is very useful when using a PackedScene and creating instances of it. Node::add_child() is already very fast, so I would not worry too much about trying to find out how to use it most efficiently.

@RedworkDE (Member) commented:

I was curious about the "optimization" of giving nodes a unique name before adding them: a very simple name generator that just uses a counter is 50% slower than just adding the node unnamed, so there is no point in doing that.
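
For reference, the "counter" name generator tested here presumably looked something like this (a sketch, not the exact code used):

func add_children_with_counter_names(count: int) -> void:
    var root := Node.new()
    var counter := 0
    for i in count:
        var node := Node.new()
        counter += 1
        node.name = "Child_%d" % counter  # unique name assigned before add_child()
        root.add_child(node)
    root.free()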

Also, the second row from my benchmarks is from before reduz's two other optimization PRs, so adding unnamed nodes is really still 20% faster than before.
