@Joy-less
Contributor

I optimized the performance of recursive find_children by 24%. Currently, it creates a new array for each descendant, only to append each descendant to the original array. This pull request makes it use the same array, using a lambda function.
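To make the difference concrete, here is a minimal sketch of the two shapes. It is not the engine code: `ToyNode`, `collect_with_temporaries`, and `collect_into` are hypothetical names, `std::vector` stands in for the engine's array type, and the pattern, type, and owner filtering is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a scene-tree node; not Godot's Node class.
struct ToyNode {
	std::string name;
	std::vector<ToyNode *> children; // Non-owning pointers, for brevity.
};

// Old shape: every recursive call builds its own vector, and the caller
// then copies those elements into its own result.
std::vector<const ToyNode *> collect_with_temporaries(const ToyNode &p_node) {
	std::vector<const ToyNode *> ret;
	for (const ToyNode *child : p_node.children) {
		ret.push_back(child);
		std::vector<const ToyNode *> sub = collect_with_temporaries(*child);
		ret.insert(ret.end(), sub.begin(), sub.end());
	}
	return ret;
}

// New shape: a single vector is passed down by reference and filled in place,
// so no per-descendant temporary is allocated.
void collect_into(const ToyNode &p_node, std::vector<const ToyNode *> &r_out) {
	for (const ToyNode *child : p_node.children) {
		r_out.push_back(child);
		collect_into(*child, r_out);
	}
}
```

Both functions visit descendants in the same depth-first order; only the number of temporary containers differs.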

Benchmark:

extends Node3D

func _ready()->void:
	while true:
		bench_old()
		bench_new()
		await get_tree().process_frame

func bench_old()->void:
	var start:float = Time.get_unix_time_from_system()
	var results:Array[Node]
	for i:int in 100_000:
		results = find_children_old("*")
	var end:float = Time.get_unix_time_from_system()
	print("old: ", (end - start) * 1000, "ms")
	results.clear()

func bench_new()->void:
	var start:float = Time.get_unix_time_from_system()
	var results:Array[Node]
	for i:int in 100_000:
		results = find_children("*")
	var end:float = Time.get_unix_time_from_system()
	print("new: ", (end - start) * 1000, "ms")
	results.clear()

Result:

old: 839.999914169312ms
new: 636.000156402588ms
old: 832.000017166138ms
new: 635.999917984009ms
old: 833.000183105469ms
new: 638.000011444092ms

This means each find_children call used to take about 0.0084 ms (840 ms / 100,000 calls) and now takes about 0.00636 ms (636 ms / 100,000 calls).

The benchmarks were run in-editor with 41 descendants that I created and scattered randomly to emulate a real scene tree.

@Joy-less Joy-less requested a review from a team as a code owner November 17, 2025 23:48
@Joy-less Joy-less force-pushed the optimize-find_children branch from c4fe3d9 to 61d470b Compare November 17, 2025 23:53
@Joy-less Joy-less force-pushed the optimize-find_children branch from 61d470b to b6d30a4 Compare November 18, 2025 00:11
Member

@Ivorforce Ivorforce left a comment

Please note that using lambdas in Godot code is discouraged, as per our guidelines.

It should be possible to make this function completely iterative (for example, by using a single LocalVector<Node *> todo). I'm not sure how this would compare performance-wise to the lambda, but I expect it to be faster than the original implementation. A comparison would be nice.
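As a rough illustration of the single-`todo` idea, the sketch below does a depth-first traversal with one explicit stack. It rests on assumptions: `TodoNode` and `collect_descendants` are hypothetical names, `std::vector` stands in for `LocalVector`, and the name, type, and owner checks are omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a scene-tree node; not Godot's Node class.
struct TodoNode {
	std::string name;
	std::vector<TodoNode *> children; // Non-owning pointers, for brevity.
};

// Collects all descendants of p_root in depth-first (pre-order) order
// using one explicit stack instead of recursion.
std::vector<const TodoNode *> collect_descendants(const TodoNode &p_root) {
	std::vector<const TodoNode *> ret;
	std::vector<const TodoNode *> todo;

	// Push children in reverse so that the first child is popped first.
	for (auto it = p_root.children.rbegin(); it != p_root.children.rend(); ++it) {
		todo.push_back(*it);
	}

	while (!todo.empty()) {
		const TodoNode *current = todo.back();
		todo.pop_back();
		ret.push_back(current);
		for (auto it = current->children.rbegin(); it != current->children.rend(); ++it) {
			todo.push_back(*it);
		}
	}
	return ret;
}
```

Pushing children in reverse keeps the result in the same order a recursive traversal would produce, at the cost of growing and shrinking the stack as the tree is walked.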

@Joy-less
Contributor Author

@Ivorforce I attempted to replace it with an iterative solution, but that actually turned out to be slightly slower than my recursive lambda solution (65ms -> 69ms). I'm not an expert in C++ unfortunately, so I'm not sure why it would be slower.

Iterative solution:

TypedArray<Node> Node::find_children(const String &p_pattern, const String &p_type, bool p_recursive, bool p_owned) const {
	ERR_THREAD_GUARD_V(TypedArray<Node>());

	TypedArray<Node> ret;
	ERR_FAIL_COND_V(p_pattern.is_empty() && p_type.is_empty(), ret);

	LocalVector<Pair<const Node *, uint32_t>> to_visit;
	to_visit.push_back(Pair<const Node *, uint32_t>(this, 0u));

	while (!to_visit.is_empty()) {
		Pair<const Node *, uint32_t> &check = to_visit[to_visit.size() - 1];
		const Node *current_node = check.first;
		uint32_t &child_index = check.second;

		if (child_index == 0) {
			current_node->_update_children_cache();
		}

		Node *const *child_ptr = current_node->data.children_cache.ptr();
		uint32_t child_count = current_node->data.children_cache.size();

		bool pushed_child = false;

		while (child_index < child_count) {
			Node *child = child_ptr[child_index];
			child_index++;

			if (p_owned && !child->data.owner) {
				continue;
			}

			if (p_pattern.is_empty() || child->data.name.operator String().match(p_pattern)) {
				if (p_type.is_empty() || child->is_class(p_type)) {
					ret.append(child);
				} else if (child->get_script_instance()) {
					Ref<Script> scr = child->get_script_instance()->get_script();
					while (scr.is_valid()) {
						if ((ScriptServer::is_global_class(p_type) && ScriptServer::get_global_class_path(p_type) == scr->get_path()) || p_type == scr->get_path()) {
							ret.append(child);
							break;
						}

						scr = scr->get_base_script();
					}
				}
			}

			if (p_recursive) {
				to_visit.push_back(Pair<const Node *, uint32_t>(child, 0u));
				pushed_child = true;
				break;
			}
		}

		if (!pushed_child) {
			to_visit.resize(to_visit.size() - 1);
		}
	}

	return ret;
}

@Ivorforce
Member

Ivorforce commented Nov 18, 2025

Ok, the regression is probably overhead from to_visit.

I've got another idea, one that doesn't involve to_visit. Instead, you could use the 'walk' pattern, by using the node.get_index() function.

Quick explainer:
Essentially, you set a current_node to this. On each iteration step, you add the current node to ret (if it fits), and then set current_node to its first child. When there are no children, find the next sibling, like parent.children_cache[current_node->get_index() + 1]. If there is no next sibling (index == parent.children_cache.size()), walk to the next sibling of the parent instead (and repeat the test). If current_node reaches back to this, return ret.

I can't guarantee this would be noticeably faster than your lambda solution (needs another test), but it's worth a try. Let me know if the proposed solution makes sense to you.
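The walk described above can be sketched without any auxiliary stack, as long as each node knows its parent and its own index among its siblings. This is only an illustration under assumptions: `WalkNode`, `attach`, and `walk_descendants` are hypothetical names, the `index` field plays the role of get_index(), and the "if it fits" filtering is omitted so every descendant is collected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a scene-tree node; not Godot's Node class.
struct WalkNode {
	std::string name;
	WalkNode *parent = nullptr;
	std::size_t index = 0; // Position inside parent->children.
	std::vector<WalkNode *> children; // Non-owning pointers, for brevity.
};

// Hypothetical helper that links a child and records its sibling index.
void attach(WalkNode &p_parent, WalkNode &p_child) {
	p_child.parent = &p_parent;
	p_child.index = p_parent.children.size();
	p_parent.children.push_back(&p_child);
}

// Collects all descendants of p_root in depth-first (pre-order) order,
// using only parent pointers and sibling indices.
std::vector<const WalkNode *> walk_descendants(const WalkNode &p_root) {
	std::vector<const WalkNode *> ret;
	const WalkNode *current = &p_root;
	while (true) {
		if (!current->children.empty()) {
			// Step down to the first child.
			current = current->children[0];
		} else {
			// Climb while the current node is the last of its siblings.
			while (current != &p_root && current->index + 1 >= current->parent->children.size()) {
				current = current->parent;
			}
			if (current == &p_root) {
				break; // Walked back to the start, so the traversal is done.
			}
			// Step sideways to the next sibling.
			current = current->parent->children[current->index + 1];
		}
		ret.push_back(current);
	}
	return ret;
}
```

The trade-off in this sketch is that climbing back up re-reads parent pointers and indices instead of popping saved state, which avoids the bookkeeping of a separate `to_visit` container.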

@Joy-less
Contributor Author

Thank you, I was able to get identical (if not slightly better) performance using the new iterative approach.

first: 164.000034332275ms // original approach
second: 64.000129699707ms // lambda approach
third: 62.999963760376ms // new iterative approach

@Joy-less Joy-less force-pushed the optimize-find_children branch from 35cfb9a to 1d28573 Compare November 18, 2025 15:01
@Repiteo Repiteo modified the milestones: 4.6, 4.x Nov 19, 2025
if (current_node->data.index + 1 < (int)siblings.size()) {
	// Go to next sibling
	current_node = siblings[current_node->data.index + 1];
	break;
Member

Suggested change
- break;
+ continue;

Contributor Author

I don't understand this suggestion.

@Joy-less
Contributor Author

@Ivorforce Sorry, your suggestions became hard to follow with the commits so I marked them as resolved. Please re-add the suggestions if there are more issues.

@Ivorforce
Member

Ivorforce commented Nov 19, 2025

@Joy-less Please re-open the comments, I don't want to review the same code again with the same comments.
Viewing them on the "Conversation" page shows the old context of when the comments were written, which is useful especially for comments on "Outdated" code. You can also look at the revision the code was at when I submitted my review (63dd5af) for additional context.

@Joy-less
Contributor Author

@Ivorforce I've now tested both non-recursive and recursive and they work fine with the new commits.
