-
-
Notifications
You must be signed in to change notification settings - Fork 23.7k
Optimize Node.find_children by 24%
#112896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
c4fe3d9 to
61d470b
Compare
61d470b to
b6d30a4
Compare
Ivorforce
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that using lambdas in Godot code is discouraged, as per our guidelines.
It should be possible to make this function completely iterative (for example, by using a single LocalVector<Node *> todo). I'm not sure how this would compare performance wise to the lambda, but I expect it to be faster than the original implementation. A comparison would be nice.
|
@Ivorforce I attempted to replace it with an iterative solution, but that actually turned out to be slightly slower than my recursive lambda solution (65ms -> 69ms). I'm not an expert in C++ unfortunately, so I'm not sure why it would be slower. Iterative solution: TypedArray<Node> Node::find_children(const String &p_pattern, const String &p_type, bool p_recursive, bool p_owned) const {
ERR_THREAD_GUARD_V(TypedArray<Node>());
TypedArray<Node> ret;
ERR_FAIL_COND_V(p_pattern.is_empty() && p_type.is_empty(), ret);
LocalVector<Pair<const Node *, uint32_t>> to_visit;
to_visit.push_back(Pair<const Node *, uint32_t>(this, 0u));
while (!to_visit.is_empty()) {
Pair<const Node *, uint32_t> &check = to_visit[to_visit.size() - 1];
const Node *current_node = check.first;
uint32_t &child_index = check.second;
if (child_index == 0) {
current_node->_update_children_cache();
}
Node *const *child_ptr = current_node->data.children_cache.ptr();
uint32_t child_count = current_node->data.children_cache.size();
bool pushed_child = false;
while (child_index < child_count) {
Node *child = child_ptr[child_index];
child_index++;
if (p_owned && !child->data.owner) {
continue;
}
if (p_pattern.is_empty() || child->data.name.operator String().match(p_pattern)) {
if (p_type.is_empty() || child->is_class(p_type)) {
ret.append(child);
} else if (child->get_script_instance()) {
Ref<Script> scr = child->get_script_instance()->get_script();
while (scr.is_valid()) {
if ((ScriptServer::is_global_class(p_type) && ScriptServer::get_global_class_path(p_type) == scr->get_path()) || p_type == scr->get_path()) {
ret.append(child);
break;
}
scr = scr->get_base_script();
}
}
}
if (p_recursive) {
to_visit.push_back(Pair<const Node *, uint32_t>(child, 0u));
pushed_child = true;
break;
}
}
if (!pushed_child) {
to_visit.resize(to_visit.size() - 1);
}
}
return ret;
} |
|
Ok, the regression is probably overhead from I've got another idea, one that doesn't involve Quick explainer: I can't guarantee this would be noticeably faster than your lambda solution (needs another test), but it's worth a try. Let me know if the proposed solution makes sense to you. |
Thank you, I was able to get identical (if not slightly better) performance using the new iterative approach. |
35cfb9a to
1d28573
Compare
scene/main/node.cpp
Outdated
| if (current_node->data.index + 1 < (int)siblings.size()) { | ||
| // Go to next sibling | ||
| current_node = siblings[current_node->data.index + 1]; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| break; | |
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this suggestion.
|
@Ivorforce Sorry, your suggestions became hard to follow with the commits so I marked them as resolved. Please re-add the suggestions if there are more issues. |
|
@Joy-less Please re-open the comments, I don't want to review the same code again with the same comments. |
|
@Ivorforce I've now tested both non-recursive and recursive and they work fine with the new commits. |
I optimized the performance of recursive
find_childrenby 24%. Currently, it creates a new array for each descendant, only to append each descendant to the original array. This pull request makes it use the same array, using a lambda function.Benchmark:
Result:
This means each benchmark call used to take
0.0084msbut now takes0.00636ms.The benchmarks were run in-editor with 41 descendants that I created and scattered randomly to emulate a real scene tree.