Optimize octree and fix leak #41123
19a9239 to 99443b4
@clayjohn Just for reference, I've done some profiling of the Godot physics octree and it looks like the lack of improvement with larger octant limits may be due to the pairing, which I haven't altered (and am not planning on changing). For this reason I'm now thinking in terms of removing the project setting for the physics octree and just sticking with the old functionality for the physics octree and probably the visibility notifier octree (for safety), as the performance improvements are probably limited to the rendering octree. I'm also trying to get some non-ugly compile time switching of whether to use the cached lists (yes for rendering, no for physics, as that will save memory and makes little difference to performance).
That sounds like a pragmatic approach. Enable the optimization only where we know it helps in all cases. That way we can reduce the risk of the change as much as possible.
I tested out this PR a little bit this evening. Unfortunately, I spotted a regression. In the TPS demo, just moving the character at full speed causes the character to disappear until they are moved back to the original position, and then if you walk slowly they will not disappear. I am guessing something is getting lost when objects move too fast. The performance was slightly better with
Ah that's not good I'll try and track that down. Possibly somewhere I'm not refreshing the cached list. The limit per octant should not (in an obvious way at least) be able to introduce regressions, unless there is an existing bug in e.g. the pairing. There is actually a small existing bug in the PR in the brackets which I've noticed while trying to come up with an elegant refactor to allow optionally removing the cached lists. I doubt it is causing the regression but I'll know soon.
Yes I think despite the hype in vblanco's thread #23998, optimizing the octree seems to only give a large overall improvement in certain scenarios - like you say, lots of moving objects (say an asteroid type game). The octree is often running significantly faster, but when everything else is also slowing things down, it isn't always that much of a bottleneck in the overall frame time. I don't think this is a good reason not to sort these problems though. I get the impression there are multiple key bottlenecks in Godot currently (including things like octree, GDScript, variants, nodes, threading overheads using things like visual server / physics, GPU fill rate, occlusion) and they will all need to be improved in order to get great performance overall. Improving any one of them will give good improvements in specific benchmarks, but won't massively improve an 'average' game; only their cumulative effect will.
Edit: Ok there are now 2 versions of the octree, one with the cached lists. I had to spend a bit of time coming up with a non-ugly way of turning off the cached lists at compile time. Usually this kind of thing can be done purely with templates (and may be possible that way in a manner I haven't figured out) but here I've just used macros. I did consider using some template methods such as CRTP but that could have made the code hard to read; it should be easily understandable using the macro approach. I still have to find this regression, which is proving difficult because I've had limited success installing the TPS demo project so far.
747c8ac to 23ee913
If anyone needs the tps-demo, you can download it from here (link valid until 2020-09-13): https://0x0.st/iY8e.zip
Ok, regression hopefully fixed. I missed out setting the dirty flag on the octant in Move(). I've put in an optional debug check to catch any of these cases in future (not enabled by default as it slows performance).
Edit: Ah no, I finally got the TPS demo working and the player is still disappearing sometimes. I'll see if I can find out what is causing it. It's something in the cached lists; when I turn them off it works fine. So it is probably another simple fix to do with them getting dirty. FPS goes from 13 -> 17 with cached lists on for me, so it is well worth having them working (presuming that it isn't because it is rendering less).
Ah got it: moving the elements changes the AABB, so all the owner octants need to be made dirty.
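The fix described in the last edit above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Box`, `OctantStub`, `MovableElement` and `move_element` are hypothetical names, and the only assumption carried over from the comment is that an element may be referenced by several owner octants, each with its own dirty flag.

```cpp
#include <vector>

// Hypothetical axis-aligned box, standing in for the engine's AABB.
struct Box {
    float min[3];
    float max[3];
};

// Hypothetical octant stub: only the dirty flag matters for this sketch.
struct OctantStub {
    bool dirty = false;
};

// An element can straddle several octants, so it tracks all of its owners.
struct MovableElement {
    Box aabb;
    std::vector<OctantStub *> owners;
};

// Moving an element changes its AABB, so any cached copy of that AABB held
// by an owner octant is now stale. Every owner is flagged, not just one.
inline void move_element(MovableElement &element, const Box &new_aabb) {
    element.aabb = new_aabb;
    for (OctantStub *owner : element.owners) {
        owner->dirty = true;
    }
}
```

Flagging only one owner would leave stale AABBs in the others, which would be consistent with the symptom reported earlier in the thread (an object vanishing after it moves).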
5a024bb to 7dd5d73
I tested the TPS demo on the latest commit and it appears to work great! :)
Prevents adding new octants until a limiting number of elements have been added to the current octant. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Project settings are added for rendering octree to set the best balance per project depending on number of tests per frame / tick, and the amount of editing of the octree.
Fixes octants being leaked when removing elements.
Optimize octree with cached linear lists
Storing elements in octants using linked lists is efficient for housekeeping but very slow for testing. This optimization stores additional local_vectors with Element pointers and AABBs which are cached and only updated when a dirty flag is set on the octant.
This is selectable with 2 versions of Octree: Octree and Octree_CL, Octree being the old behaviour. At present the cached list version is only used for the visual server octree (rendering) as it has only been demonstrated to be faster there so far.
This uses slightly more memory (probably a few kb in most cases) but can be significantly faster during testing (culling etc).
Co-authored-by: Sergey Minakov <[email protected]>
For future reference: Although I think this PR should be merged as is, based on some discussion about this maybe fixing #38142, in the future it may be worth investigating whether tiny objects can also cause problems due to runaway octants in the visibility notifier and Godot physics octrees. There are now 2 versions of the octree, the old one and the new optimized one, and the new one is only applied in the visual server octree, where it was shown to give a performance benefit - the visibility notifier and Godot physics still use the old one. However, if it turns out the other octrees are susceptible to the runaway behaviour (I hesitate to call it a bug, but it is an unwanted feature) we should look at solving this, either by swapping to the new octree, or by modifying the old one to guard against this without affecting performance, perhaps with a simple depth limit.
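To make the "runaway" concern and the depth-limit idea concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that a tree keeps halving its cells until a cell is no more than `divisor` times larger than the element; that rule, the `divisor` value of 4, and the function name `split_depth` are assumptions of this sketch rather than anything stated in the thread.

```cpp
// Counts how many times a root cell of size `root_size` is halved before
// cell_size / divisor < element_size, stopping early at `max_depth`.
// The size rule and the divisor are illustrative assumptions.
inline int split_depth(float root_size, float element_size, int max_depth, float divisor = 4.0f) {
    int depth = 0;
    float cell_size = root_size;
    // The max_depth check is the "simple depth limit": without it, an
    // element of size zero would never satisfy the size test.
    while (depth < max_depth && cell_size / divisor >= element_size) {
        cell_size *= 0.5f;
        depth++;
    }
    return depth;
}
```

Under these assumptions, shrinking the element makes the required depth grow logarithmically, and a chain of otherwise empty octants can be created just to hold one tiny object. A cap bounds that chain at the cost of leaving the object in a larger cell than the size rule would otherwise pick.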
As far as I can tell this looks good. I like the approach to splitting the octree implementations. But this is really going to need @reduz' approval before merging. Hopefully we can steal his attention for a bit once he finishes with particles.
@@ -1168,6 +1168,10 @@
<member name="rendering/quality/shadows/filter_mode.mobile" type="int" setter="" getter="" default="0">
Lower-end override for [member rendering/quality/shadows/filter_mode] on mobile devices, due to performance concerns or driver support.
</member>
<member name="rendering/quality/spatial_partitioning/render_tree_balance" type="float" setter="" getter="" default="0.17">
The rendering octree balance can be changed to favor smaller ([code]0[/code]), or larger ([code]1[/code]) branches.
Larger branches can increase performance significantly in some projects.
What is the downside about having large branches?
What is the downside about having large branches?
It is a complicated relationship from what I could see. On a very basic level, having larger branches is quicker for editing (moving objects around, inserting, erasing), but can be slower in certain types of test (particularly those in physics).
However, the difference due to editing is a one-off cost within a frame (or tick), whereas the cost for testing depends on how many tests are done.
For instance, for visibility, in most situations I have tested there is a benefit from having larger branches. It could be argued that the default should be set higher; however, at the moment I am tending towards a setting which can give a performance increase when tweaked, rather than one that can remove a performance drop when tweaked.
For visibility, in a lot of cases there may only be one or a few tests performed on the octree, hence the balance for editing speed is usually better. However, if you were making a game that made a large number of visibility tests the balance could shift the other way.
For godot physics on the other hand, with a large number of objects (thousands) and thus many thousands of tests, smaller branches always seemed to be better. Hence removing the new tree from there unless evidence shows it can lead to better performance.
Alright, time to merge and get this battle tested in 3.2.4 beta 1 :) @reduz says the changes seem fine but he's not so familiar with this code anymore to give an in-depth review, yet he trusts the extensive work done by @lawnjelly and @naithar. So let's merge and see what happens, we should have plenty of time until 3.2.4-stable to catch potential regressions, if any.
Thanks hugely!
The pairing is used for e.g. linking a light with objects, and the code for it is quite convoluted, and would need quite a bit more time for me personally to invest as I didn't write it (and I have many higher priority tasks). It is known to be inefficient, that's why it is planned to be removed in 4.x, with a bit of a paradigm shift. For 3.x the emphasis is on compatibility and not breaking existing functionality, so it is difficult to remove pairing, it would either need to be made more efficient as is or rewritten in a 'black box' fashion. If someone else has time to spend on this I'm sure it would be welcome. Incidentally make sure to try changing the
I'm still getting the 30ms frame times when setting the visibility of these lights. I have a workaround in mind where I will use a proxy script to relocate the actual light nodes to the root so they do not have to have their visibility toggled when the geometry is hidden. Hopefully that will smooth it over until Godot 4 |
If you create an issue with a minimum reproduction project, it is more likely to get looked at (I might be able to profile it and find something obvious for example).
Done: #42563. I'd be curious what your method is for profiling? I've only hacked in some statistics to the object visibility calls. I guess with it toggling every frame any old profiling will do? (I was trying to catch stuff that only happens every once in a while, which is harder)
Octant Limit
Prevents adding new octants until a limiting number of elements have been added to the current octant. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Project settings are added for the rendering octree only to set the best balance per project depending on number of tests per frame / tick, and the amount of editing of the octree.
Octant Leak
Fixes octants being leaked when removing elements.
This mirrors @naithar's fix for the leak (#41122), so that this PR can be self-contained.
On suggestion from @clayjohn I have updated the leak fix to be similar to the other PR, and added naithar as co-author in case this ends up getting merged together, to make things easier for Akien.
Cached linear local_vectors
(second commit) Instead of using linked lists at runtime this maintains a dirty flag per octant and builds on demand a linear list of Element pointers and AABBs which can be used for quicker cull tests. This can lead to significantly better test performance.
This is offered as a second version of the octree, Octree_CL. It is only used for the rendering octree currently, as it has only been shown to offer performance increases there so far.
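As a rough illustration of the dirty-flag caching described above, the sketch below keeps a linked list for housekeeping and rebuilds two flat arrays only when the octant has changed. It is a simplified stand-in using the standard library: `CachedOctant`, its members, and the `rebuilds` counter are hypothetical names, and `std::vector` takes the place of `local_vector`.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical axis-aligned bounding box with an overlap test.
struct AABB {
    float min[3];
    float max[3];
    bool intersects(const AABB &other) const {
        for (int i = 0; i < 3; i++) {
            if (min[i] > other.max[i] || max[i] < other.min[i]) {
                return false;
            }
        }
        return true;
    }
};

struct Element {
    int id;
    AABB aabb;
};

// Hypothetical octant: linked list for edits, flat cached arrays for tests.
struct CachedOctant {
    std::list<Element *> elements;          // cheap insert / erase
    std::vector<Element *> cached_elements; // rebuilt on demand
    std::vector<AABB> cached_aabbs;         // contiguous AABB copies for culling
    bool dirty = true;
    std::size_t rebuilds = 0; // instrumentation for this sketch only

    void add(Element *e) {
        elements.push_back(e);
        dirty = true;
    }
    void remove(Element *e) {
        elements.remove(e);
        dirty = true;
    }

    // Rebuild the flat lists only if something changed since the last test.
    void update_cache() {
        if (!dirty) {
            return;
        }
        cached_elements.clear();
        cached_aabbs.clear();
        for (Element *e : elements) {
            cached_elements.push_back(e);
            cached_aabbs.push_back(e->aabb);
        }
        dirty = false;
        rebuilds++;
    }

    // Cull against the contiguous AABB array instead of walking the list.
    void cull(const AABB &query, std::vector<Element *> &out) {
        update_cache();
        for (std::size_t i = 0; i < cached_aabbs.size(); i++) {
            if (cached_aabbs[i].intersects(query)) {
                out.push_back(cached_elements[i]);
            }
        }
    }
};
```

Because the AABBs are copied into the cache, anything that changes an element's AABB must also set `dirty` on every octant that holds the element; otherwise the cull tests run against stale boxes.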
Background
As discussed with reduz, this prevents adding new octants until a limiting number of elements have been added to the current octant, and is probably a better option than directly limiting the depth of the tree. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Preliminary tests indicate that for visibility culling brute force is significantly faster than using the octree as is.
The implementation so far is very simple, it doesn't split octants until the limit is reached. This means that on splitting, some of the elements in the parent octant may have been better off in the children. I have written code to rebalance the tree on splitting, but it isn't significantly faster so is left out at this stage.
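The simple, non-rebalancing rule described above might look something like this in miniature. It is a sketch under stated assumptions, not the PR's implementation: it stores points rather than AABBs, and `Point`, `Octant`, `MiniOctree` and `octant_count` are hypothetical names chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    float x, y, z;
};

// Hypothetical minimal octant: a cubic cell that stores points directly and
// only creates children once it already holds `limit` elements.
struct Octant {
    Point center{ 0.0f, 0.0f, 0.0f };
    float half_size = 1.0f;
    std::vector<Point> elements;
    std::array<std::unique_ptr<Octant>, 8> children;
};

struct MiniOctree {
    std::size_t limit; // elements an octant holds before children are created
    Octant root;

    // A limit of 0 would recurse forever, so it is clamped to 1.
    MiniOctree(float half_size, std::size_t p_limit) :
            limit(p_limit < 1 ? 1 : p_limit) {
        root.half_size = half_size;
    }

    void insert(const Point &p) { insert_into(root, p); }
    std::size_t octant_count() const { return count(root); }

private:
    static int child_index(const Octant &o, const Point &p) {
        return (p.x >= o.center.x ? 1 : 0) | (p.y >= o.center.y ? 2 : 0) | (p.z >= o.center.z ? 4 : 0);
    }

    void insert_into(Octant &o, const Point &p) {
        // Below the limit: keep the element here (brute-force territory).
        if (o.elements.size() < limit) {
            o.elements.push_back(p);
            return;
        }
        // Limit reached: only the NEW element descends. Elements already
        // stored in `o` stay where they are (no rebalancing).
        const int i = child_index(o, p);
        if (!o.children[i]) {
            const float h = o.half_size * 0.5f;
            auto child = std::make_unique<Octant>();
            child->half_size = h;
            child->center = Point{ o.center.x + ((i & 1) ? h : -h),
                o.center.y + ((i & 2) ? h : -h),
                o.center.z + ((i & 4) ? h : -h) };
            o.children[i] = std::move(child);
        }
        insert_into(*o.children[i], p);
    }

    static std::size_t count(const Octant &o) {
        std::size_t n = 1;
        for (const auto &c : o.children) {
            if (c) {
                n += count(*c);
            }
        }
        return n;
    }
};
```

With a limit of 4, the first four inserts leave the tree as a single octant, and a child appears only on the fifth. A larger limit shifts work towards brute-force tests over fewer octants; a smaller one shifts it towards finer partitioning.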
Scenes which particularly benefit are those containing large numbers of objects, especially large numbers of objects being moved, created or erased. It can also significantly speed up cull tests with the cached lists. Most 3D games will see a modest improvement, but some may gain a 2x improvement in FPS or more. This is highly dependent on the degree to which the octree is a bottleneck in a particular game.
Fixes #38142
Fixes #41480
Related to #40059
Notes
Local Vector
This PR introduces local_vector.h from Godot 4.x. This might be better as a separate commit so the octree changes could be reverted without local_vector being reverted, if something else is relying on local_vector. Note that PouleyKetchoupp's PR #40313 also contains local_vector. We could potentially merge local_vector as a separate commit first, then Pouley and I could modify our PRs. I'll leave this up to Akien though, as to whatever works best.
Demo
To try it out, load this project and try changing the project setting rendering/quality/spatial_partitioning/render_tree_balance.
octree_test.zip