Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic BVH for rendering and godot physics [3.2] #44901

Merged
merged 1 commit into from
Jan 12, 2021

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented Jan 3, 2021

Complete rewrite of spatial partitioning using a bounding volume hierarchy rather than octree.
Switchable in project settings between using octree or BVH for rendering and physics.

Background

Octree suffers from several problems, including 2 rather nasty bugs:

  1. Runaway depth recursion with small object sizes Game freezes and causes memory leak when changing spatial node's scale #38142 (when render tree balance is 0.0)
  2. A bug in the optimization for moving objects (which seems unfixable without badly affecting performance)
    Mesh flickering [Octo-tree]  #42895 (comment)

I'd already done a PR to try and patch up and work around some of the problems in 3.2 #41123, but overall I'm of the opinion that the current octree is groaning with multiple bugs and is not a good solution, so I think if we can replace it in the long run that will be best.

BVH is conceptually simpler than octree, especially in that items only appear in one node, whereas in octree the item has to keep track of being in multiple nodes, which is very slow and error prone (see bug (2)).

There are several pre-existing open source dynamic BVH implementations for reference, e.g.:

Unfortunately all of these only have one item stored per leaf node. This makes the code simpler in places, however if doesn't take good advantage of cache and data orientated design. So here I have written a template that can handle multiple items per leaf.

In particular this means that:

  • Splitting code is more complex
  • Calculating node bounds has to be done more carefully (deferred where possible)

As well as allowing culling tests, the Godot octree also handles collision pairing (for Godot physics, lights and many other situations). This had to be handled for the BVH to replace octree. Pairing is handled as an extra top layer, in bvh.h. The template can either be used with the pairing layer or without. In 3.2 pairing is always used (so far).

In addition to these advantages, the BVH also uses pools for nodes and leaves, assuring efficient memory use and O(1) allocation / deallocation, and good cache coherency. In general it uses less memory than octree for the same scene.

Correctness & Swappability

The main attraction for 3.2 is stability and correctness, neatly working around the octree problems. Of course being a large template this will need testing, it is possible there will be bugs. For this reason I have written the PR so that the user can swap between octree and BVH using project settings. The default is currently octree in both cases, but this can be changed.

The code to swap between the two in visual_server is slightly more verbose than I would like but it seemed a good compromise (any alternative ideas to do this welcome).

Internals

Internally the BVH maintains 2 trees, one for pairable items and one for non-pairable. This makes collision pairing faster for non-pairable moving items - they only need to be checked against the pairable tree.

I also considered separating static and dynamic trees, using a metric if an object hadn't moved in a while to move it to the static tree. This would be faster but the code is so fast already I think everything else has to catch up first (unless it were used for e.g. ray tracing).

It is written to potentially take advantage of SIMD but there's really been no need so far. The custom AABB class uses negative maximum which is a trick to allow AABB checks in one SIMD operation. To go down this route I would probably just add a couple of dummy values to the custom AABB (to make 8 checks), or use structure of arrays or similar.

The convex hull culling uses a neat trick - when entering a leaf node (which may contain e.g. 256 items) it precalculates which hull planes cut the node AABB. This means that only these planes need to be checked. This can more than double speed.

BVH parameters (node expansion and pairing expansion) are now determined internally from the world bound. They didn't seem to have a massive effect on performance, and for now I've decided against having them settable manually as project settings, as the trade off for confusion may not be worth it.

Performance

Although octree is not often a bottleneck, unless using games with large numbers of objects, the BVH implementation is considerably faster than octree.

  • In terms of frame rates, with Godot physics, in certain scenes, frame rate increases of 2-5x are possible.
  • Rendering is also considerably faster in bottlenecked scenes, 2-5x is also possible. (approx 0.2 milliseconds for 10,000 dynamic objects)

The actual performance increase of the system itself is far higher, but obviously frame rate increases will only be seen when spatial partitioning is a bottleneck. In Godot physics for example, the BVH is 50x faster than the physics code. So while the BVH is probably signifcantly faster than Bullet, Bullet is still faster in most cases than Godot physics. Hopefully @pouleyKetchoupp will be able to improve this in 4.0.

Future

Although written for 3.2, the template is mostly self contained and we can try it in Godot 4.0 as well. There are plans to use linear methods in some cases, but we need a BVH or similar for pairing etc.

Fixes #42895
Fixes #38142 (better than the octree fix, and the octree fix only works for render_tree_balance > 0.0).
May fix #42563
May fix #40059

@lawnjelly
Copy link
Member Author

lawnjelly commented Jan 6, 2021

I'm moving this out of draft now, as it is ready for some review. I've probably got a few more functions to change parameter names to include p_ etc but the gist should be correct.

As for reviewers, anyone who is interested, feel free to test it out and look through. It doesn't require a whole lot of specific Godot knowledge (e.g. in visual server or physics) as it is quite encapsulated. I'll try and put up some demo projects tomorrow.

A key aspect is the whole PR is made to be backward compatible, and the default is to use the octree (same as before). Users would actively have to set use_bvh to get the performance boost. This differs from the introduction of say, batching. This is partly because it is getting close to 3.2.4 release, and it would be nice to have it available in 3.2.4 final. If it were at the beginning of a release cycle having it as default wouldn't be so risky. If preferred we can easily switch the default to be BVH (say if a beta showed it worked well and there were no problems).

  • The 2 parameters for each BVH tree (node margin and pairing margin) I've now removed the project settings, and they default to be controlled by AI. I did originally have these as project settings but for now am content to leave them as automatic.
  • I've retained the separate .inc files for now, and if reviewers prefer I can merge it into one large header file for bvh_tree.h. This same problem occurred in the batching which is a large template. On balance I have a preference for separate .inc files as it is easier to work with, but I'll go with whatever is the majority opinion. Another possible alternative is .inl files with function definitions.
  • This PR makes the BVH available for rendering and godot physics. There is one more octree, the visibility notifier tree, which can be changed to BVH, however it is probably not that performance sensitive and I'm happy to convert that in a later PR. In fact, if BVH works in the wild, we may be able to get rid of octree altogether from 3.2, and remove the option to switch between them.
  • The pooled_list template could possibly be renamed and / or put in the files with the bvh. It's just that pools are generally useful (especially for high performance code), and I'm more a fan of having this available to other parts of code. But we could equally well make this specific to BVH and have another more general pool template if desired at a later date.

Demo Projects

Godot Physics

octree_physics_test.zip
1000 bouncing balls, small screen to minimize fill rate effects and concentrate on physics bottlenecks.
Run, and wait for a few seconds for frame rate to stabilize.

Godot physics - Octree 47fps
Bullet physics - 130fps
Godot physics - BVH 310fps (6.5x faster frame rate)

Rendering

Small objects, small screen size.
bvh_render_benchmark.zip
There are lines marked to uncomment in the gdscript to try the various tests.

10K non moving objects
Octree 200-290fps
BVH 780-1060fps (3.65x faster)

10K moving objects
Octree 36fps
BVH 131fps (3.65x faster)

Note that lots of other bottlenecks occur too. Just updating 10K objects in the scene tree can take a while, so overall frame rates don't measure how much faster BVH is relative to octree.

However quite part from performance the main criteria is correctness and solving the octree bugs.

@lawnjelly lawnjelly marked this pull request as ready for review January 6, 2021 19:18
@lawnjelly lawnjelly requested a review from a team as a code owner January 6, 2021 19:18
@lawnjelly lawnjelly changed the title [WIP] Dynamic BVH for rendering and godot physics [3.2] Dynamic BVH for rendering and godot physics [3.2] Jan 6, 2021
@lawnjelly lawnjelly changed the title Dynamic BVH for rendering and godot physics [3.2] [WIP] Dynamic BVH for rendering and godot physics [3.2] Jan 7, 2021
@lawnjelly lawnjelly changed the title [WIP] Dynamic BVH for rendering and godot physics [3.2] Dynamic BVH for rendering and godot physics [3.2] Jan 7, 2021
@lawnjelly
Copy link
Member Author

lawnjelly commented Jan 8, 2021

I will today try and convert some of the recursive functions to iterative. The performance difference does seem to be negligible (as expected, because of the large leaf sizes), but it does offer some protection against stack overflow, so it is probably better in some cases despite the extra complexity.

Particularly the convex cull test I hadn't noticed used an alloca function within the recursion, which could have increased stack usage when culling against complicated hulls, so it's good to have this potential problem closed.

I've put in a separate commit for now so it is easier to see the changes, can squash later or on request when Akien is ready for merging. The iterative stack is shamelessly copied from reduz BVH, so if there are any bugs, it's all his fault, honest! 😁

EDIT: Ok I think that's done now, I've converted I think all the non-debug recursive routines to iterative.

Unrelated, physics memory use

Moved discussion to #45024.

Complete rewrite of spatial partitioning using a bounding volume hierarchy rather than octree.

Switchable in project settings between using octree or BVH for rendering and physics.
@lawnjelly
Copy link
Member Author

On @akien-mga suggestion changed defaults to on for godot physics and rendering.

Settings are in:
rendering/quality/spatial_partitioning/use_bvh
physics/3d/godot_physics/use_bvh

These settings may end up being temporary in the long run, as once any problems are ironed out there should be no need to keep octree.

@akien-mga akien-mga merged commit d92f4c7 into godotengine:3.2 Jan 12, 2021
@akien-mga
Copy link
Member

Thanks!

@oeleo1
Copy link

oeleo1 commented Apr 22, 2021

@lawnjelly Could you please explain in simple terms how the BVH (ex. octree) tree balance parameter ranging from 0.0 to 1.0 may affect performance and why the default value is 0.0. And when we say performance, the word sounds too generic in this context. So I have a few specific questions:

  1. What is considered to be a small branch vs. a large branch, as mentioned in the tooltip
  2. What is the expected ballpark impact for rendering when playing with the Render Tree Balance parameter for 2d and 3d.
  3. Is there a demo project which may clarify and expose better this setting ?

I tried the bvh render benchmark project but it doesn't seem to make any difference whether the balance is at 0.0 or 1.0.

Thanks! Awesome work, as usual.

@lawnjelly
Copy link
Member Author

lawnjelly commented Apr 22, 2021

Ah that's probably something I should change the explanation slightly for in the classref. The tree balance is for the octree only (it was written before the BVH, and is unrelated to the BVH).

Default value is 0.0 to maintain compatibility with the old octree. Increasing it results in larger leaf nodes, to a maximum of 1.0, which often results in best peformance. The only snag turned out to be that it often exposes a bug in octree, which turned out to be unfixable, hence the need for BVH.

Relevant PRs are #41123 and #42927.

Also note that octree and BVH are 3D only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants