[3.x] Canvas item hierarchical culling #68738

Merged Jun 27, 2023 (1 commit)

lawnjelly
Member

@lawnjelly lawnjelly commented Nov 16, 2022

Adds optional hierarchical culling to the 2D rendering (within VisualServer).

Each canvas item maintains a bound in local space of the item itself and all child / grandchild items. This allows branches to be culled at once when they don't intersect a viewport.

Background

  • @BimDav noticed in [3.x] Add option in VisibilityEnabler2D to hide the parent for better performance (reverted) #63193 that culling in 2D is incredibly inefficient: it still does a lot of work for each item that is off screen.
  • I noted in that PR that in addition to fixing the VisibilityEnabler to work with this, it might be possible to add some kind of automatic hierarchical culling, for instance using the scene graph, or a spatial partitioning structure such as BVH or similar.
  • It turns out that, unlike in 3D, the hierarchical structure of the scene tree for 2D is stored in VisualServer, which makes it possible to use this directly for spatial partitioning.

How it works

  • It stores one extra (non-negligible) piece of data on each Item: the local bound. This is a Rect2 indicating the bound, in local space, of the Item and all its non-hidden children and grandchildren.
  • Additionally, a dirty flag is stored to indicate whether the bound is out of date. This uses 1 bit and is packed with the other bitflags, so it uses no extra memory.
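As a rough illustration of the data described above, here is a minimal Python sketch (not the actual VisualServer C++ code). The names `Item`, `FLAG_BOUND_DIRTY`, `calculate_local_bound` and the plain translation `offset` (standing in for a full Transform2D) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect2:
    """Axis-aligned rectangle: position (x, y) and size (w, h)."""
    x: float
    y: float
    w: float
    h: float

    def merge(self, other: "Rect2") -> "Rect2":
        # Smallest rectangle that contains both rectangles.
        x0 = min(self.x, other.x)
        y0 = min(self.y, other.y)
        x1 = max(self.x + self.w, other.x + other.w)
        y1 = max(self.y + self.h, other.y + other.h)
        return Rect2(x0, y0, x1 - x0, y1 - y0)

    def translated(self, dx: float, dy: float) -> "Rect2":
        return Rect2(self.x + dx, self.y + dy, self.w, self.h)


# Hypothetical bitflags: the dirty flag shares an integer with other flags.
FLAG_VISIBLE = 1 << 0
FLAG_BOUND_DIRTY = 1 << 1


@dataclass
class Item:
    rect: Rect2                                # what the item itself draws
    offset: Tuple[float, float] = (0.0, 0.0)   # position relative to the parent
    flags: int = FLAG_VISIBLE | FLAG_BOUND_DIRTY
    children: List["Item"] = field(default_factory=list)
    local_bound: Optional[Rect2] = None        # the one extra piece of data


def calculate_local_bound(item: Item) -> Rect2:
    """Bound of the item plus all non-hidden descendants, in the item's local space."""
    bound = item.rect
    for child in item.children:
        if not (child.flags & FLAG_VISIBLE):
            continue  # hidden branches do not contribute to the bound
        child_bound = calculate_local_bound(child)
        bound = bound.merge(child_bound.translated(*child.offset))
    item.local_bound = bound
    item.flags &= ~FLAG_BOUND_DIRTY  # the bound is now up to date
    return bound
```

The point of the sketch is only that the bound is a union over the visible subtree, and that the dirty state costs one bit in an existing flags field.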

Housekeeping and Rendering

  1. When the transform, or almost anything else, about an Item changes, the bound of the item itself must be marked dirty (to be recalculated next time). Additionally, the bounds of all parent items are marked dirty, as they may be affected.
  2. During rendering, if a local bound is up to date (not dirty), it can be used for an intersection test with the viewport. If the bound is completely outside, all of the children can be culled. If the bound is completely inside the viewport, none of the children need be tested, as they are all inside the viewport. If there is a partial intersection, the rendering proceeds as normal.
  3. During rendering, any dirty local bounds are recalculated.
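The three steps above can be sketched roughly as follows. This is a simplified Python model, not the PR's C++ implementation: rectangles are `(x0, y0, x1, y1)` tuples, transforms are reduced to translation offsets, dirty bounds are recalculated lazily just before they are tested, and all names (`Item`, `mark_bound_dirty`, `update_bound`, `cull`) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def shift(r: Rect, dx: float, dy: float) -> Rect:
    return (r[0] + dx, r[1] + dy, r[2] + dx, r[3] + dy)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def encloses(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass(eq=False)
class Item:
    name: str
    rect: Rect
    offset: Tuple[float, float] = (0.0, 0.0)
    parent: Optional["Item"] = None
    children: List["Item"] = field(default_factory=list)
    bound: Optional[Rect] = None
    bound_dirty: bool = True

    def mark_bound_dirty(self) -> None:
        # Step 1: dirty this item and every ancestor up to the root.
        node: Optional[Item] = self
        while node is not None:
            node.bound_dirty = True
            node = node.parent

    def add_child(self, child: "Item") -> None:
        child.parent = self
        self.children.append(child)
        self.mark_bound_dirty()

    def set_offset(self, offset: Tuple[float, float]) -> None:
        self.offset = offset
        self.mark_bound_dirty()

    def update_bound(self) -> Rect:
        # Step 3: recalculate only if dirty; clean bounds are reused as-is.
        if self.bound_dirty or self.bound is None:
            b = self.rect
            for c in self.children:
                b = union(b, shift(c.update_bound(), *c.offset))
            self.bound = b
            self.bound_dirty = False
        return self.bound


def cull(item, viewport, origin=(0.0, 0.0), out=None, inside=False):
    """Step 2: return the names of items to draw, rejecting whole branches."""
    if out is None:
        out = []
    ox, oy = origin[0] + item.offset[0], origin[1] + item.offset[1]
    if not inside:
        bound = shift(item.update_bound(), ox, oy)
        if not intersects(bound, viewport):
            return out                       # completely outside: cull the branch
        inside = encloses(viewport, bound)   # completely inside: skip further tests
    if inside or intersects(shift(item.rect, ox, oy), viewport):
        out.append(item.name)
    for c in item.children:
        cull(c, viewport, (ox, oy), out, inside)
    return out


root = Item("root", (0, 0, 10, 10))
a = Item("a", (0, 0, 10, 10), offset=(100, 0))
a1 = Item("a1", (0, 0, 5, 5), offset=(20, 0))
b = Item("b", (0, 0, 10, 10), offset=(0, 50))
root.add_child(a)
a.add_child(a1)
root.add_child(b)

viewport = (-5, -5, 60, 70)
first = cull(root, viewport)   # branch "a" is off screen and rejected in one test
a.set_offset((30, 0))          # dirties "a" and "root"
second = cull(root, viewport)  # everything now fits inside the viewport
```

In the first pass the whole `a` branch is rejected without visiting `a1`; in the second pass the root bound is fully enclosed, so no per-item tests are needed for any descendant.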

Costs and Benefits

There is thus a small housekeeping cost to the technique - probably around 2% (of the time taken by the preparation / culling code). In return the wins are quite significant. Overall the preparation phase is typically 4-10x faster.

In cases where a lot is off screen (and can thus be culled) the gains can be large. In @BimDav 's test project with 300,000 canvas items, the preparation code runs in the region of 16,000x faster, with a similar huge improvement to frame rate.

In the editor there are also speed improvements to the preparation / culling.

However, note that the preparation / culling is not always a major bottleneck, so even though there are huge improvements in the efficiency of preparation code, the overall boosts to frame rate are usually more modest.

Testing in jetpaca, I was typically getting increases from around 350 to 400fps, so about 15%.

Special cases

Most canvas items are only altered by calling functions in VisualServer, so it is easy to flag the bounds as dirty when such a change occurs. There are exceptions, though, for "dynamic" items, where changes are "pulled" rather than "pushed" to the server.

Skinned 2D Polygons

Skinned polygons pull their vertex transforms from a Skeleton each time the Skeleton changes. But there is a chicken and egg problem: In order to know whether the skeleton has changed, we need to call get_rect() on the Polygon2D, and this only occurs immediately prior to rendering, well after the time we expect to mass reject the Polygon2D using hierarchical culling.

The solution used here is that, instead of having a one-way relationship where the Polygon2D has a dependency on the Skeleton, the RID of the linked Polygon2D is now stored on the skeleton. Whenever the skeleton moves, the dependent polygons are informed and their bounds are made dirty.
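A minimal sketch of that reverse link, in Python rather than the engine's C++, might look like the following. The class and method names (`Server`, `skeleton_attach_item`, `skeleton_set_bone_transform`) are hypothetical and are not the actual VisualServer API; RIDs are modelled as plain integers.

```python
class CanvasItem:
    def __init__(self, rid: int):
        self.rid = rid
        self.bound_dirty = False


class Skeleton:
    def __init__(self):
        self.bone_transforms = {}
        self.linked_item_rids = set()  # RIDs of the polygons skinned by this skeleton


class Server:
    """Toy stand-in for the server side; it owns items and skeletons by RID."""

    def __init__(self):
        self.items = {}
        self.skeletons = {}

    def item_create(self, rid: int) -> None:
        self.items[rid] = CanvasItem(rid)

    def skeleton_create(self, rid: int) -> None:
        self.skeletons[rid] = Skeleton()

    def skeleton_attach_item(self, skeleton_rid: int, item_rid: int) -> None:
        # The skeleton remembers which items depend on it.
        self.skeletons[skeleton_rid].linked_item_rids.add(item_rid)

    def skeleton_detach_item(self, skeleton_rid: int, item_rid: int) -> None:
        self.skeletons[skeleton_rid].linked_item_rids.discard(item_rid)

    def skeleton_set_bone_transform(self, skeleton_rid: int, bone: int, transform) -> None:
        skeleton = self.skeletons[skeleton_rid]
        skeleton.bone_transforms[bone] = transform
        # Push the change: every dependent polygon gets its bound marked dirty.
        for item_rid in skeleton.linked_item_rids:
            self.items[item_rid].bound_dirty = True
```

In a fuller version, marking an item dirty would also dirty its ancestors, as described under Housekeeping and Rendering.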

This should always work, but is not ideal efficiency-wise. It is advisable to use VisibilityEnabler2D for each skinned character, which will prevent animation when off screen, so the bound will not need updating.

Particles

Particle bounds are not actually currently dynamic in 3.x. Turns out GLES2 returns Rect2(), and GLES3 can only return a custom rect. So they should work as is without modification for hierarchical culling.

Vertex Shaders (that move verts)

These would probably need the user to make a custom rect or apply expansion margin.

Notes

  • As this is something that could potentially have regressions (particularly in y sorting), I have added it as an optional extra, and included the legacy path. These are now switchable in project_settings/rendering/2d/options/cull_mode, between Item mode (old style) and Node mode (which is now the default).
  • Some extra debugging functionality has been added. In particular, you can now switch a define to pass canvas_item names to the VisualServer, which enables you to identify nodes when printing the tree. This is normally switched off to save memory and performance. This can also be helpful for general 2D debugging in the VisualServerCanvas.

Optional defines (in visual_server_constants.h)

  • VISUAL_SERVER_CANVAS_TIME_NODE_CULLING - every 100 frames it runs both Item culling and Node culling, timing both, and displaying the timings using print_line. This enables direct comparison in different projects / scenes, and can be used in release.
  • VISUAL_SERVER_CANVAS_DEBUG_ITEM_NAMES - pass canvas item names to VisualServer for debugging.
  • VISUAL_SERVER_CANVAS_CHECK_BOUNDS - performs verification checks on all bounds to make sure they are correct and up to date, in order to detect bugs.
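For anyone wanting to reproduce the kind of comparison VISUAL_SERVER_CANVAS_TIME_NODE_CULLING gives, the idea can be sketched as below. This is a hedged Python illustration, not the engine code: `CullTimer`, its parameters, and the choice of microseconds as the unit are assumptions, and the two cull functions are passed in as stand-ins for the legacy and hierarchical paths.

```python
import time

TIMING_INTERVAL = 100  # compare both paths once every 100 frames


class CullTimer:
    def __init__(self, cull_items, cull_nodes, report=print):
        self.cull_items = cull_items  # legacy per-item culling ("old")
        self.cull_nodes = cull_nodes  # hierarchical culling ("new")
        self.report = report
        self.frame = 0

    def render_frame(self, scene):
        self.frame += 1
        if self.frame % TIMING_INTERVAL != 0:
            return self.cull_nodes(scene)  # normal frames use one path only

        # Timing frame: run both paths on the same scene and time each one.
        t0 = time.perf_counter()
        self.cull_items(scene)
        t1 = time.perf_counter()
        result = self.cull_nodes(scene)
        t2 = time.perf_counter()
        old_us = int((t1 - t0) * 1_000_000)
        new_us = int((t2 - t1) * 1_000_000)
        self.report(f"old : {old_us}, new : {new_us}")
        return result
```

Running both paths on the same frame keeps the comparison fair, since the scene contents are identical for the two measurements.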

@reduz
Member

reduz commented Dec 4, 2022

I thought about this for a while, but I couldn't find a situation where doing this can happen transparently and always be a win. Will have to check the PR in detail.

@lawnjelly lawnjelly marked this pull request as draft April 3, 2023 07:42
@lawnjelly lawnjelly force-pushed the faster_canvas_item branch 2 times, most recently from 09b3b4f to 63146aa April 12, 2023 15:48
@lawnjelly
Member Author

lawnjelly commented Apr 12, 2023

Example timings with VISUAL_SERVER_CANVAS_TIME_NODE_CULLING defined:

Jetpaca (10-20x faster)

old : 1082, new : 47
old : 12, new : 2
old : 899, new : 43
old : 8, new : 2
old : 1103, new : 37
old : 13, new : 2

Where "old" is legacy item culling and "new" is hierarchical culling.
(The reason for the two differing timings is probably 2 canvas layers, one only containing a few items.)

Project Manager (3-5x faster)

old : 506, new : 165
old : 1081, new : 177
old : 1144, new : 185

Editor with Jetpaca loaded (4-8x faster)

old : 23, new : 7
old : 1038, new : 216
old : 40, new : 4
old : 816, new : 93
old : 24, new : 4
old : 939, new : 130
old : 20, new : 4
old : 429, new : 124
old : 17, new : 4
old : 909, new : 146
old : 24, new : 7
old : 908, new : 194
old : 20, new : 5
old : 989, new : 125
old : 14, new : 9
old : 626, new : 290

@lawnjelly lawnjelly marked this pull request as ready for review April 12, 2023 17:40
@BimDav
Contributor

BimDav commented Apr 13, 2023

Don't know why I did not see the previous posts, but this is awesome, congrats! I had something like this in mind since noticing the lack of performance, but it seemed really hard to make it work, so I am very thankful that you took a crack at it. Very promising!

Member

@clayjohn clayjohn left a comment

This looks good to me. I can't see anything that would obviously cause issues. The only concern is that users may stumble onto a set of conditions that we haven't considered. But at this point the best way forward is to merge this and get broader coverage.

I am very enthusiastic about this approach and hopeful that we can polish it, prove the performance benefits, and then add the same or similar to 4.x.

Let's go ahead and merge and let users' batteries and CPUs rejoice

@akien-mga akien-mga merged commit 29eeb46 into godotengine:3.x Jun 27, 2023
@akien-mga
Member

Thanks!

@lawnjelly
Member Author

The only concern is that users may stumble onto a set of conditions that we haven't considered.

Absolutely, I'm fully expecting one or two special circumstances that need a slight tweak, but it's easily turn off-able. 👍

@djrain

djrain commented Apr 7, 2024

Was this ever implemented in 4.x? I don't see it anywhere and I'm concerned about visibility enabler performance as in #63193

@lawnjelly
Member Author

lawnjelly commented Apr 8, 2024

Was this ever implemented in 4.x? I don't see it anywhere and I'm concerned about visibility enabler performance as in #63193

I mentioned it to reduz while implementing, but as far as I remember, he wasn't super convinced about having it in 4.x (I think he tried to get this working long ago, but had problems where it was a win for some cases but to the detriment of others). But if there was demand it might be possible to get through politically. There are a lot of non-obvious considerations here.

For instance, it does admittedly complicate the 2D culling code which affects maintenance. 3.x is fairly stable (so not such a problem), whereas 4.x is in flux.

But if there is enough interest we may be able to get this into 4.x.

@LeeWannacott

LeeWannacott commented Jul 5, 2024

But if there is enough interest we may be able to get this into 4.x.

I did see a PR for 2D sprite batching in 4.x, so maybe that will help performance for rendering stuff off-screen. But ideally, if something is off screen it shouldn't be rendered (not just the animation). As a user, I can't set the visibility of the root node to false while it's off screen, because its visibility is used for calculating the screen_exited and screen_entered signals 🤦

[image]

What about a scenario like this, where you have 3D output as 2D using a Sprite2D? The 2D would get culled, but what about the 3D in the nested viewport? Would you need an Enabler3D that emits signals off the Enabler2D's signals?

(honestly I might have to just use pre rendered 2D because performance of doing this kind of thing (3D->2D), is really bad; although I like the flexibility that 3D provides 😿 )
