[3.x] Canvas item hierarchical culling #68738

lawnjelly · 2022-11-16T16:10:55Z

Adds optional hierarchical culling to the 2D rendering (within VisualServer).

Each canvas item maintains a bound in local space of the item itself and all child / grandchild items. This allows branches to be culled at once when they don't intersect a viewport.

Background

@BimDav noticed in [3.x] Add option in VisibilityEnabler2D to hide the parent for better performance (reverted) #63193 that culling in 2D is incredibly inefficient: it still does a lot of work for each item that is off screen.
I noted in that PR that in addition to fixing the VisibilityEnabler to work with this, it might be possible to add some kind of automatic hierarchical culling, for instance using the scene graph, or a spatial partitioning structure such as BVH or similar.
It turns out that unlike in 3D, for 2D, the hierarchical structure of the scene tree is stored in VisualServer, which makes it possible to use the tree directly as the spatial partitioning structure.

How it works

It stores one extra (non-negligible) piece of data on each Item: the local bound. This is a Rect2 indicating the bound, in local space, of the Item and all its non-hidden children and grandchildren.
Additionally, a dirty flag is stored to indicate whether the bound is dirty. This uses 1 bit and is packed with the other bitflags, so it uses no extra memory.
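
To illustrate the storage cost described above, here is a minimal sketch of the per-item data. It is not the actual `VisualServerCanvas` item layout: `Rect2` is a stand-in struct, and the `visible` and `sort_y` flags and the `set_bound` helper are hypothetical names chosen for the example. It only shows that the bound adds a rect per item, while the dirty flag fits into bits that are already allocated.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Godot's Rect2 (assumption: position plus size, 4 floats).
struct Rect2 {
    float x, y, w, h;
};

// Hypothetical, simplified canvas item.
struct Item {
    Rect2 local_bound; // the one extra non-negligible field: 4 floats per item

    // Flags packed into a single byte. The names other than bound_dirty
    // are placeholders for whatever flags the item already stores.
    std::uint8_t visible : 1;
    std::uint8_t sort_y : 1;
    std::uint8_t bound_dirty : 1; // new flag, uses a spare bit

    Item() : local_bound{0, 0, 0, 0}, visible(1), sort_y(0), bound_dirty(1) {}
};

// The same item without the dirty flag, only for the size comparison below.
struct ItemWithoutFlag {
    Rect2 local_bound;
    std::uint8_t visible : 1;
    std::uint8_t sort_y : 1;
};

// On mainstream compilers the extra bit does not grow the struct.
static_assert(sizeof(Item) == sizeof(ItemWithoutFlag),
              "dirty flag should fit in the existing flag byte");

// Store a freshly calculated bound and clear the dirty flag.
inline void set_bound(Item &item, const Rect2 &bound) {
    assert(item.bound_dirty); // only dirty bounds should need recalculating
    item.local_bound = bound;
    item.bound_dirty = 0;
}
```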

Housekeeping and Rendering

When changing the transform, or almost anything else, about an Item, the bound of the item itself must be marked dirty (to be recalculated the next time it is needed). Additionally, the bounds of all parent items are marked dirty, as they may be affected too.
During rendering, if a local bound is up to date (not dirty), it can be used for an intersection test with the viewport. If the bound is completely outside, all of the children can be culled. If the bound is completely inside the viewport, none of the children need be tested, as they are all inside the viewport. If there is a partial intersection, the rendering proceeds as normal.
During rendering, any dirty local bounds are recalculated.
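
The housekeeping and culling steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: transforms are omitted (all rects are assumed to live in one shared space), every item is assumed to have a non-empty rect, and `Rect2`, `Item`, `mark_dirty`, `refresh_bound` and `cull` are hypothetical names. Dirty bounds are recalculated lazily when the traversal reaches them.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for Rect2: position (x, y) and size (w, h).
struct Rect2 {
    float x, y, w, h;

    bool intersects(const Rect2 &o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
    bool encloses(const Rect2 &o) const {
        return o.x >= x && o.y >= y && o.x + o.w <= x + w && o.y + o.h <= y + h;
    }
    Rect2 merge(const Rect2 &o) const {
        const float x0 = std::min(x, o.x), y0 = std::min(y, o.y);
        const float x1 = std::max(x + w, o.x + o.w), y1 = std::max(y + h, o.y + o.h);
        return Rect2{x0, y0, x1 - x0, y1 - y0};
    }
};

struct Item {
    Rect2 own_rect{0, 0, 0, 0}; // rect of the item itself
    Rect2 bound{0, 0, 0, 0};    // own rect merged with all descendants
    bool bound_dirty = true;
    Item *parent = nullptr;
    std::vector<Item *> children;
};

// Housekeeping: dirty the item and every ancestor, since all of their
// bounds may contain the item that changed.
void mark_dirty(Item &item) {
    for (Item *i = &item; i != nullptr; i = i->parent) {
        i->bound_dirty = true;
    }
}

void add_child(Item &parent, Item &child) {
    assert(child.parent == nullptr);
    child.parent = &parent;
    parent.children.push_back(&child);
    mark_dirty(parent);
}

void set_rect(Item &item, const Rect2 &rect) {
    item.own_rect = rect;
    mark_dirty(item);
}

// Recalculate a dirty bound (and any dirty descendants) on demand.
const Rect2 &refresh_bound(Item &item) {
    if (item.bound_dirty) {
        item.bound = item.own_rect;
        for (Item *child : item.children) {
            item.bound = item.bound.merge(refresh_bound(*child));
        }
        item.bound_dirty = false;
    }
    return item.bound;
}

// Collect visible items. branch_inside is true once an ancestor's bound
// was found to be fully inside the viewport, so no further tests are needed.
void cull(Item &item, const Rect2 &viewport, bool branch_inside,
          std::vector<const Item *> &visible) {
    if (!branch_inside) {
        const Rect2 &bound = refresh_bound(item);
        if (!viewport.intersects(bound)) {
            return; // completely outside: reject the whole branch
        }
        branch_inside = viewport.encloses(bound); // completely inside: accept all
    }
    // Partial intersection: fall back to testing the item itself.
    if (branch_inside || viewport.intersects(item.own_rect)) {
        visible.push_back(&item);
    }
    for (Item *child : item.children) {
        cull(*child, viewport, branch_inside, visible);
    }
}
```

In this sketch a change to one leaf only dirties the path from that leaf to the root, so sibling branches keep their cached bounds and can still be rejected or accepted with a single test.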

Costs and Benefits

There is thus a small housekeeping cost to the technique - probably around 2% (of the time taken by the preparation / culling code). In return the wins are quite significant. Overall the preparation phase is typically 4-10x faster.

In cases where a lot is off screen (and can thus be culled) the gains can be large. In @BimDav 's test project with 300,000 canvas items, the preparation code runs in the region of 16,000x faster, with a similar huge improvement to frame rate.

In the editor there are also speed improvements to the preparation / culling.

However, note that the preparation / culling is not always a major bottleneck, so even though there are huge improvements in the efficiency of preparation code, the overall boosts to frame rate are usually more modest.

Testing in jetpaca, I was typically getting increases from around 350 to 400fps, so about 15%.

Special cases

Most canvas items are only altered by calling functions in VisualServer, so it is easy to flag their bounds as dirty when such a change occurs. There are exceptions though for "dynamic" items, where changes are "pulled" rather than "pushed" to the server.

Skinned 2D Polygons

Skinned polygons pull their vertex transforms from a Skeleton each time the Skeleton changes. But there is a chicken and egg problem: In order to know whether the skeleton has changed, we need to call get_rect() on the Polygon2D, and this only occurs immediately prior to rendering, well after the time we expect to mass reject the Polygon2D using hierarchical culling.

The solution used here is to make the relationship two-way: instead of only the Polygon2D holding a dependency on the Skeleton, the RID of each linked Polygon2D is now also stored on the skeleton. Whenever the skeleton moves, the dependent polygons are informed, and their bounds are made dirty.

This should always work, but is not ideal efficiency-wise. It is advisable to use a VisibilityEnabler2D for each skinned character, which will prevent animation when off screen, so the bound will not need updating.
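
As a rough illustration of the reverse dependency described above, the sketch below stores the IDs of dependent canvas items on the skeleton and dirties them when the skeleton moves. All names here (`Server`, `skeleton_attach_item`, `skeleton_detach_item`, `skeleton_moved`) are hypothetical and `RID` is reduced to an integer, so this is not the VisualServer API. Propagating the dirty flag to parent items is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using RID = std::uint32_t; // assumption: a plain integer id instead of a real RID

struct CanvasItem {
    bool bound_dirty = false;
};

struct Skeleton {
    // Reverse link: canvas items whose bounds depend on this skeleton.
    std::unordered_set<RID> dependent_items;
};

struct Server {
    std::unordered_map<RID, CanvasItem> items;
    std::unordered_map<RID, Skeleton> skeletons;

    // Called when a skinned polygon is linked to a skeleton.
    void skeleton_attach_item(RID skeleton, RID item) {
        assert(items.count(item) == 1);
        skeletons[skeleton].dependent_items.insert(item);
    }

    // Called when the link is removed or the polygon is freed.
    void skeleton_detach_item(RID skeleton, RID item) {
        skeletons[skeleton].dependent_items.erase(item);
    }

    // Called whenever any bone of the skeleton changes: the change is
    // pushed to the dependent items instead of being pulled at render time.
    void skeleton_moved(RID skeleton) {
        auto it = skeletons.find(skeleton);
        if (it == skeletons.end()) {
            return;
        }
        for (RID item : it->second.dependent_items) {
            auto found = items.find(item);
            if (found != items.end()) {
                found->second.bound_dirty = true;
            }
        }
    }
};
```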

Particles

Particle bounds are not actually currently dynamic in 3.x. Turns out GLES2 returns Rect2(), and GLES3 can only return a custom rect. So they should work as is without modification for hierarchical culling.

Vertex Shaders (that move verts)

These would probably need the user to make a custom rect or apply expansion margin.
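
For intuition, an expansion margin simply grows the rect used for culling on every side, so that vertices displaced by the shader still fall inside it. A minimal sketch, assuming a stand-in `Rect2` and a hypothetical `grow` helper, with the margin chosen as the largest distance the shader can move a vertex:

```cpp
#include <cassert>

// Stand-in for Rect2: position (x, y) and size (w, h).
struct Rect2 {
    float x, y, w, h;
};

// Grow a rect by `margin` units on every side.
Rect2 grow(const Rect2 &rect, float margin) {
    assert(margin >= 0.0f);
    return Rect2{rect.x - margin, rect.y - margin,
                 rect.w + 2.0f * margin, rect.h + 2.0f * margin};
}
```

A margin that is too small lets displaced geometry be culled while still visible, and one that is too large only costs some culling efficiency, so erring on the large side is the safer choice.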

Notes

As this is something that could potentially have regressions (particularly in y sorting), I have added it as an optional extra, and kept the legacy path. The two are now switchable in project_settings/rendering/2d/options/cull_mode, between Item mode (old style) and Node mode (which is now the default).
Some extra debugging functionality has also been added. In particular, you can now switch on a define to pass canvas_item names to the VisualServer, which enables you to identify nodes when printing the tree. This is normally switched off to save memory and performance. It can also be helpful for general 2D debugging in the VisualServerCanvas.

Optional defines (in `visual_server_constants.h`)

- `VISUAL_SERVER_CANVAS_TIME_NODE_CULLING` - every 100 frames it runs both Item culling and Node culling, timing both, and displaying the timings using `print_line`. This enables direct comparison in different projects / scenes, and can be used in release.
- `VISUAL_SERVER_CANVAS_DEBUG_ITEM_NAMES` - pass canvas item names to VisualServer for debugging.
- `VISUAL_SERVER_CANVAS_CHECK_BOUNDS` - performs verification checks on all bounds to make sure they are correct and up to date, in order to detect bugs.

reduz · 2022-12-04T18:52:08Z

I thought about this for a while, but I couldn't find a situation where doing this can happen transparently and always be a win. Will have to check the PR in detail.

lawnjelly · 2023-04-12T17:31:49Z

Example timings with VISUAL_SERVER_CANVAS_TIME_NODE_CULLING defined:

Jetpaca (10-20x faster)

old : 1082, new : 47
old : 12, new : 2
old : 899, new : 43
old : 8, new : 2
old : 1103, new : 37
old : 13, new : 2

Where "old" is legacy item culling and "new" is hierarchical culling.
(The reason for the two differing timings is probably that there are 2 canvas layers, one containing only a few items.)

Project Manager (3-5x faster)

old : 506, new : 165
old : 1081, new : 177
old : 1144, new : 185

Editor with Jetpaca loaded (4-8x faster)

old : 23, new : 7
old : 1038, new : 216
old : 40, new : 4
old : 816, new : 93
old : 24, new : 4
old : 939, new : 130
old : 20, new : 4
old : 429, new : 124
old : 17, new : 4
old : 909, new : 146
old : 24, new : 7
old : 908, new : 194
old : 20, new : 5
old : 989, new : 125
old : 14, new : 9
old : 626, new : 290

BimDav · 2023-04-13T08:10:20Z

Don't know why I did not see the previous posts, but this is awesome, congrats! I had something like this in mind since noticing the lack of performance, but it seemed really hard to make it work, so I am very thankful that you took a crack at it. Very promising.

clayjohn

This looks good to me. I can't see anything that would obviously cause issues. The only concern is that users may stumble onto a set of conditions that we haven't considered. But at this point the best way forward is to merge this and get broader coverage.

I am very enthusiastic about this approach and hopeful that we can polish it, prove the performance benefits, and then add the same or similar to 4.x.

Let's go ahead and merge and let users' batteries and CPUs rejoice

akien-mga · 2023-06-27T06:40:29Z

Thanks!

lawnjelly · 2023-06-27T06:40:49Z

> The only concern is that users may stumble onto a set of conditions that we haven't considered.

Absolutely, I'm fully expecting one or two special circumstances that need a slight tweak, but it's easily turn off-able. 👍

djrain · 2024-04-07T20:11:01Z

Was this ever implemented in 4.x? I don't see it anywhere and I'm concerned about visibility enabler performance as in #63193

lawnjelly · 2024-04-08T14:24:20Z

> Was this ever implemented in 4.x? I don't see it anywhere and I'm concerned about visibility enabler performance as in #63193

I mentioned to reduz while implementing, but as far as I remember, he wasn't super convinced about having it in 4.x (I think he tried to get this working long ago, but had problems where it was a win for some cases but to the detriment of others). But if there was demand it might be possible to get through politically - there are a lot of non-obvious considerations here.

For instance, it does admittedly complicate the 2D culling code which affects maintenance. 3.x is fairly stable (so not such a problem), whereas 4.x is in flux.

But if there is enough interest we may be able to get this into 4.x.

LeeWannacott · 2024-07-05T10:38:45Z

> But if there is enough interest we may be able to get this into 4.x.

I did see a PR for 2D sprite batching in 4.x, so maybe that will help performance for rendering stuff off-screen. But ideally, if something is off screen it shouldn't be rendered at all (not just have its animation stopped). As a user I can't set the visibility of the root node to false while it's off screen, because its visibility is used for calculating the screen_exited and screen_entered signals 🤦

What about a scenario like this, where you have 3D output shown as 2D using a Sprite2D? The 2D would get culled, but what about the 3D in the nested viewport? Would you need an Enabler3D that emits signals off the Enabler2D's signals?

(honestly I might have to just use pre rendered 2D because performance of doing this kind of thing (3D->2D), is really bad; although I like the flexibility that 3D provides 😿 )

Calinou added enhancement topic:rendering topic:2d performance labels Nov 16, 2022

Calinou added this to the 3.6 milestone Nov 16, 2022

lawnjelly force-pushed the faster_canvas_item branch from 53cfd61 to 2188002 Compare November 16, 2022 16:38

lawnjelly marked this pull request as ready for review November 16, 2022 17:31

lawnjelly requested review from a team as code owners November 16, 2022 17:32

timothyqiu mentioned this pull request Dec 30, 2022

Add show flag propagation godotengine/godot-proposals#6018

Open

lawnjelly marked this pull request as draft April 3, 2023 07:42

lawnjelly force-pushed the faster_canvas_item branch 2 times, most recently from 09b3b4f to 63146aa Compare April 12, 2023 15:48

lawnjelly marked this pull request as ready for review April 12, 2023 17:40

lawnjelly mentioned this pull request Apr 19, 2023

[3.x] 2D Fixed Timestep Interpolation #76252

Merged

lawnjelly force-pushed the faster_canvas_item branch from 63146aa to 7041786 Compare April 25, 2023 19:08

lawnjelly force-pushed the faster_canvas_item branch from 7041786 to b777a9e Compare April 25, 2023 19:17

This was referenced May 30, 2023

[3.x] Add option in VisibilityEnabler2D to hide the parent for better performance (reverted) #63193

Merged

[3.x] VisibilityEnabler2D - rename hide_parent and change default #77656

Closed

akien-mga changed the title ~~Canvas item hierarchical culling~~ [3.x] Canvas item hierarchical culling Jun 7, 2023

akien-mga mentioned this pull request Jun 13, 2023

Revert "Add option in VisibilityEnabler2D to hide the parent for better performance" #78182

Merged

clayjohn approved these changes Jun 27, 2023

akien-mga merged commit 29eeb46 into godotengine:3.x Jun 27, 2023

lawnjelly deleted the faster_canvas_item branch June 27, 2023 06:41

This was referenced Jul 31, 2023

[3.x] Add debug_canvas_item_get_local_bound() function to VisualServer #80084

Merged

[3.x] 2D Particle hierarchical culling not working correctly #80086

Closed
