Implement GPU frustum culling. #12889
Conversation
This seems worthy of the release notes. Thoughts?
Preparation for GPU occlusion culling: two-phase occlusion culling [1] is generally considered the state-of-the-art occlusion culling technique. We already use two-phase occlusion culling for meshlets, but we don't for other 3D objects. Two-phase occlusion culling requires the construction of a *hierarchical Z-buffer*. This patch implements an opt-in pass to generate that buffer and is therefore a step along the way to implementing two-phase occlusion culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer` component to a camera enables the feature. This code should be usable as-is by third-party plugins that want to implement two-phase occlusion culling, but of course we would like to have two-phase occlusion culling implemented directly in Bevy in the near future.

Unlike meshlets, we have to handle the case in which the depth buffer is multisampled. This is the source of most of the extra complexity, since we can't use the Vulkan extension [2] that allows us to easily resolve multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
Preparation for GPU occlusion culling: two-phase occlusion culling [1] is generally considered the state-of-the-art occlusion culling technique. We already use two-phase occlusion culling for meshlets, but we don't for other 3D objects. Two-phase occlusion culling requires the construction of a *hierarchical Z-buffer*. This patch implements an opt-in set of passes to generate that buffer and is therefore a step along the way to implementing two-phase occlusion culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer` component to a camera enables the feature. This code should be usable as-is by third-party plugins that want to implement two-phase occlusion culling, but of course we would like to have two-phase occlusion culling implemented directly in Bevy in the near future.

Two-phase occlusion culling will be implemented using the following procedure:

1. Render all meshes that would have been visible in the previous frame to the depth buffer (with no fragment shader), using the previous frame's hierarchical Z-buffer, the previous frame's view matrix (cf. bevyengine#12902), and each model's previous view input uniform.
2. Downsample the Z-buffer to produce a hierarchical Z-buffer ("early", in the language of this patch).
3. Perform occlusion culling of all meshes against the Hi-Z buffer, using a screen-space AABB test.
4. If a prepass is in use, render it now, using the occlusion culling results from (3). Note that if *only* a depth prepass is in use, then we can avoid rendering meshes that we rendered in phase (1), since they're already in the depth buffer.
5. Render main passes, using the occlusion culling results from (3).
6. Downsample the Z-buffer to produce a hierarchical Z-buffer again ("late", in the language of this patch). This readies the Z-buffer for step (1) of the next frame. It differs from the hierarchical Z-buffer produced in (2) because it includes meshes that weren't visible last frame but became visible this frame.

This commit adds steps (1), (2), and (6) to the pipeline, when the `HierarchicalDepthBuffer` component is present. It doesn't add step (3), because step (3) depends on bevyengine#12889, which in turn depends on bevyengine#12773, and both of those patches are still in review.

Unlike meshlets, we have to handle the case in which the depth buffer is multisampled. This is the source of most of the extra complexity, since we can't use the Vulkan extension [2] that allows us to easily resolve multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
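For readers who haven't worked with hierarchical Z-buffers: the downsampling in steps (2) and (6) conceptually reduces each 2×2 block of depth texels to one texel per mip level, keeping the most conservative (farthest) depth. Below is a minimal CPU-side Rust sketch of a single downsample step; the real implementation runs as a GPU pass in `bevy_core_pipeline`, and the function name and the reverse-Z (min) convention shown here are illustrative assumptions.

```rust
/// Hypothetical CPU-side illustration of one hierarchical Z-buffer downsample
/// step. Each texel of the next mip level takes the most conservative depth of
/// the 2x2 block beneath it. With a reverse-Z depth buffer (1.0 = near,
/// 0.0 = far) the conservative choice is the *minimum*, matching the "min
/// operation" mentioned in the commit message; with a conventional
/// 0-near/1-far buffer it would be the maximum instead.
fn downsample_depth_mip(prev: &[f32], prev_width: usize, prev_height: usize) -> Vec<f32> {
    let width = (prev_width / 2).max(1);
    let height = (prev_height / 2).max(1);
    let mut next = vec![0.0_f32; width * height];
    for y in 0..height {
        for x in 0..width {
            // Clamp so odd-sized mip levels still read valid texels.
            let x0 = (2 * x).min(prev_width - 1);
            let x1 = (2 * x + 1).min(prev_width - 1);
            let y0 = (2 * y).min(prev_height - 1);
            let y1 = (2 * y + 1).min(prev_height - 1);
            let block = [
                prev[y0 * prev_width + x0],
                prev[y0 * prev_width + x1],
                prev[y1 * prev_width + x0],
                prev[y1 * prev_width + x1],
            ];
            next[y * width + x] = block.into_iter().fold(f32::INFINITY, f32::min);
        }
    }
    next
}
```

Occlusion testing in step (3) then compares an object's screen-space AABB against the appropriate mip level, so a single texel fetch conservatively bounds the depth over the whole covered region.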
92678c7 to 2a579d2
This commit implements opt-in GPU frustum culling, built on top of the infrastructure in bevyengine#12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`.

A small amount of code relating to frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup.
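To make that data flow a bit more concrete, here is a rough Rust sketch of the two GPU-visible structures mentioned above. The field names and layouts are assumptions for illustration only: `IndirectParameters` is shown with the standard wgpu/WebGPU indexed indirect-draw layout, and `MeshCullingData` as a model-space AABB.

```rust
use bevy::math::Vec4;

/// Sketch of the per-mesh culling data uploaded for GPU frustum culling.
/// Field names are illustrative, not the exact definitions in the patch.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MeshCullingData {
    /// Center of the mesh's axis-aligned bounding box, in model space.
    pub aabb_center: Vec4,
    /// Half-extents of the AABB along each axis.
    pub aabb_half_extents: Vec4,
}

/// Sketch of the indirect draw arguments that the mesh preprocessing shader
/// fills in. This mirrors the layout wgpu expects for indexed indirect draws;
/// `instance_count` starts at zero and is incremented for each mesh instance
/// that survives frustum culling, which is how instance slots get allocated
/// dynamically in indirect mode.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct IndirectParameters {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}
```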
This should be ready for review now. I've rebased it on top of #12773.
This all looks broadly correct. I left a few comments but I'm not approving yet because I'm in the process of testing everything and making sure everything works.
view_query: QueryState<(Entity, Read<PreprocessBindGroup>)>,
view_query: QueryState<(
    Entity,
    Read<PreprocessBindGroup>,
Not a new change, but why is `Read` used here? What is it even?
`Read<T>`, part of the `lifetimeless` module, is just a typedef for `&'static T`, for use in these situations where the `&'static` might be confusing, since there's nothing static about it. We aren't very consistent about using `Read`.
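Concretely, the alias amounts to something like the following (a paraphrase of the `lifetimeless` aliases in `bevy_ecs`, not the exact source):

```rust
/// `Read<T>` is just a shared-reference query term with a placeholder
/// 'static lifetime; the real lifetime is supplied when the query is
/// actually resolved against the world.
pub type Read<T> = &'static T;

/// The mutable counterpart.
pub type Write<T> = &'static mut T;
```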
This basically looks good. I left a few comments. When the ones that sound blocking are addressed, I'll approve and merge. Thanks for the PR! <3
Comments addressed.
In bevyengine#12889, I mistakenly started dropping unbatchable sorted items on the floor instead of giving them solitary batches. This caused the objects in the `shader_instancing` demo to stop showing up. This patch fixes the issue by giving those items their own batches as expected. Fixes bevyengine#13130.
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1307 if you'd like to help out.
[12889](bevyengine/bevy#12889) (GPU frustum culling) removed the `dynamic_offset` of `Transparent2d`; it became `extra_index`, with the special value `PhaseItemExtraIndex::NONE` indicating the `None` that was here previously.
Addresses Bevy [12889](bevyengine/bevy#12889)
It looks like your PR is a breaking change, but you didn't provide a migration guide. Could you add some context on what users should update when this change gets released in a new version of Bevy?
* Update to 0.14.0-rc.2.
* [12997](bevyengine/bevy#12997): rename `multi-threaded` to `multi_threaded`.
* `RenderAssets<Image>` is now `RenderAssets<GpuImage>`, implemented in [12827](bevyengine/bevy#12827).
* `FloatOrd` is now in `bevy_math`, implemented in [12732](bevyengine/bevy#12732).
* Convert `Transparent2d::dynamic_offset` to `extra_index`: [12889](bevyengine/bevy#12889) (GPU frustum culling) removed the `dynamic_offset` of `Transparent2d`; it became `extra_index`, with the special value `PhaseItemExtraIndex::NONE` indicating the `None` that was here previously.
* `RenderPhase<Transparent2d>` -> `ViewSortedRenderPhases<Transparent2d>`: [12453](bevyengine/bevy#12453): render phases are now binned or sorted. Following the changes in the `mesh2d_manual` [example](https://github.com/bevyengine/bevy/blob/ecdd1624f302c5f71aaed95b0984cbbecf8880b7/examples/2d/mesh2d_manual.rs#L357-L358), use the `ViewSortedRenderPhases` resource.
* `get_sub_app_mut` now returns an `Option`: in [9202](bevyengine/bevy#9202) `SubApp` access has changed.
* `GpuImage::size` f32 -> u32 via `UVec2`: [11698](bevyengine/bevy#11698) changed `GpuImage::size` to `UVec2`. Right above this, `Extent3d` does the same thing, so I'm taking a small leap and assuming we can `as`-cast.
* `GpuMesh::primitive_topology` -> `key_bits`/`BaseMeshPipeline`: in [12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits`, which can be constructed using `BaseMeshPipeline::from_primitive_topology`.
* `RenderChunk2d::prepare` now requires `&mut MeshVertexBufferLayouts`: [12216](bevyengine/bevy#12216) introduced a `&mut MeshVertexBufferLayouts` argument to `get_mesh_vertex_buffer_layout`, which bevy_ecs_tilemap calls in `RenderChunk2d::prepare`.
* `into_linear_f32` -> `color.0.linear().to_f32_array()`: in [12163](bevyengine/bevy#12163) bevy_color was created and `Color` handling changed. Specifically, `Color::as_linear_rgba_f32` has been removed. `LinearRgba` is now its own type that can be accessed via [`linear()`](https://docs.rs/bevy/0.14.0-rc.2/bevy/color/enum.Color.html#method.linear) and then converted.
* Must specify the type of `VisibleEntities` when accessing it: [12582](bevyengine/bevy#12582) divided `VisibleEntities` into separate lists, so now we have to specify which kind of entity we want. I think we want the mesh here, and I think we can get rid of the `.index` calls on `Entity`, since `Entity` [already compares bits](https://docs.rs/bevy_ecs/0.14.0-rc.2/src/bevy_ecs/entity/mod.rs.html#173) for optimized codegen purposes. Waiting to do that until the other changes are in, though, so as to not change functionality until post-upgrade.
* `app.world` access is now via functions: [9202](bevyengine/bevy#9202) changed world access to functions ([relevant line](https://github.com/bevyengine/bevy/pull/9202/files#diff-b2fba3a0c86e496085ce7f0e3f1de5960cb754c7d215ed0f087aa556e529f97fR640)). This also surfaced [12655](bevyengine/bevy#12655), which removed `Into<AssetId<T>>` for `Handle<T>`; using a reference or `.id()` is the solution here.
* We don't need `World::cell`, and it doesn't exist anymore: in [12551](bevyengine/bevy#12551) `WorldCell` was removed... but it turns out we don't need it or its replacement anyway.
* Examples error out unless this Bevy bug is addressed with these features being added: bevyengine/bevy#13728.
* `check_visibility` is required for the entity that is renderable. As a result of [12582](bevyengine/bevy#12582), `check_visibility` must be implemented for the "renderable" tilemap entities. Doing this is trivial by taking advantage of the existing `check_visibility` type arguments, which accept a [`QF: QueryFilter + 'static`](https://docs.rs/bevy/0.14.0-rc.2/bevy/render/view/fn.check_visibility.html). The same `QueryFilter` is used when checking `VisibleEntities`. I've chosen `With<TilemapRenderSettings>` because presumably if the entity doesn't have a `TilemapRenderSettings` then it will not be rendering, but this could be as sophisticated or simple as we want (see the registration sketch after this list). For example, `WithLight` is currently implemented as:
  ```rust
  pub type WithLight = Or<(With<PointLight>, With<SpotLight>, With<DirectionalLight>)>;
  ```
* `view.view_proj` -> `view.clip_from_world`: [13489](bevyengine/bevy#13489) introduced matrix naming changes, including `view_proj`, which becomes `clip_from_world`.
* Color changes to make tests runnable.
* Clippy fix.
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Final clippy fixes.
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Simplify async loading in ldtk/tiled helpers (see Bevy #12550).
* Remove second allow lint.
* rc.3 bump.
* Bump version for major release.
* Remove unused features.

Co-authored-by: Rob Parrett <[email protected]>
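For context on the `check_visibility` bullet above, registering the generic visibility check might look roughly like the sketch below. It assumes `With<TilemapRenderSettings>` as the filter, as discussed, and that the system goes in the standard `VisibilitySystems::CheckVisibility` set; exact module paths can differ between Bevy versions.

```rust
use bevy::prelude::*;
use bevy::render::view::{check_visibility, VisibilitySystems};
use bevy_ecs_tilemap::prelude::TilemapRenderSettings;

/// Sketch: make renderable tilemap entities participate in visibility
/// checking, using the marker component discussed above as the filter.
pub fn register_tilemap_visibility(app: &mut App) {
    app.add_systems(
        PostUpdate,
        check_visibility::<With<TilemapRenderSettings>>
            .in_set(VisibilitySystems::CheckVisibility),
    );
}
```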
This commit implements opt-in GPU frustum culling, built on top of the infrastructure in #12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` currently does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into indirect mode. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`.

A small amount of code relating to frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup.
Changelog

Added

* `GpuCulling` component to a camera.
* `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
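Finally, a minimal usage sketch for the two new components described above. The import path for `GpuCulling` and `NoCpuCulling` is an assumption here; `Camera3dBundle` is the standard 3D camera bundle for this era of Bevy.

```rust
use bevy::prelude::*;
use bevy::render::view::{GpuCulling, NoCpuCulling};

/// Sketch: spawn a camera that uses GPU frustum culling (indirect mode)
/// and additionally skips CPU frustum culling, per the PR description.
fn setup(mut commands: Commands) {
    commands.spawn((
        Camera3dBundle::default(),
        GpuCulling,
        NoCpuCulling,
    ));
}
```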