Implement GPU frustum culling. #12889

Merged Apr 28, 2024 (13 commits)

pcwalton (Contributor) commented Apr 6, 2024

This commit implements opt-in GPU frustum culling, built on top of the infrastructure in #12773. To enable it on a camera, add the GpuCulling component to it. To additionally disable CPU frustum culling, add the NoCpuCulling component. Note that adding GpuCulling without NoCpuCulling currently does nothing useful. The reason why GpuCulling doesn't automatically imply NoCpuCulling is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode.

Adding the GpuCulling component to a view puts that view into indirect mode. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the PreprocessWorkItem output_index points not to a MeshUniform instance slot but instead to a set of wgpu IndirectParameters, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as MeshCullingData.
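As a rough illustration of the dynamic allocation described above, here is a minimal CPU-side sketch in plain Rust. It is not the code from this PR: `IndirectParams`, `WorkItem`, and `preprocess` are hypothetical stand-ins, and the real work happens in the mesh preprocessing compute shader. The idea shown is only that a surviving mesh atomically bumps the instance count of its indirect parameters and uses the previous count as its slot offset.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical CPU-side stand-in for one set of indirect draw parameters.
/// Only the fields needed for the illustration are modeled.
pub struct IndirectParams {
    /// First instance slot reserved for this batch.
    pub first_instance: u32,
    /// Number of instances that have survived culling so far.
    pub instance_count: AtomicU32,
}

/// Hypothetical work item: in indirect mode, `output_index` selects a set of
/// indirect parameters rather than a fixed instance slot.
pub struct WorkItem {
    pub input_index: u32,
    pub output_index: usize,
}

/// Returns the instance slot allocated for `item`, or `None` if it was culled.
pub fn preprocess(item: &WorkItem, visible: bool, params: &[IndirectParams]) -> Option<u32> {
    if !visible {
        // Culled meshes never reserve a slot, so the draw skips them entirely.
        return None;
    }
    let p = &params[item.output_index];
    // `fetch_add` returns the previous count, which becomes this instance's
    // offset within the batch.
    let offset = p.instance_count.fetch_add(1, Ordering::Relaxed);
    Some(p.first_instance + offset)
}
```

With three work items sharing one set of parameters, the culled item leaves no gap: the two visible ones receive consecutive slots and the final instance count is 2.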

A small amount of code relating to the frustum culling has been borrowed from meshlets and moved into maths.wgsl. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think.
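To make the AABB versus bounding-sphere distinction concrete, here is a small standalone sketch of both tests against a set of frustum planes. It is written in Rust rather than WGSL and is not the code in `maths.wgsl`; the `Plane` and `Aabb` types are assumptions for illustration, and the box is treated as already axis-aligned in the same space as the planes.

```rust
/// A plane stored as (normal, d). A point `p` is on the inside when
/// `dot(normal, p) + d >= 0`.
#[derive(Clone, Copy)]
pub struct Plane {
    pub normal: [f32; 3],
    pub d: f32,
}

/// Axis-aligned bounding box given by its center and half extents.
pub struct Aabb {
    pub center: [f32; 3],
    pub half_extents: [f32; 3],
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Conservative AABB test: the box is rejected only if it lies entirely on
/// the outside of at least one plane.
pub fn aabb_intersects_frustum(aabb: &Aabb, planes: &[Plane]) -> bool {
    planes.iter().all(|p| {
        // Projected "radius" of the box along the plane normal.
        let r = aabb.half_extents[0] * p.normal[0].abs()
            + aabb.half_extents[1] * p.normal[1].abs()
            + aabb.half_extents[2] * p.normal[2].abs();
        dot(p.normal, aabb.center) + p.d + r >= 0.0
    })
}

/// The bounding-sphere test is the same comparison with a constant radius.
pub fn sphere_intersects_frustum(center: [f32; 3], radius: f32, planes: &[Plane]) -> bool {
    planes
        .iter()
        .all(|p| dot(p.normal, center) + p.d + radius >= 0.0)
}
```

The only difference between the two functions is how the radius term is obtained: the sphere uses a constant, while the box has to project its half extents onto each plane normal, which is why only part of the code can be shared.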

This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup.

Changelog

Added

  • Frustum culling can now optionally be done on the GPU. To enable it, add the GpuCulling component to a camera.
  • To disable CPU frustum culling, add NoCpuCulling to a camera. Note that GpuCulling doesn't automatically imply NoCpuCulling.

BD103 added the C-Feature (A new feature, making something new possible) and A-Rendering (Drawing game state to the screen) labels Apr 6, 2024
BD103 (Member) commented Apr 6, 2024

This seems worthy of the release notes. Thoughts?

pcwalton added a commit to pcwalton/bevy that referenced this pull request Apr 7, 2024
preparation for GPU occlusion culling.

Two-phase occlusion culling [1] is generally considered the
state-of-the-art occlusion culling technique. We already use two-phase
occlusion culling for meshlets, but we don't for other 3D objects.
Two-phase occlusion culling requires the construction of a *hierarchical
Z-buffer*. This patch implements an opt-in pass to generate that and so
is a step along the way to implementing two-phase occlusion culling,
alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets
into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer`
component to a camera enables the feature. This code should be usable
as-is for third-party plugins that might want to implement two-phase
occlusion culling, but of course we would like to have two-phase
occlusion culling implemented directly in Bevy in the near future.

Unlike meshlets, we have to handle the case in which the depth buffer is
multisampled. This is the source of most of the extra complexity, since
we can't use the Vulkan extension [2] that allows us to easily resolve
multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do
some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
pcwalton added a commit to pcwalton/bevy that referenced this pull request Apr 7, 2024
preparation for GPU occlusion culling.

Two-phase occlusion culling [1] is generally considered the
state-of-the-art occlusion culling technique. We already use two-phase
occlusion culling for meshlets, but we don't for other 3D objects.
Two-phase occlusion culling requires the construction of a *hierarchical
Z-buffer*. This patch implements an opt-in set of passes to generate
that and so is a step along the way to implementing two-phase occlusion
culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets
into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer`
component to a camera enables the feature. This code should be usable
as-is for third-party plugins that might want to implement two-phase
occlusion culling, but of course we would like to have two-phase
occlusion culling implemented directly in Bevy in the near future.

Two-phase occlusion culling will be implemented using the following
procedure:

1. Render all meshes that would have been visible in the previous frame
   to the depth buffer (with no fragment shader), using the previous
   frame's hierarchical Z-buffer, the previous frame's view matrix (cf.
   bevyengine#12902), and each model's previous view input uniform.

2. Downsample the Z-buffer to produce a hierarchical Z-buffer ("early",
   in the language of this patch).

3. Perform occlusion culling of all meshes against the Hi-Z buffer,
   using a screen space AABB test.

4. If a prepass is in use, render it now, using the occlusion culling
   results from (3). Note that if *only* a depth prepass is in use, then
   we can avoid rendering meshes that we rendered in phase (1), since
   they're already in the depth buffer.

5. Render main passes, using the occlusion culling results from (3).

6. Downsample the Z-buffer to produce a hierarchical Z-buffer again
   ("late", in the language of this patch). This readies the Z-buffer
   for step (1) of the next frame. It differs from the hierarchical
   Z-buffer produced in (2) because it includes meshes that weren't
   visible last frame, but became visible this frame.

This commit adds steps (1), (2), and (6) to the pipeline, when the
`HierarchicalDepthBuffer` component is present. It doesn't add step (3),
because step (3) depends on bevyengine#12889 which in turn depends on bevyengine#12773, and
both of those patches are still in review.
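The downsampling in steps (2) and (6) can be sketched on the CPU as follows. This is only an illustration in plain Rust, not the pass added by the commit: `DepthLevel`, `downsample_min`, and `build_hi_z` are hypothetical names, the dimensions are assumed to be powers of two, and the `min` reduction assumes a reversed-Z depth buffer in which smaller values are farther away (with a conventional depth range the reduction would be `max`).

```rust
/// One level of a depth pyramid, stored row-major.
pub struct DepthLevel {
    pub width: usize,
    pub height: usize,
    pub texels: Vec<f32>,
}

/// Builds the next coarser level by reducing each 2x2 block with `min`.
/// Under the reversed-Z assumption, `min` keeps the farthest sample, so the
/// coarse texel never claims more occlusion than the fine texels provide.
pub fn downsample_min(src: &DepthLevel) -> DepthLevel {
    let width = (src.width / 2).max(1);
    let height = (src.height / 2).max(1);
    let mut texels = Vec::with_capacity(width * height);
    for y in 0..height {
        for x in 0..width {
            let mut farthest = f32::INFINITY;
            for dy in 0..2 {
                for dx in 0..2 {
                    // Clamp so that levels only one texel wide or tall still work.
                    let sx = (2 * x + dx).min(src.width - 1);
                    let sy = (2 * y + dy).min(src.height - 1);
                    farthest = farthest.min(src.texels[sy * src.width + sx]);
                }
            }
            texels.push(farthest);
        }
    }
    DepthLevel { width, height, texels }
}

/// Repeats the reduction until a single texel remains.
pub fn build_hi_z(base: DepthLevel) -> Vec<DepthLevel> {
    let mut levels = vec![base];
    while levels.last().map_or(false, |l| l.width > 1 || l.height > 1) {
        let next = downsample_min(levels.last().unwrap());
        levels.push(next);
    }
    levels
}
```

An occlusion test in step (3) would then pick the level whose texel size roughly matches the screen-space AABB and compare the AABB's nearest depth against the stored farthest depth.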

Unlike meshlets, we have to handle the case in which the depth buffer is
multisampled. This is the source of most of the extra complexity, since
we can't use the Vulkan extension [2] that allows us to easily resolve
multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do
some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
@pcwalton pcwalton force-pushed the gpu-culling-3 branch 3 times, most recently from 92678c7 to 2a579d2 Compare April 10, 2024 22:51
This commit implements opt-in GPU frustum culling, built on top of the
infrastructure in bevyengine#12773. To enable it on a camera, add the `GpuCulling`
component to it. To additionally disable CPU frustum culling, add the
`NoCpuCulling` component. Note that adding `GpuCulling` without
`NoCpuCulling` *currently* does nothing useful. The reason why
`GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend
to follow this patch up with GPU two-phase occlusion culling, and CPU
frustum culling plus GPU occlusion culling seems like a very
commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into
*indirect mode*. This mode makes all drawcalls indirect, relying on the
mesh preprocessing shader to allocate instances dynamically. In indirect
mode, the `PreprocessWorkItem` `output_index` points not to a
`MeshUniform` instance slot but instead to a set of `wgpu`
`IndirectParameters`, from which it allocates an instance slot
dynamically if frustum culling succeeds. Batch building has been updated
to allocate and track indirect parameter slots, and the AABBs are now
supplied to the GPU as `MeshCullingData`.

A small amount of code relating to the frustum culling has been borrowed
from meshlets and moved into `maths.wgsl`. Note that standard Bevy
frustum culling uses AABBs, while meshlets use bounding spheres; this
means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow
maps, to avoid making this patch bigger than it already is. That can be
a followup.
pcwalton (Contributor, Author) commented:

This should be ready for review now. I've rebased it on top of #12773.

pcwalton marked this pull request as ready for review April 10, 2024 22:57
pcwalton added this to the 0.14 milestone Apr 19, 2024
alice-i-cecile added the M-Needs-Release-Note (Work that should be called out in the blog due to impact) label Apr 23, 2024
IceSentry (Contributor) left a comment

This all looks broadly correct. I left a few comments but I'm not approving yet because I'm in the process of testing everything and making sure everything works.

Resolved review threads (now outdated):
crates/bevy_pbr/src/render/gpu_preprocess.rs
crates/bevy_pbr/src/render/mesh.rs (4 threads)
crates/bevy_pbr/src/render/mesh_preprocess.wgsl
view_query: QueryState<(Entity, Read<PreprocessBindGroup>)>,
view_query: QueryState<(
Entity,
Read<PreprocessBindGroup>,
Contributor:

Not a new change, but why is Read used here? What is it even?

Contributor Author:

Read<T>, part of the lifetimeless module, is just a typedef for &'static T for use in these situations where the &'static might be confusing, since there's nothing static about it. We aren't very consistent about using Read.
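For readers unfamiliar with the alias, the following standalone sketch shows the same shape without depending on Bevy. The alias line mirrors the description above; `ANSWER`, `takes_reference`, and `read_answer` are made up for the example.

```rust
/// Same shape as the alias described above: a readable name for `&'static T`.
pub type Read<T> = &'static T;

/// Hypothetical static value, used only so the example has something to borrow.
static ANSWER: u32 = 42;

/// Takes the spelled-out reference type.
pub fn takes_reference(value: &'static u32) -> u32 {
    *value
}

/// `Read<u32>` and `&'static u32` are the same type, so the value passes
/// through without any conversion.
pub fn read_answer() -> u32 {
    let r: Read<u32> = &ANSWER;
    takes_reference(r)
}
```

Because a type alias introduces no new type, swapping `Read<PreprocessBindGroup>` for `&'static PreprocessBindGroup` in a `QueryState` would not change behavior; the alias only hides a lifetime that would otherwise look misleading.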

examples/3d/3d_shapes.rs (resolved review thread, now outdated)
superdump (Contributor) left a comment

This basically looks good. I left a few comments. When the ones that sound blocking are addressed, I'll approve and merge. Thanks for the PR! <3

pcwalton (Contributor, Author) commented:

Comments addressed.

@superdump superdump added this pull request to the merge queue Apr 28, 2024
Merged via the queue into bevyengine:main with commit 16531fb Apr 28, 2024
27 checks passed
pcwalton added a commit to pcwalton/bevy that referenced this pull request Apr 30, 2024
In bevyengine#12889, I mistakenly started dropping unbatchable sorted items on the
floor instead of giving them solitary batches. This caused the objects
in the `shader_instancing` demo to stop showing up. This patch fixes the
issue by giving those items their own batches as expected.

Fixes bevyengine#13130.
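The intended behavior can be sketched with a simplified model. This is not Bevy's batching code: `Item`, its `batch_key` field, and `batch_sorted` are hypothetical, with `None` standing in for "unbatchable". The point is that an unbatchable item still produces a batch of its own instead of being skipped.

```rust
/// Hypothetical sorted phase item. `batch_key` is `None` for unbatchable items.
pub struct Item {
    pub id: u32,
    pub batch_key: Option<u32>,
}

/// Groups consecutive items with equal keys, preserving sort order.
/// Unbatchable items get a solitary batch rather than being dropped.
pub fn batch_sorted(items: &[Item]) -> Vec<Vec<u32>> {
    let mut batches: Vec<Vec<u32>> = Vec::new();
    // Key of the batch currently being extended, if it can be extended at all.
    let mut current_key: Option<u32> = None;
    for item in items {
        match item.batch_key {
            Some(key) if current_key == Some(key) => {
                batches
                    .last_mut()
                    .expect("a batch is open whenever current_key is Some")
                    .push(item.id);
            }
            Some(key) => {
                batches.push(vec![item.id]);
                current_key = Some(key);
            }
            None => {
                // Solitary batch; it also ends whatever batch came before it.
                batches.push(vec![item.id]);
                current_key = None;
            }
        }
    }
    batches
}
```

Dropping the `None` arm reproduces the kind of regression described above: every unbatchable item silently disappears from the output.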
github-merge-queue bot pushed a commit that referenced this pull request Apr 30, 2024
In #12889, I mistakenly started dropping unbatchable sorted items on the
floor instead of giving them solitary batches. This caused the objects
in the `shader_instancing` demo to stop showing up. This patch fixes the
issue by giving those items their own batches as expected.

Fixes #13130.
alice-i-cecile (Member) commented:

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1307 if you'd like to help out.

ChristopherBiscardi added a commit to ChristopherBiscardi/bevy_ecs_tilemap that referenced this pull request Jun 7, 2024
[12889](bevyengine/bevy#12889) Gpu Frustum Culling removed the dynamic_offset of Transparent2d and it became `extra_index` with the special value `PhaseItemExtraIndex::NONE`, which indicates the `None` that was here previously
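The change from an optional offset to an index with a reserved "none" value can be pictured with a standalone model. This is not Bevy's definition of `PhaseItemExtraIndex`: the `ExtraIndex` type below, the choice of `u32::MAX` as the sentinel, and both helper methods are assumptions made purely for illustration.

```rust
/// Hypothetical model of an extra index that reserves one value for "none".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExtraIndex(pub u32);

impl ExtraIndex {
    /// Assumed sentinel meaning "no extra index".
    pub const NONE: ExtraIndex = ExtraIndex(u32::MAX);

    /// Converts the old optional-offset representation into the sentinel form.
    /// Note that `Some(u32::MAX)` cannot be represented and collapses to `NONE`.
    pub fn from_dynamic_offset(offset: Option<u32>) -> Self {
        match offset {
            Some(value) => ExtraIndex(value),
            None => Self::NONE,
        }
    }

    /// Converts back to an `Option`, treating the sentinel as `None`.
    pub fn as_dynamic_offset(self) -> Option<u32> {
        if self == Self::NONE {
            None
        } else {
            Some(self.0)
        }
    }
}
```

Under this model, code that previously wrote `dynamic_offset: None` writes the `NONE` constant instead, which is the migration the commit message describes.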
zhaop added a commit to zhaop/bevy-website that referenced this pull request Jun 24, 2024
zhaop added a commit to zhaop/transform-gizmo that referenced this pull request Jun 24, 2024
BD103 added the M-Needs-Migration-Guide (A breaking change to Bevy's public API that needs to be noted in a migration guide) label Jun 28, 2024
Contributor:

It looks like your PR is a breaking change, but you didn't provide a migration guide.

Could you add some context on what users should update when this change gets released in a new version of Bevy?
It will be used to help write the migration guide for that version. Putting it after a ## Migration Guide heading will help it get picked up automatically by our tooling.

rparrett added a commit to StarArawn/bevy_ecs_tilemap that referenced this pull request Jul 5, 2024
* Update to 0.14.0-rc.2

* [12997](bevyengine/bevy#12997): rename `multi-threaded` to `multi_threaded`

* RenderAssets<Image> is now RenderAssets<GpuImage>

Implemented in [12827](bevyengine/bevy#12827)

* FloatOrd is now in bevy_math

implemented in [12732](bevyengine/bevy#12732)

* convert Transparent2d::dynamic_offset to extra_index

[12889](bevyengine/bevy#12889) Gpu Frustum Culling removed the dynamic_offset of Transparent2d and it became `extra_index` with the special value `PhaseItemExtraIndex::NONE`, which indicates the `None` that was here previously

* RenderPhase<Transparent2d> -> ViewSortedRenderPhases<Transparent2d>

[12453](bevyengine/bevy#12453): Render phases are now binned or sorted.

Following the changes in the `mesh2d_manual` [example](https://github.com/bevyengine/bevy/blob/ecdd1624f302c5f71aaed95b0984cbbecf8880b7/examples/2d/mesh2d_manual.rs#L357-L358): use the `ViewSortedRenderPhases` resource.

* get_sub_app_mut is now an Option

in [9202](bevyengine/bevy#9202) SubApp access has changed

* GpuImage::size f32 -> u32 via UVec2

[11698](bevyengine/bevy#11698) changed `GpuImage::size` to `UVec2`.

Right above this, `Extent3d` does the same thing, so I'm taking a small leap and assuming we can cast with `as`.

* GpuMesh::primitive_topology -> key_bits/BaseMeshPipeline

[12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits` which can be constructed using `BaseMeshPipeline::from_primitive_topology`

* RenderChunk2d::prepare requires &mut MeshVertexBufferLayouts now

[12216](bevyengine/bevy#12216) introduced an argument `&mut MeshVertexBufferLayouts` to `get_mesh_vertex_buffer_layout`, which bevy_ecs_tilemap calls in `RenderChunk2d::prepare`

* into_linear_f32 -> color.0.linear().to_f32_array(),

[12163](bevyengine/bevy#12163) bevy_color was created and Color handling has changed. Specifically Color::as_linear_rgba_f32 has been removed.

LinearRgba is now its own type that can be accessed via [`linear()`](https://docs.rs/bevy/0.14.0-rc.2/bevy/color/enum.Color.html#method.linear) and then converted.

* Must specify type of VisibleEntities when accessing

[12582](bevyengine/bevy#12582) divided `VisibleEntities` into separate lists. So now we have to specify which kind of entity we want. I think we want the Mesh here, and I think we can get rid of the `.index` calls on Entity since Entity [already compares bits](https://docs.rs/bevy_ecs/0.14.0-rc.2/src/bevy_ecs/entity/mod.rs.html#173) for optimized codegen purposes. Waiting to do that until the other changes are in though so as to not change functionality until post-upgrade.

* app.world access is functions now

- [9202](bevyengine/bevy#9202) changed world access to functions. [relevant line](https://github.com/bevyengine/bevy/pull/9202/files#diff-b2fba3a0c86e496085ce7f0e3f1de5960cb754c7d215ed0f087aa556e529f97fR640)
- This also surfaced [12655](bevyengine/bevy#12655) which removed `Into<AssetId<T>>` for `Handle<T>`. using a reference or .id() is the solution here.

* We don't need `World::cell`, and it doesn't exist anymore

In [12551](bevyengine/bevy#12551) `WorldCell` was removed.

...but it turns out we don't need it or its replacement anyway.

* examples error out unless this bevy bug is addressed with these features being added

bevyengine/bevy#13728

* check_visibility is required for the entity that is renderable

As a result of [12582](bevyengine/bevy#12582) `check_visibility` must be implemented for the "renderable" tilemap entities. Doing this is trivial by taking advantage of the
existing `check_visibility` type arguments, which accept a [`QF: QueryFilter + 'static`](https://docs.rs/bevy/0.14.0-rc.2/bevy/render/view/fn.check_visibility.html).

The same `QueryFilter` is used when checking `VisibleEntities`. I've chosen `With<TilemapRenderSettings>` because presumably if the entity doesn't have a `TilemapRenderSettings` then it will not be rendering, but this could be as sophisticated or simple as we want.

For example `WithLight` is currently implemented as

```rust
pub type WithLight = Or<(With<PointLight>, With<SpotLight>, With<DirectionalLight>)>;
```

* view.view_proj -> view.clip_from_world

[13489](bevyengine/bevy#13489) introduced matrix naming changes, including `view_proj` which becomes `clip_from_world`

* color changes to make tests runnable

* clippy fix

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* final clippy fixes

* Update Cargo.toml

Co-authored-by: Rob Parrett <[email protected]>

* Simplify async loading in ldtk/tiled helpers

See Bevy #12550

* remove second allow lint

* rc.3 bump

* bump version for major release

* remove unused features

---------

Co-authored-by: Rob Parrett <[email protected]>