Implement GPU frustum culling. #12889
Conversation
This seems worthy of the release notes. Thoughts?
Preparation for GPU occlusion culling: two-phase occlusion culling [1] is generally considered the state-of-the-art occlusion culling technique. We already use two-phase occlusion culling for meshlets, but we don't for other 3D objects. Two-phase occlusion culling requires the construction of a *hierarchical Z-buffer*. This patch implements an opt-in pass to generate that buffer and is therefore a step along the way to implementing two-phase occlusion culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer` component to a camera enables the feature. This code should be usable as-is by third-party plugins that want to implement two-phase occlusion culling, but of course we would like to have two-phase occlusion culling implemented directly in Bevy in the near future.

Unlike meshlets, we have to handle the case in which the depth buffer is multisampled. This is the source of most of the extra complexity, since we can't use the Vulkan extension [2] that allows us to easily resolve multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
Preparation for GPU occlusion culling: two-phase occlusion culling [1] is generally considered the state-of-the-art occlusion culling technique. We already use two-phase occlusion culling for meshlets, but we don't for other 3D objects. Two-phase occlusion culling requires the construction of a *hierarchical Z-buffer*. This patch implements an opt-in set of passes to generate that buffer and is therefore a step along the way to implementing two-phase occlusion culling, alongside GPU frustum culling (bevyengine#12889).

This commit copies the hierarchical Z-buffer building code from meshlets into `bevy_core_pipeline`. Adding the new `HierarchicalDepthBuffer` component to a camera enables the feature. This code should be usable as-is by third-party plugins that want to implement two-phase occlusion culling, but of course we would like to have two-phase occlusion culling implemented directly in Bevy in the near future.

Two-phase occlusion culling will be implemented using the following procedure:

1. Render all meshes that would have been visible in the previous frame to the depth buffer (with no fragment shader), using the previous frame's hierarchical Z-buffer, the previous frame's view matrix (cf. bevyengine#12902), and each model's previous view input uniform.
2. Downsample the Z-buffer to produce a hierarchical Z-buffer ("early", in the language of this patch).
3. Perform occlusion culling of all meshes against the Hi-Z buffer, using a screen-space AABB test.
4. If a prepass is in use, render it now, using the occlusion culling results from (3). Note that if *only* a depth prepass is in use, then we can avoid rendering meshes that we rendered in phase (1), since they're already in the depth buffer.
5. Render main passes, using the occlusion culling results from (3).
6. Downsample the Z-buffer to produce a hierarchical Z-buffer again ("late", in the language of this patch). This readies the Z-buffer for step (1) of the next frame. It differs from the hierarchical Z-buffer produced in (2) because it includes meshes that weren't visible last frame but became visible this frame.

This commit adds steps (1), (2), and (6) to the pipeline, when the `HierarchicalDepthBuffer` component is present. It doesn't add step (3), because step (3) depends on bevyengine#12889, which in turn depends on bevyengine#12773, and both of those patches are still in review.

Unlike meshlets, we have to handle the case in which the depth buffer is multisampled. This is the source of most of the extra complexity, since we can't use the Vulkan extension [2] that allows us to easily resolve multisampled depth buffers using the min operation.

At Jasmine's request, I haven't touched the meshlet code except to do some very minor refactoring; the code is generally copied in.

[1]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[2]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html
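For readers who haven't worked with hierarchical Z-buffers: the downsampling in steps (2) and (6) conceptually reduces each 2×2 block of depth texels to one texel per mip level, keeping the most conservative (farthest) depth. Below is a minimal CPU-side Rust sketch of a single downsample step; the real implementation runs as a GPU pass in `bevy_core_pipeline`, and the function name and the reverse-Z (min) convention shown here are illustrative assumptions.

```rust
/// Hypothetical CPU-side illustration of one hierarchical Z-buffer downsample
/// step. Each texel of the next mip level takes the most conservative depth of
/// the 2x2 block beneath it. With a reverse-Z depth buffer (1.0 = near,
/// 0.0 = far) the conservative choice is the *minimum*, matching the "min
/// operation" mentioned in the commit message; with a conventional
/// 0-near/1-far buffer it would be the maximum instead.
fn downsample_depth_mip(prev: &[f32], prev_width: usize, prev_height: usize) -> Vec<f32> {
    let width = (prev_width / 2).max(1);
    let height = (prev_height / 2).max(1);
    let mut next = vec![0.0_f32; width * height];
    for y in 0..height {
        for x in 0..width {
            // Clamp so odd-sized mip levels still read valid texels.
            let x0 = (2 * x).min(prev_width - 1);
            let x1 = (2 * x + 1).min(prev_width - 1);
            let y0 = (2 * y).min(prev_height - 1);
            let y1 = (2 * y + 1).min(prev_height - 1);
            let block = [
                prev[y0 * prev_width + x0],
                prev[y0 * prev_width + x1],
                prev[y1 * prev_width + x0],
                prev[y1 * prev_width + x1],
            ];
            next[y * width + x] = block.into_iter().fold(f32::INFINITY, f32::min);
        }
    }
    next
}
```

Occlusion testing in step (3) then compares an object's screen-space AABB against the appropriate mip level, so a single texel fetch conservatively bounds the depth over the whole covered region.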
92678c7 to 2a579d2
This commit implements opt-in GPU frustum culling, built on top of the infrastructure in bevyengine#12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` *currently* does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into *indirect mode*. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`.

A small amount of code relating to frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup.
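To make that data flow a bit more concrete, here is a rough Rust sketch of the two GPU-visible structures mentioned above. The field names and layouts are assumptions for illustration only: `IndirectParameters` is shown with the standard wgpu/WebGPU indexed indirect-draw layout, and `MeshCullingData` as a model-space AABB.

```rust
use bevy::math::Vec4;

/// Sketch of the per-mesh culling data uploaded for GPU frustum culling.
/// Field names are illustrative, not the exact definitions in the patch.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MeshCullingData {
    /// Center of the mesh's axis-aligned bounding box, in model space.
    pub aabb_center: Vec4,
    /// Half-extents of the AABB along each axis.
    pub aabb_half_extents: Vec4,
}

/// Sketch of the indirect draw arguments that the mesh preprocessing shader
/// fills in. This mirrors the layout wgpu expects for indexed indirect draws;
/// `instance_count` starts at zero and is incremented for each mesh instance
/// that survives frustum culling, which is how instance slots get allocated
/// dynamically in indirect mode.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct IndirectParameters {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}
```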
This should be ready for review now. I've rebased it on top of #12773.
This all looks broadly correct. I left a few comments but I'm not approving yet because I'm in the process of testing everything and making sure everything works.
view_query: QueryState<(Entity, Read<PreprocessBindGroup>)>,
view_query: QueryState<(
    Entity,
    Read<PreprocessBindGroup>,
Not a new change, but why is `Read` used here? What is it even?
`Read<T>`, part of the `lifetimeless` module, is just a typedef for `&'static T`, for use in these situations where the `&'static` might be confusing, since there's nothing static about it. We aren't very consistent about using `Read`.
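Concretely, the alias amounts to something like the following (a paraphrase of the `lifetimeless` aliases in `bevy_ecs`, not the exact source):

```rust
/// `Read<T>` is just a shared-reference query term with a placeholder
/// 'static lifetime; the real lifetime is supplied when the query is
/// actually resolved against the world.
pub type Read<T> = &'static T;

/// The mutable counterpart.
pub type Write<T> = &'static mut T;
```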
This basically looks good. I left a few comments. When the ones that sound blocking are addressed, I'll approve and merge. Thanks for the PR! <3
Comments addressed.
In bevyengine#12889, I mistakenly started dropping unbatchable sorted items on the floor instead of giving them solitary batches. This caused the objects in the `shader_instancing` demo to stop showing up. This patch fixes the issue by giving those items their own batches as expected. Fixes bevyengine#13130.
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1307 if you'd like to help out.
[12889](bevyengine/bevy#12889) (GPU frustum culling) removed the `dynamic_offset` of `Transparent2d`; it became `extra_index`, with the special value `PhaseItemExtraIndex::NONE` indicating the `None` that was here previously.
Addresses Bevy [12889](bevyengine/bevy#12889)
It looks like your PR is a breaking change, but you didn't provide a migration guide. Could you add some context on what users should update when this change gets released in a new version of Bevy?
* Update to 0.14.0-rc.2.
* [12997](bevyengine/bevy#12997): rename `multi-threaded` to `multi_threaded`.
* `RenderAssets<Image>` is now `RenderAssets<GpuImage>`, implemented in [12827](bevyengine/bevy#12827).
* `FloatOrd` is now in `bevy_math`, implemented in [12732](bevyengine/bevy#12732).
* Convert `Transparent2d::dynamic_offset` to `extra_index`: [12889](bevyengine/bevy#12889) (GPU frustum culling) removed the `dynamic_offset` of `Transparent2d`; it became `extra_index`, with the special value `PhaseItemExtraIndex::NONE` indicating the `None` that was here previously.
* `RenderPhase<Transparent2d>` -> `ViewSortedRenderPhases<Transparent2d>`: [12453](bevyengine/bevy#12453): render phases are now binned or sorted. Following the changes in the `mesh2d_manual` [example](https://github.com/bevyengine/bevy/blob/ecdd1624f302c5f71aaed95b0984cbbecf8880b7/examples/2d/mesh2d_manual.rs#L357-L358), use the `ViewSortedRenderPhases` resource.
* `get_sub_app_mut` now returns an `Option`: in [9202](bevyengine/bevy#9202) `SubApp` access has changed.
* `GpuImage::size` f32 -> u32 via `UVec2`: [11698](bevyengine/bevy#11698) changed `GpuImage::size` to `UVec2`. Right above this, `Extent3d` does the same thing, so I'm taking a small leap and assuming we can `as`-cast.
* `GpuMesh::primitive_topology` -> `key_bits`/`BaseMeshPipeline`: in [12791](bevyengine/bevy#12791) the `primitive_topology` field on `GpuMesh` was removed in favor of `key_bits`, which can be constructed using `BaseMeshPipeline::from_primitive_topology`.
* `RenderChunk2d::prepare` now requires `&mut MeshVertexBufferLayouts`: [12216](bevyengine/bevy#12216) introduced a `&mut MeshVertexBufferLayouts` argument to `get_mesh_vertex_buffer_layout`, which bevy_ecs_tilemap calls in `RenderChunk2d::prepare`.
* `into_linear_f32` -> `color.0.linear().to_f32_array()`: in [12163](bevyengine/bevy#12163) bevy_color was created and `Color` handling changed. Specifically, `Color::as_linear_rgba_f32` has been removed. `LinearRgba` is now its own type that can be accessed via [`linear()`](https://docs.rs/bevy/0.14.0-rc.2/bevy/color/enum.Color.html#method.linear) and then converted.
* Must specify the type of `VisibleEntities` when accessing it: [12582](bevyengine/bevy#12582) divided `VisibleEntities` into separate lists, so now we have to specify which kind of entity we want. I think we want the mesh here, and I think we can get rid of the `.index` calls on `Entity`, since `Entity` [already compares bits](https://docs.rs/bevy_ecs/0.14.0-rc.2/src/bevy_ecs/entity/mod.rs.html#173) for optimized codegen purposes. Waiting to do that until the other changes are in, though, so as to not change functionality until post-upgrade.
* `app.world` access is now via functions: [9202](bevyengine/bevy#9202) changed world access to functions ([relevant line](https://github.com/bevyengine/bevy/pull/9202/files#diff-b2fba3a0c86e496085ce7f0e3f1de5960cb754c7d215ed0f087aa556e529f97fR640)). This also surfaced [12655](bevyengine/bevy#12655), which removed `Into<AssetId<T>>` for `Handle<T>`; using a reference or `.id()` is the solution here.
* We don't need `World::cell`, and it doesn't exist anymore: in [12551](bevyengine/bevy#12551) `WorldCell` was removed... but it turns out we don't need it or its replacement anyway.
* Examples error out unless this Bevy bug is addressed with these features being added: bevyengine/bevy#13728.
* `check_visibility` is required for the entity that is renderable. As a result of [12582](bevyengine/bevy#12582), `check_visibility` must be implemented for the "renderable" tilemap entities. Doing this is trivial by taking advantage of the existing `check_visibility` type arguments, which accept a [`QF: QueryFilter + 'static`](https://docs.rs/bevy/0.14.0-rc.2/bevy/render/view/fn.check_visibility.html). The same `QueryFilter` is used when checking `VisibleEntities`. I've chosen `With<TilemapRenderSettings>` because presumably if the entity doesn't have a `TilemapRenderSettings` then it will not be rendering, but this could be as sophisticated or simple as we want (see the registration sketch after this list). For example, `WithLight` is currently implemented as:
  ```rust
  pub type WithLight = Or<(With<PointLight>, With<SpotLight>, With<DirectionalLight>)>;
  ```
* `view.view_proj` -> `view.clip_from_world`: [13489](bevyengine/bevy#13489) introduced matrix naming changes, including `view_proj`, which becomes `clip_from_world`.
* Color changes to make tests runnable.
* Clippy fix.
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Final clippy fixes.
* Update Cargo.toml (Co-authored-by: Rob Parrett <[email protected]>).
* Simplify async loading in ldtk/tiled helpers (see Bevy #12550).
* Remove second allow lint.
* rc.3 bump.
* Bump version for major release.
* Remove unused features.

Co-authored-by: Rob Parrett <[email protected]>
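For context on the `check_visibility` bullet above, registering the generic visibility check might look roughly like the sketch below. It assumes `With<TilemapRenderSettings>` as the filter, as discussed, and that the system goes in the standard `VisibilitySystems::CheckVisibility` set; exact module paths can differ between Bevy versions.

```rust
use bevy::prelude::*;
use bevy::render::view::{check_visibility, VisibilitySystems};
use bevy_ecs_tilemap::prelude::TilemapRenderSettings;

/// Sketch: make renderable tilemap entities participate in visibility
/// checking, using the marker component discussed above as the filter.
pub fn register_tilemap_visibility(app: &mut App) {
    app.add_systems(
        PostUpdate,
        check_visibility::<With<TilemapRenderSettings>>
            .in_set(VisibilitySystems::CheckVisibility),
    );
}
```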
This commit implements opt-in GPU frustum culling, built on top of the infrastructure in #12773. To enable it on a camera, add the `GpuCulling` component to it. To additionally disable CPU frustum culling, add the `NoCpuCulling` component. Note that adding `GpuCulling` without `NoCpuCulling` currently does nothing useful. The reason why `GpuCulling` doesn't automatically imply `NoCpuCulling` is that I intend to follow this patch up with GPU two-phase occlusion culling, and CPU frustum culling plus GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into indirect mode. This mode makes all drawcalls indirect, relying on the mesh preprocessing shader to allocate instances dynamically. In indirect mode, the `PreprocessWorkItem` `output_index` points not to a `MeshUniform` instance slot but instead to a set of `wgpu` `IndirectParameters`, from which it allocates an instance slot dynamically if frustum culling succeeds. Batch building has been updated to allocate and track indirect parameter slots, and the AABBs are now supplied to the GPU as `MeshCullingData`.

A small amount of code relating to frustum culling has been borrowed from meshlets and moved into `maths.wgsl`. Note that standard Bevy frustum culling uses AABBs, while meshlets use bounding spheres; this means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow maps, to avoid making this patch bigger than it already is. That can be a followup.
Changelog

Added

* `GpuCulling` component to a camera.
* `NoCpuCulling` to a camera. Note that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
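Finally, a minimal usage sketch for the two new components described above. The import path for `GpuCulling` and `NoCpuCulling` is an assumption here; `Camera3dBundle` is the standard 3D camera bundle for this era of Bevy.

```rust
use bevy::prelude::*;
use bevy::render::view::{GpuCulling, NoCpuCulling};

/// Sketch: spawn a camera that uses GPU frustum culling (indirect mode)
/// and additionally skips CPU frustum culling, per the PR description.
fn setup(mut commands: Commands) {
    commands.spawn((
        Camera3dBundle::default(),
        GpuCulling,
        NoCpuCulling,
    ));
}
```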