Pack multiple vertex and index arrays together into growable buffers. #14257

pcwalton · 2024-07-09T23:41:47Z

This commit uses the offset-allocator crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of wgpu overhead is validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes.
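To make "combine arrays from different meshes into single buffers" concrete, here is a minimal sketch of sub-allocating ranges inside one shared buffer. It is a toy first-fit free list written only with the standard library; `ToyAllocator` is a hypothetical name, and this is not the algorithm the offset-allocator crate uses (that crate has its own bin-based design). It only shows the idea that each mesh gets an offset into a shared buffer and that freed ranges can be reused.

```rust
use std::ops::Range;

/// Toy first-fit allocator over one buffer (hypothetical; not offset-allocator's algorithm).
struct ToyAllocator {
    // Free ranges of elements, kept sorted by start offset.
    free: Vec<Range<u32>>,
}

impl ToyAllocator {
    fn new(capacity: u32) -> Self {
        Self { free: vec![0..capacity] }
    }

    /// Returns the offset of a block of `size` elements, or `None` if nothing fits.
    fn allocate(&mut self, size: u32) -> Option<u32> {
        let i = self.free.iter().position(|r| r.end - r.start >= size)?;
        let start = self.free[i].start;
        self.free[i].start += size;
        if self.free[i].start == self.free[i].end {
            self.free.remove(i);
        }
        Some(start)
    }

    /// Returns a block to the free list, merging it with adjacent free ranges.
    fn free(&mut self, offset: u32, size: u32) {
        let i = self.free.partition_point(|r| r.start < offset);
        self.free.insert(i, offset..offset + size);
        // Merge with the following free range, then with the preceding one.
        if i + 1 < self.free.len() && self.free[i].end == self.free[i + 1].start {
            self.free[i].end = self.free[i + 1].end;
            self.free.remove(i + 1);
        }
        if i > 0 && self.free[i - 1].end == self.free[i].start {
            self.free[i - 1].end = self.free[i].end;
            self.free.remove(i);
        }
    }
}

fn main() {
    let mut alloc = ToyAllocator::new(1024);
    let a = alloc.allocate(100).unwrap(); // mesh A's vertex data
    let b = alloc.allocate(200).unwrap(); // mesh B's vertex data, same buffer
    println!("A at {a}, B at {b}");
    alloc.free(a, 100); // mesh A unloaded
    let c = alloc.allocate(50).unwrap(); // mesh C reuses A's hole
    println!("C at {c}");
}
```

Because A, B, and C all live in one buffer, drawing them one after another needs no vertex buffer switch, which is where the PR's savings come from.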

This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs.
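The placement policy described above (grow slabs geometrically, cap them at a maximum size, and give oversized arrays dedicated slabs) can be sketched as a single decision function. This is a hypothetical illustration: `SlabHeuristics`, `Placement`, and `place` are invented names, the fields only mirror the four knobs the PR lists, and the example values in `main` are not claimed to be Bevy's defaults.

```rust
/// Hypothetical knobs mirroring the four heuristics the PR describes.
#[derive(Debug, Clone, Copy)]
struct SlabHeuristics {
    min_slab_size: u64,
    max_slab_size: u64,
    large_threshold: u64,
    growth_factor: f64,
}

#[derive(Debug, PartialEq)]
enum Placement {
    /// Too large to share: the array gets a buffer of its own.
    Dedicated,
    /// Fits in the current slab, possibly after growing it to `grown_size`.
    ExistingSlab { grown_size: u64 },
    /// Growing would exceed the cap, so a fresh slab is started.
    NewSlab { size: u64 },
}

/// Decides where an array of `array_size` bytes goes, given a slab that
/// currently has `slab_used` of its `slab_size` bytes filled.
fn place(h: &SlabHeuristics, array_size: u64, slab_used: u64, slab_size: u64) -> Placement {
    assert!(h.growth_factor > 1.0, "slabs must actually grow");
    if array_size >= h.large_threshold {
        return Placement::Dedicated;
    }
    let needed = slab_used + array_size;
    // Grow geometrically from the current size until the data fits.
    let mut grown = slab_size.max(h.min_slab_size).max(1);
    while grown < needed {
        grown = (grown as f64 * h.growth_factor).ceil() as u64;
    }
    if grown <= h.max_slab_size {
        Placement::ExistingSlab { grown_size: grown }
    } else {
        // Capping slab size avoids pathological fragmentation.
        Placement::NewSlab { size: h.min_slab_size.max(array_size) }
    }
}

fn main() {
    // Example values only.
    let h = SlabHeuristics {
        min_slab_size: 1 << 20,
        max_slab_size: 512 << 20,
        large_threshold: 256 << 20,
        growth_factor: 1.5,
    };
    println!("{:?}", place(&h, 300 << 20, 0, 1 << 20)); // huge array
    println!("{:?}", place(&h, 3 << 20, 0, 1 << 20)); // slab must grow
}
```

Growing by a factor rather than a fixed step keeps the number of reallocations logarithmic in the final slab size.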

As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The MeshAllocatorSettings resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully sensible defaults have been chosen for each value.

Unfortunately, WebGL 2 doesn't support the base vertex feature, which is necessary to pack vertex arrays from different meshes into the same buffer. wgpu represents this restriction as the downlevel flag BASE_VERTEX. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all index arrays into single buffers to reduce buffer changes, and we do so.
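The WebGL 2 fallback amounts to one extra branch in slab selection: vertex arrays become dedicated when base vertex is unavailable, while index arrays are still packed. A minimal sketch follows; `ElementKind`, `SlabChoice`, and `choose_slab` are hypothetical names, and `supports_base_vertex` is assumed to be derived from the adapter's downlevel capabilities (the `BASE_VERTEX` flag in wgpu's `DownlevelFlags`), which is passed in here as a plain bool.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ElementKind {
    Vertex,
    Index,
}

#[derive(Debug, PartialEq)]
enum SlabChoice {
    Shared,
    Dedicated,
}

/// `supports_base_vertex` is assumed to come from the adapter's downlevel flags.
fn choose_slab(kind: ElementKind, supports_base_vertex: bool, size: u64, large_threshold: u64) -> SlabChoice {
    if size >= large_threshold {
        return SlabChoice::Dedicated;
    }
    match kind {
        // Without base vertex, vertex arrays from different meshes can't share a buffer.
        ElementKind::Vertex if !supports_base_vertex => SlabChoice::Dedicated,
        // Index arrays are packed on every platform, including WebGL 2.
        _ => SlabChoice::Shared,
    }
}

fn main() {
    for supports in [true, false] {
        println!(
            "base vertex {supports}: vertex -> {:?}, index -> {:?}",
            choose_slab(ElementKind::Vertex, supports, 4096, 1 << 20),
            choose_slab(ElementKind::Index, supports, 4096, 1 << 20),
        );
    }
}
```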

The following measurements are on Bistro:

Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup).

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup).

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup).
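The speedup figures on Bistro are the ratio of the old time to the new time. A quick check of that arithmetic, using only the numbers reported in the measurements:

```rust
/// Speedup = old time / new time.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

fn main() {
    let measurements = [
        ("overall frame", 8.74, 5.53),
        ("render system", 6.57, 3.54),
        ("opaque pass", 4.64, 2.33),
    ];
    for (label, before, after) in measurements {
        println!("{label}: {:.2}x, {:.2} ms saved", speedup(before, after), before - after);
    }
}
```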

Migration Guide

Changed

Vertex and index buffers for meshes may now be packed alongside other buffers, for performance.
GpuMesh has been renamed to RenderMesh, to reflect the fact that it no longer directly stores handles to GPU objects.
Because meshes no longer have their own vertex and index buffers, the responsibility for the buffers has moved from GpuMesh (now called RenderMesh) to the MeshAllocator resource. To access the vertex data for a mesh, use MeshAllocator::mesh_vertex_slice. To access the index data for a mesh, use MeshAllocator::mesh_index_slice.
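Once a mesh's data lives inside a shared buffer, draw calls have to say where that data starts. A plausible way to turn the vertex and index slices into indexed-draw arguments is sketched below, emulating the GPU's vertex fetch on the CPU. `BufferSlice`, `draw_indexed_args`, and `fetch` are hypothetical stand-ins; the sketch assumes the slices report element ranges (not byte ranges) and shapes the result after wgpu's `draw_indexed(indices, base_vertex, instances)`, omitting the instance range.

```rust
use std::ops::Range;

/// Hypothetical stand-in for an allocator slice: the element range a mesh's
/// data occupies in a shared buffer (a real slice also names the buffer to bind).
struct BufferSlice {
    range: Range<u32>,
}

/// Returns (index range, base vertex) for an indexed draw of one mesh.
fn draw_indexed_args(vertices: &BufferSlice, indices: &BufferSlice) -> (Range<u32>, i32) {
    // Read this mesh's indices from their position in the shared index buffer,
    // and offset each fetched index by where the mesh's vertices start.
    (indices.range.clone(), vertices.range.start as i32)
}

/// Emulates the GPU's vertex fetch for an indexed draw.
fn fetch<T: Copy>(vertex_buffer: &[T], index_buffer: &[u32], indices: Range<u32>, base_vertex: i32) -> Vec<T> {
    indices
        .map(|i| vertex_buffer[(index_buffer[i as usize] as i32 + base_vertex) as usize])
        .collect()
}

fn main() {
    // Two triangles packed together. Mesh A owns vertices 0..3 and indices 0..3;
    // mesh B owns vertices 3..6 and indices 3..6. Indices are mesh-local.
    let vertex_buffer = [10, 11, 12, 20, 21, 22];
    let index_buffer = [0, 1, 2, 2, 1, 0];
    let mesh_b_vertices = BufferSlice { range: 3..6 };
    let mesh_b_indices = BufferSlice { range: 3..6 };
    let (indices, base_vertex) = draw_indexed_args(&mesh_b_vertices, &mesh_b_indices);
    println!("{:?}", fetch(&vertex_buffer, &index_buffer, indices, base_vertex));
}
```

Forgetting either offset makes a mesh read another mesh's data from the shared buffer, which is why every custom draw path must supply both.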

JMS55

Great docs as always. Looks generally good. If there are any bugs or perf overhead from the allocator itself, we can iron them out later. I'm not confident that separate slabs for large buffers are needed, but we can try it out; I don't feel strongly about removing them either. Bevy really needs an example/benchmark for loading and unloading multiple assets in the background during gameplay.

The one thing I'm concerned about is that GpuMesh seems to be in an awkward spot now, given that it doesn't actually hold any mesh data anymore. Not sure what we want to do with it now; I'll let @superdump weigh in.

I'd like to see:

Perf test on WebGL2 to see how much of a hit the allocator logic is vs main
A bit more detailed migration guide, specifically mentioning the GpuMesh change and how it now lives in MeshAllocator. Please mention the full types, not just the concepts.

crates/bevy_render/src/batching/gpu_preprocessing.rs

crates/bevy_render/src/mesh/mesh/mod.rs

crates/bevy_render/src/mesh/allocator.rs

JMS55 · 2024-07-11T17:10:42Z

crates/bevy_render/src/mesh/allocator.rs

+                Render,
+                allocate_and_free_meshes
+                    .in_set(RenderSet::PrepareAssets)
+                    .before(prepare_assets::<GpuMesh>),


Why before here, and not after? This stands out as very weird to me.

It's because prepare_assets clears the contents of the ExtractedAssets<RenderMesh> resource after processing it. Because allocate_and_free_meshes needs to use that data, it has to run first. (We don't want to merge allocate_and_free_meshes with RenderMesh::prepare_asset because, among other reasons, that would result in a lot of useless copying when a bunch of meshes load all at the same time and the buffer grows multiple times in a single frame, which happens a lot.)
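The "useless copying" argument can be illustrated with a simplified cost model: if each mesh were allocated as it is prepared, the slab might grow (and copy its contents) several times in one frame, whereas allocating the whole frame's batch up front grows it at most once. This is a hypothetical sketch, not Bevy's code; the function names, the growth rule, and the sizes used in the test are all assumptions.

```rust
/// Cost model (hypothetical): a slab of `capacity` bytes with `used` bytes filled
/// grows by `growth` whenever an allocation doesn't fit, and every growth
/// copies the existing contents into the new buffer.
fn copied_one_at_a_time(capacity: u64, used: u64, sizes: &[u64], growth: f64) -> u64 {
    assert!(growth > 1.0 && capacity > 0, "growth must make progress");
    let (mut capacity, mut used, mut copied) = (capacity, used, 0);
    for &size in sizes {
        if used + size > capacity {
            while used + size > capacity {
                capacity = (capacity as f64 * growth).ceil() as u64;
            }
            copied += used; // old contents move to the bigger buffer
        }
        used += size;
    }
    copied
}

/// Same model, but all of the frame's allocations are known before growing,
/// so the slab is resized (and its contents copied) at most once.
fn copied_batched(capacity: u64, used: u64, sizes: &[u64]) -> u64 {
    let needed = used + sizes.iter().sum::<u64>();
    if needed > capacity { used } else { 0 }
}

fn main() {
    // Ten meshes of 100 bytes arrive in one frame; the slab is 800/1000 full.
    let sizes = [100; 10];
    println!("one at a time: {} bytes copied", copied_one_at_a_time(1000, 800, &sizes, 1.5));
    println!("batched:       {} bytes copied", copied_batched(1000, 800, &sizes));
}
```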

Am I correct that this system runs on a 1-frame delay, then?

No. The order in a single frame is this:

extract_render_asset populates ExtractedAssets<RenderMesh> during the extract phase.

allocate_and_free_meshes examines the ExtractedAssets<RenderMesh> to perform the allocations during the asset prep phase.

prepare_asset processes the ExtractedAssets<RenderMesh> again, and clears it out.

So extract_render_asset produces the data, allocate_and_free_meshes inspects the data, and finally prepare_asset consumes the data. This pipeline happens entirely during the frame.
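The produce/inspect/consume ordering can be modeled in a few lines to show why the allocation step must run before the prepare step and why no frame of delay is involved. Everything here is a hypothetical stand-in: `Extracted` plays the role of `ExtractedAssets<RenderMesh>`, the free functions play the roles of the systems, and a plain `if` replaces Bevy's `.before(...)` scheduling.

```rust
use std::collections::HashMap;

/// Stand-in for the extracted-assets resource: (mesh id, byte size) pairs.
#[derive(Default)]
struct Extracted {
    added: Vec<(u32, u64)>,
}

/// Stand-in for the allocator: mesh id -> offset in a shared buffer.
#[derive(Default)]
struct Allocator {
    offsets: HashMap<u32, u64>,
    next: u64,
}

/// Extract phase: produces the data.
fn extract(extracted: &mut Extracted, new_meshes: &[(u32, u64)]) {
    extracted.added.extend_from_slice(new_meshes);
}

/// Allocation step: inspects the data without consuming it.
fn allocate_meshes(extracted: &Extracted, alloc: &mut Allocator) {
    for &(id, size) in &extracted.added {
        alloc.offsets.insert(id, alloc.next);
        alloc.next += size;
    }
}

/// Prepare step: consumes the data, leaving nothing behind.
fn prepare(extracted: &mut Extracted) {
    extracted.added.clear();
}

fn run_frame(allocate_first: bool, new_meshes: &[(u32, u64)], alloc: &mut Allocator) {
    let mut extracted = Extracted::default();
    extract(&mut extracted, new_meshes);
    if allocate_first {
        allocate_meshes(&extracted, alloc);
        prepare(&mut extracted);
    } else {
        prepare(&mut extracted);
        allocate_meshes(&extracted, alloc); // too late: the data is gone
    }
}

fn main() {
    let mut allocator = Allocator::default();
    run_frame(true, &[(1, 64), (2, 32)], &mut allocator);
    println!("allocate before prepare: {} meshes placed", allocator.offsets.len());
    let mut allocator = Allocator::default();
    run_frame(false, &[(1, 64), (2, 32)], &mut allocator);
    println!("allocate after prepare: {} meshes placed", allocator.offsets.len());
}
```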

Ah, OK, thanks for the explanation. This makes sense to me, and mimics how I have it set up in my own code.

crates/bevy_render/src/mesh/allocator.rs

crates/bevy_render/src/render_asset.rs

BD103 · 2024-07-11T22:32:27Z

The original PR was marked for release notes, so I'm going to add it here too!

crates/bevy_render/src/mesh/allocator.rs

pcwalton · 2024-07-14T22:51:50Z

I've addressed all the review comments as much as I feel is appropriate. From running the examples on WebGL, the performance seems fine. I can't do very in-depth performance analysis there without better diagnostics, and I don't think we should block this PR on better WebGL diagnostics.

tychedelia

Looks good, thanks for the excellent documentation. Two small comments:

It might be nice to have some logging to warn when a mesh spills over into the large object slab if it's near the boundary.
We should follow up by renaming the other RenderAssets (e.g. GpuImage -> RenderImage) for consistency with the changes here.
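The first suggestion (warn when a mesh only just spills over into a dedicated large-object slab) could look roughly like the check below. `spill_warning` and its `margin` parameter are hypothetical; in Bevy the message would presumably go through the usual logging macros rather than `eprintln!`.

```rust
/// Returns a warning if a mesh only just crossed the large-object threshold,
/// i.e. it overshoots by at most `margin` (a fraction of the threshold).
fn spill_warning(size: u64, large_threshold: u64, margin: f64) -> Option<String> {
    if size < large_threshold {
        return None; // stays in a shared slab
    }
    let overshoot = (size - large_threshold) as f64;
    if overshoot > large_threshold as f64 * margin {
        return None; // clearly large; a dedicated slab is expected
    }
    Some(format!(
        "mesh data of {size} bytes is just over the large threshold \
         ({large_threshold} bytes) and was given a dedicated slab"
    ))
}

fn main() {
    for size in [900, 1_050, 5_000] {
        if let Some(msg) = spill_warning(size, 1_000, 0.1) {
            eprintln!("warning: {msg}");
        }
    }
}
```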

examples/shader/shader_instancing.rs

NthTensor · 2024-07-16T00:02:05Z

The issue with the Bistro interior seems to have been resolved. I can replicate the opaque pass improvements. FPS did not improve in tests on low-end hardware (I am totally GPU-bottlenecked on most scenes), but it also doesn't regress, which is the main thing I was concerned about.

pcwalton · 2024-07-16T20:32:38Z

The merge fallout is fixed. This is ready to be merged again.

The "uberbuffers" PR bevyengine#14257 caused some examples to fail intermittently for different reasons:

1. `morph_targets` could fail because vertex displacements for morph targets are keyed off the vertex index. With buffer packing, the vertex index can vary based on the position in the buffer, which caused the morph targets to be potentially incorrect. The solution is to include the first vertex index with the `MeshUniform` (and `MeshInputUniform` if GPU preprocessing is in use), so that the shader can calculate the true vertex index before performing the morph operation. This results in wasted space in `MeshUniform`, which is unfortunate, but we'll soon be filling in the padding with the ID of the material when bindless textures land, so this had to happen sooner or later anyhow.

   Including the vertex index in the `MeshInputUniform` caused an ordering problem. The `MeshInputUniform` was created during the extraction phase, before the allocations occurred, so the extraction logic didn't know where the mesh vertex data was going to end up. The solution is to move the `MeshInputUniform` creation (the `collect_meshes_for_gpu_building` system) to after the allocations phase. This should be better for parallelism anyhow, because it allows the extraction phase to finish quicker. It's also something we'll have to do for bindless in any event.

2. The `lines` and `fog_volumes` examples could fail because their custom drawing nodes weren't updated to supply the vertex and index offsets in their `draw_indexed` and `draw` calls. This commit fixes this oversight.

Fixes bevyengine#14366.
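The morph-target fix boils down to rebasing the vertex index: the shader sees a vertex's position in the shared buffer, but displacements are stored per mesh-local vertex. Below is a CPU-side sketch of that arithmetic under stated assumptions: `MeshData` and `morph_position` are hypothetical names, a single morph target with one weight is assumed, and the real computation happens in a shader.

```rust
/// Hypothetical stand-in for per-mesh data: where this mesh's vertices
/// start in the shared vertex buffer.
struct MeshData {
    first_vertex_index: u32,
}

/// Displacements are indexed by mesh-local vertex, so the global vertex
/// index must be rebased before the lookup.
fn morph_position(position: [f32; 3], vertex_index: u32, mesh: &MeshData, displacements: &[[f32; 3]], weight: f32) -> [f32; 3] {
    let local = (vertex_index - mesh.first_vertex_index) as usize;
    let d = displacements[local];
    [
        position[0] + weight * d[0],
        position[1] + weight * d[1],
        position[2] + weight * d[2],
    ]
}

fn main() {
    // This mesh's vertices were packed starting at vertex 100 of a shared buffer.
    let mesh = MeshData { first_vertex_index: 100 };
    let displacements = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]];
    // Global vertex 101 is the mesh's local vertex 1.
    println!("{:?}", morph_position([0.0; 3], 101, &mesh, &displacements, 0.5));
}
```

Without the rebase, global index 101 would be used directly and would read past the end of this mesh's two displacements, or pick up the wrong vertex's displacement if the table happened to be larger.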

alice-i-cecile · 2024-10-20T14:19:06Z

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1662 if you'd like to help out.

pcwalton requested review from Elabajaba and IceSentry July 9, 2024 23:41

pcwalton added this to the 0.15 milestone Jul 9, 2024

pcwalton added the A-Rendering, C-Performance, and S-Needs-Review labels Jul 9, 2024

pcwalton force-pushed the growable-uberbuffers branch from 067cd9e to 6403a60 July 9, 2024 23:43

pcwalton force-pushed the growable-uberbuffers branch from 6403a60 to 6c17a07 July 9, 2024 23:48

JMS55 suggested changes Jul 11, 2024

BD103 added the M-Release-Note label Jul 11, 2024

pcwalton added 2 commits July 13, 2024 13:44

Partially address review comments

3e36b50

copy_mesh_data -> copy_element_data

b5382ff

JMS55 reviewed Jul 13, 2024

crates/bevy_render/src/mesh/allocator.rs (outdated)

pcwalton added 6 commits July 13, 2024 19:22

Partially address review comments

9e9b86f

Merge remote-tracking branch 'origin/main' into growable-uberbuffers

8b63f64

Address review comment

37316b3

Actually submit the reallocation render queue

994731b

Address review comment

9c53cac

Address review comment

4b71aad

JMS55 reviewed Jul 14, 2024

crates/bevy_render/src/mesh/allocator.rs (outdated)

pcwalton added 3 commits July 14, 2024 15:31

Address review comment

e4de1f3

Rustfmt police

938d6e8

Merge remote-tracking branch 'origin/main' into growable-uberbuffers

da0b353

pcwalton requested a review from JMS55 July 14, 2024 22:53

Rustdoc police

ed28861

IceSentry approved these changes Jul 14, 2024

tychedelia approved these changes Jul 15, 2024

examples/shader/shader_instancing.rs

NthTensor approved these changes Jul 16, 2024

NthTensor added the S-Ready-For-Final-Review label and removed the S-Needs-Review label Jul 16, 2024

pcwalton added 2 commits July 15, 2024 19:29

Address review comment

9b1b969

Merge remote-tracking branch 'origin/main' into growable-uberbuffers

62cccb8

JMS55 approved these changes Jul 16, 2024

alice-i-cecile added this pull request to the merge queue Jul 16, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 16, 2024

pcwalton added 2 commits July 16, 2024 11:51

Merge remote-tracking branch 'origin/main' into growable-uberbuffers

50b398e

Update volumetric fog

63a8f11

alice-i-cecile added this pull request to the merge queue Jul 16, 2024

Merged via the queue into bevyengine:main with commit bc34216 Jul 16, 2024

mockersf mentioned this pull request Jul 17, 2024

Some examples are broken after packed growable buffers #14366

Closed

pcwalton mentioned this pull request Jul 18, 2024

Fix the example regressions from packed growable buffers. #14375

Merged

eero-lehtinen mentioned this pull request Jul 30, 2024

MeshAllocator panics when spawning and despawning lots of meshes #14540

Closed

ChristopherBiscardi mentioned this pull request Sep 11, 2024

uberbuffers selects single mesh when two are added #15154

Closed

alice-i-cecile mentioned this pull request Oct 20, 2024

Write release notes for PR #14257: Pack multiple vertex and index arrays together into growable buffers. bevyengine/bevy-website#1662

Closed

bas-ie added a commit to bas-ie/bevy_ecs_tilemap that referenced this pull request Oct 24, 2024

GpuMesh is now RenderMesh

ffd2feb

See bevyengine/bevy#14257.

bas-ie added a commit to bas-ie/bevy_ecs_tilemap that referenced this pull request Oct 25, 2024

First pass at converting GpuMesh to RenderMesh

b70f36a

See: bevyengine/bevy#14257

bas-ie mentioned this pull request Oct 25, 2024

0.15 (sort of) StarArawn/bevy_ecs_tilemap#568

Closed
