Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort by pipeline then mesh for non transparent passes for massively better batching #11671

Merged
merged 6 commits into from
Feb 5, 2024

Conversation

Elabajaba
Copy link
Contributor

Objective

Bevy does ridiculous amount of drawcalls, and our batching isn't very effective because we sort by distance and only batch if we get multiple of the same object in a row. This can give us slightly better GPU performance when not using the depth prepass (due to less overdraw), but ends up being massively CPU bottlenecked due to doing thousands of unnecessary drawcalls.

Solution

Change the sort functions to sort by pipeline key then by mesh id for large performance gains in more realistic scenes than our stress tests.

Pipelines changed:

  • Opaque3d
  • Opaque3dDeferred
  • Opaque3dPrepass

image


Changelog

  • Opaque3d drawing order is now sorted by pipeline and mesh, rather than by distance. This trades off a bit of GPU time in exchange for massively better batching in scenes that aren't only drawing huge amounts of a single object.

@JMS55 JMS55 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 3, 2024
@JMS55 JMS55 added this to the 0.13 milestone Feb 3, 2024
@JMS55 JMS55 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Feb 3, 2024
@JMS55
Copy link
Contributor

JMS55 commented Feb 3, 2024

Actually we should test additionally batching/sorting by bind group, either before or after the mesh.

@robtfm
Copy link
Contributor

robtfm commented Feb 3, 2024

It seems like it should be good overall unless the frag overdraw cost is very expensive for a particular shader.

So expensive custom shaders might suffer perf loss but it would almost certainly be mitigated by the author implementing a prepass frag shader and adding depth prepass, giving better perf overall even then.

@@ -201,7 +203,8 @@ impl PhaseItem for Opaque3d {

#[inline]
fn sort_key(&self) -> Self::SortKey {
Reverse(FloatOrd(self.distance))
// Sort by pipeline, then by mesh to massively decrease drawcall counts in real scenes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update this comment to mention bind groups. Also, was it not helpful to add bind group sorting for the prepasses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the prepass and deferred it kept panicking on the unwrap when getting the material bindgroup.

We also may need to remove the unwrap for the opaque3d pass when getting the material bindgroup id, but I'm not sure if it makes sense that we'd be drawing things in the forward pass with no material bind group?

I'd like to remove the unwrap on getting the bind group id, but I'm not sure how the sorting would deal with a wrapped Option<NonZeroU32>.

Copy link
Contributor

@JMS55 JMS55 Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can't figure it out, I'd just add a TODO comment and we can do a followup PR.

For sorting, you can make a wrapper type implementing Ord, and then do sort_by(pipeline, Wrapper(maybe_bind_group), mesh).

@JMS55 JMS55 removed the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Feb 3, 2024
@Elabajaba
Copy link
Contributor Author

I've reverted sorting by bind group ID as I don't think I'll have time to fix the unwrap, and I really want to get this in for 0.13.

Sorting by the bindgroupid also probably needs more varied testing than just 2 scenes on my system, and should be pretty easy to add in a followup PR late.

@JMS55
Copy link
Contributor

JMS55 commented Feb 5, 2024

Sounds good to me. Can you open an issue and add it to the 0.14 milestone so that we don't forget please?

@JMS55 JMS55 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Feb 5, 2024
auto-merge was automatically disabled February 5, 2024 21:57

Head branch was pushed to by a user without write access

@mockersf mockersf added this pull request to the merge queue Feb 5, 2024
Merged via the queue into bevyengine:main with commit 2a1ebc4 Feb 5, 2024
23 checks passed
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
…etter batching (bevyengine#11671)

# Objective

Bevy does ridiculous amount of drawcalls, and our batching isn't very
effective because we sort by distance and only batch if we get multiple
of the same object in a row. This can give us slightly better GPU
performance when not using the depth prepass (due to less overdraw), but
ends up being massively CPU bottlenecked due to doing thousands of
unnecessary drawcalls.

## Solution

Change the sort functions to sort by pipeline key then by mesh id for
large performance gains in more realistic scenes than our stress tests.

Pipelines changed:
- Opaque3d
- Opaque3dDeferred
- Opaque3dPrepass


![image](https://github.com/bevyengine/bevy/assets/177631/8c355256-ad86-4b47-81a0-f3906797fe7e)


---

## Changelog

- Opaque3d drawing order is now sorted by pipeline and mesh, rather than
by distance. This trades off a bit of GPU time in exchange for massively
better batching in scenes that aren't only drawing huge amounts of a
single object.
@AlphaModder
Copy link

Any reason not to sort by distance within groups of the same mesh and pipeline? Seems like it would recover whatever performance this PR lost in scenes where there really are huge amounts of the same object.

github-merge-queue bot pushed a commit that referenced this pull request Feb 26, 2024
…#12117)

# Objective

- followup to #11671
- I forgot to change the alpha masked phases.

## Solution

- Change the sorting for alpha mask phases to sort by pipeline+mesh
instead of distance, for much better batching for alpha masked
materials.

I also fixed some docs that I missed in the previous PR.

---

## Changelog
- Alpha masked materials are now sorted by pipeline and mesh.
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
…bevyengine#12117)

# Objective

- followup to bevyengine#11671
- I forgot to change the alpha masked phases.

## Solution

- Change the sorting for alpha mask phases to sort by pipeline+mesh
instead of distance, for much better batching for alpha masked
materials.

I also fixed some docs that I missed in the previous PR.

---

## Changelog
- Alpha masked materials are now sorted by pipeline and mesh.
msvbg pushed a commit to msvbg/bevy that referenced this pull request Feb 26, 2024
…bevyengine#12117)

# Objective

- followup to bevyengine#11671
- I forgot to change the alpha masked phases.

## Solution

- Change the sorting for alpha mask phases to sort by pipeline+mesh
instead of distance, for much better batching for alpha masked
materials.

I also fixed some docs that I missed in the previous PR.

---

## Changelog
- Alpha masked materials are now sorted by pipeline and mesh.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants