Fix performance regression in bind group processing #8519
Merged
cwfitzgerald merged 1 commit into gfx-rs:trunk on Nov 14, 2025
Author (Contributor): In case it is useful, here is the diff from before the original change to the version in this PR: b3d9431...andyleiserson:wgpu:binding-perf-alt
Member: Ran the benchmarks with a fairly long run time to try to get better information. This is the current benchmark report from v27 to the tip of this PR, with the multi-threaded tests removed. The remaining regressions seem fairly minor, the worst being ~12% in renderpass submit, which is acceptable. Will review this after work.
Member: From the numbers, I would call both fixed (adjusted the OP).
andyleiserson added a commit that referenced this pull request on Nov 15, 2025.
andyleiserson added a commit to andyleiserson/wgpu that referenced this pull request on Nov 24, 2025.
In the change to defer some bind group processing until draw/dispatch (#8418), I was not careful enough to avoid extra work on this hot path.
This makes two fixes:
Vec<Option<Arc<T>>> is expensive even if we aren't immediately freeing the memory). This fix is largely a revert.
Fixes #8499.
Fixes #8500.
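The description of the first fix is truncated in this copy. Assuming it refers to resetting a Vec<Option<Arc<T>>> on the hot path, the sketch below illustrates why that costs time even when no memory is freed: every occupied slot holds an Arc, and dropping it performs an atomic reference-count decrement. BindingSlots is a hypothetical type for illustration only, not wgpu's internal code.

```rust
use std::sync::Arc;

/// Hypothetical per-slot binding table (illustration only, not wgpu's type).
pub struct BindingSlots<T> {
    pub slots: Vec<Option<Arc<T>>>,
}

impl<T> BindingSlots<T> {
    pub fn with_len(n: usize) -> Self {
        // `vec![None; n]` needs `Option<Arc<T>>: Clone`, which holds for any `T`.
        Self { slots: vec![None; n] }
    }

    pub fn set(&mut self, slot: usize, value: Arc<T>) {
        self.slots[slot] = Some(value);
    }

    /// Empties every slot while keeping the allocation and the length.
    /// Each `Some(arc)` is still dropped, i.e. one atomic reference-count
    /// decrement per occupied slot, so the cost grows with the table length
    /// even though no memory is returned to the allocator.
    pub fn reset(&mut self) {
        for slot in self.slots.iter_mut() {
            *slot = None; // drops the previous Arc, if any
        }
    }
}
```

After reset(), the capacity is unchanged but every Arc has been released, which is exactly the per-element work that makes doing this on every bind group change expensive.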
Testing
Tested using wgpu-benchmark. This recovers most of the lost performance. There is still a significant (10-50%) drop in the submit time benchmarks, but that figure is a bit misleading because the submission can't happen separately from encoding.

There is also still a drop of ~10% in the compute pass encode benchmark, which is more than I would like, but I haven't been able to identify any specific changes to mitigate it. The drop seems to be associated with moving the init tracking from set_bind_group to dispatch, even though the amount of work does not change (there is one set_bind_group per dispatch in this case). Unfortunately, deferring the recording of init actions until we're certain the dispatch actually uses the resources is important. (Although we now check at submit time that the bind groups are still valid, this doesn't handle the case where the resources are alive at submit and then destroyed while the submission is in flight: destroy() only checks the tracker for the presence of the resource, not the bind group.)

The compute pass bindless benchmark isn't supported on my test system; it is probably worth finding somewhere we can verify that one as well.
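A minimal sketch of the deferral described in the testing notes, using hypothetical types (ComputePassState, BindGroup and InitAction are illustrative names, not wgpu's internals): set_bind_group only remembers which group is bound to a slot, and dispatch records the init actions of groups that are actually bound at that point. A group that is replaced before any dispatch contributes nothing.

```rust
use std::collections::HashMap;

/// Hypothetical: "buffer `buffer_id` must be initialized before use".
#[derive(Clone, Debug, PartialEq)]
struct InitAction {
    buffer_id: u32,
}

#[derive(Clone)]
struct BindGroup {
    init_actions: Vec<InitAction>,
}

#[derive(Default)]
struct ComputePassState {
    bound: HashMap<u32, BindGroup>, // slot index -> currently bound group
    dirty: Vec<u32>,                // slots changed since the last dispatch
    recorded: Vec<InitAction>,      // init actions committed for this pass
}

impl ComputePassState {
    /// Records the binding only; no init tracking happens here.
    fn set_bind_group(&mut self, slot: u32, group: BindGroup) {
        self.bound.insert(slot, group);
        if !self.dirty.contains(&slot) {
            self.dirty.push(slot);
        }
    }

    /// At dispatch time the bound groups are known to be used,
    /// so their init actions are recorded now.
    fn dispatch(&mut self) {
        for slot in self.dirty.drain(..) {
            if let Some(group) = self.bound.get(&slot) {
                self.recorded.extend(group.init_actions.iter().cloned());
            }
        }
    }
}
```

With exactly one set_bind_group per dispatch, this does the same amount of work as recording eagerly in set_bind_group; the work only moves to dispatch, which matches the observation that the remaining slowdown is not explained by extra work.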
Squash or Rebase? Squash
Checklist
- cargo fmt.
- taplo format.
- cargo clippy --tests. If applicable, add: --target wasm32-unknown-unknown
- cargo xtask test to run tests.
- CHANGELOG.md entry.