Fix CpuWriteGpuReadBelt producing unaligned gpu buffer offsets #1716
Turns out our eagerness to acquire aligned pointers for fast copy operations backfired and got us into an impossible situation:
By offsetting staging buffers to ensure CPU pointer alignment, we sometimes choose offsets that aren't allowed for copy operations. E.g. we get back a buffer that has a pointer alignment of 2 (that happens -.-), so we offset the pointer by 14 (our min alignment is 16!). We can now copy data into the buffer quickly and safely. But when scheduling e.g. `copy_buffer_to_texture` we get a wgpu crash! wgpu requires the offset (we put 14) to be a multiple of `wgpu::COPY_BUFFER_ALIGNMENT` == 4 (among other alignment requirements), none of which holds now!
You might be asking why wgpu gives out such oddly aligned buffers to begin with, and the answer is sadly that the WebGL impl has issues + that the spec doesn't guarantee anything, so this is strictly speaking valid (although most other backends will give out 16 byte aligned pointers). See gfx-rs/wgpu#3508
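To make the arithmetic above concrete, here is a minimal sketch of the two alignment strategies. It is not the actual `CpuWriteGpuReadBelt` code: the helper names are hypothetical, and `COPY_BUFFER_ALIGNMENT` is hardcoded to 4 to mirror `wgpu::COPY_BUFFER_ALIGNMENT` so the snippet has no dependencies.

```rust
/// Mirrors `wgpu::COPY_BUFFER_ALIGNMENT` (4 bytes); hardcoded for this sketch.
const COPY_BUFFER_ALIGNMENT: u64 = 4;
/// The minimum CPU pointer alignment we used to enforce.
const MIN_CPU_ALIGNMENT: u64 = 16;

/// Hypothetical helper (old strategy): number of bytes to skip so that
/// `address + offset` becomes a multiple of `alignment`.
fn offset_to_align_pointer(address: u64, alignment: u64) -> u64 {
    (alignment - address % alignment) % alignment
}

/// Hypothetical helper (offset-based strategy): rounds a buffer offset up
/// to the next multiple of `alignment`, ignoring the CPU pointer entirely.
fn align_offset_up(offset: u64, alignment: u64) -> u64 {
    (offset + alignment - 1) / alignment * alignment
}

/// Checks only the `COPY_BUFFER_ALIGNMENT` rule for a copy source offset.
fn is_valid_copy_offset(offset: u64) -> bool {
    offset % COPY_BUFFER_ALIGNMENT == 0
}
```

For a mapped address like `0x1002` (2-byte aligned), `offset_to_align_pointer(0x1002, MIN_CPU_ALIGNMENT)` yields 14, and `is_valid_copy_offset(14)` is false. Rounding the offset itself with `align_offset_up(14, COPY_BUFFER_ALIGNMENT)` gives 16, which passes the check, at the cost of no longer knowing anything about the CPU pointer's alignment.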
Long story short, I changed (and simplified) the way we go about alignment on `CpuWriteGpuReadBelt`. The CPU pointer no longer has any alignment guarantees, and offsets now fulfill the above requirements. This is OK since we already wrap all accesses to the CPU pointer and can do byte writes to it. The huge drawback is of course that `copy_from_slice` now has to do the heavy lifting of checking for alignment and then picking the right instructions wherever that is worthwhile (that is, the things `memcpy` does when it deals with raw byte pointers).
Testing:
Confirmed the fix with a crashing repro on the Web, then ran `just py-run-all` for native, and the renderer samples locally and on the web. Have not checked whether this has any practical perf impact. Luckily our interface makes this very much an "optimize later" problem (copy operations within `CpuWriteGpuReadBuffer` can be made more clever in the future if need be; unlikely to be necessary, to be fair).
Checklist