Fix CpuWriteGpuReadBelt producing unaligned gpu buffer offsets #1716

Wumpf · 2023-03-27T13:29:14Z

Turns out our eagerness to acquire aligned pointers for fast copy operations backfired and got us into an impossible situation:
By offsetting staging buffers to ensure CPU pointer alignment, we sometimes choose offsets that aren't allowed for copy operations. E.g. we get back a buffer whose pointer is only 2-byte aligned (that happens -.-), so we offset the pointer by 14 (our min alignment is 16!). We can now copy data into the buffer quickly and safely. But when scheduling e.g. copy_buffer_to_texture we get a wgpu crash! Wgpu requires the offset (we put 14) to be:

* a multiple of wgpu::COPY_BUFFER_ALIGNMENT == 4
* a multiple of the texel block size

Neither of which is true now! You might be asking why wgpu hands out such oddly aligned buffers to begin with. Sadly, the answer is that the WebGL impl has issues, and that the spec doesn't guarantee anything, so this is strictly speaking valid (although most other backends will give out 16-byte aligned pointers). See gfx-rs/wgpu#3508
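To make the two offset requirements concrete, here is a minimal sketch of how they could be checked and satisfied. The helper names `is_valid_copy_offset` and `next_valid_copy_offset` are hypothetical and not taken from re_renderer, and the constant is redefined locally (with the value quoted above) so the snippet has no dependencies. It assumes a nonzero texel block size.

```rust
/// Same value as `wgpu::COPY_BUFFER_ALIGNMENT` quoted above; redefined here
/// so the sketch compiles without the wgpu crate.
const COPY_BUFFER_ALIGNMENT: u64 = 4;

/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Least common multiple; assumes both inputs are nonzero.
fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

/// Hypothetical helper: true if `offset` satisfies both requirements,
/// i.e. it is a multiple of the copy alignment and of the texel block size.
fn is_valid_copy_offset(offset: u64, texel_block_size: u64) -> bool {
    offset % COPY_BUFFER_ALIGNMENT == 0 && offset % texel_block_size == 0
}

/// Hypothetical helper: rounds `offset` up to the next value that satisfies
/// both requirements (a multiple of the lcm of the two alignments).
fn next_valid_copy_offset(offset: u64, texel_block_size: u64) -> u64 {
    let alignment = lcm(COPY_BUFFER_ALIGNMENT, texel_block_size);
    (offset + alignment - 1) / alignment * alignment
}
```

With these, the offset of 14 from the example fails the check for any block size, and rounding up for a 4-byte texel block gives 16.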

Long story short, I changed (and simplified) the way we go about alignment on CpuWriteGpuReadBelt. The CPU pointer no longer has any alignment guarantees, and offsets now fulfill the requirements above. This is ok since we already wrap all accesses to the CPU pointer and can do byte writes to it. The big drawback is of course that copy_from_slice now has to do the heavy lifting of checking alignment and picking the right instructions wherever that is worthwhile (that is, the things memcpy does when it deals with raw byte pointers).
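As an illustration of writing typed data through a byte view with no alignment guarantee, here is a minimal sketch. It is not the actual CpuWriteGpuReadBuffer code: `write_unaligned_f32s` and `read_unaligned_f32` are hypothetical helpers, and a plain `&mut [u8]` stands in for the mapped staging memory. The only standard-library calls used are `f32::to_ne_bytes`, `f32::from_ne_bytes` and the byte-slice `copy_from_slice`.

```rust
/// Hypothetical helper: writes `src` into `dst` starting at `byte_offset`.
/// `dst` is treated purely as bytes, so `byte_offset` does not need to be a
/// multiple of `align_of::<f32>()`.
/// Panics if the destination range is out of bounds.
fn write_unaligned_f32s(dst: &mut [u8], byte_offset: usize, src: &[f32]) {
    let mut cursor = byte_offset;
    for value in src {
        let bytes = value.to_ne_bytes();
        // Byte-slice copy: the alignment handling is left to the memcpy
        // that `copy_from_slice` boils down to.
        dst[cursor..cursor + bytes.len()].copy_from_slice(&bytes);
        cursor += bytes.len();
    }
}

/// Hypothetical helper for reading a value back, again via bytes only.
fn read_unaligned_f32(src: &[u8], byte_offset: usize) -> f32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&src[byte_offset..byte_offset + 4]);
    f32::from_ne_bytes(bytes)
}
```

Writing at a byte offset of 2, for instance, is fine here even though it would be misaligned for a `*mut f32`.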

Testing:
Confirmed the fix with the crashing repro on the Web, then ran just py-run-all for native, and the renderer samples locally and on web. Have not checked whether this has any practical perf impact. Luckily our interface makes this very much an "optimize later" problem (copy operations within CpuWriteGpuReadBuffer can be made more clever in the future if need be; unlikely to be necessary, to be fair).

Checklist

I have read and agree to Contributor Guide and the Code of Conduct

emilk

ouch - good job!

Wumpf added the 🪳 bug (Something isn't working) and 🔺 re_renderer (affects re_renderer itself) labels Mar 27, 2023

Wumpf changed the title ~~Fix CpuWriteGpuReadBuffer producing unaligned gpu buffer offsets~~ Fix CpuWriteGpuReadBelt producing unaligned gpu buffer offsets Mar 27, 2023

emilk approved these changes Mar 27, 2023

emilk merged commit f0285c7 into main Mar 27, 2023

emilk deleted the andreas/re_renderer/fix-unaligned-offsets-on-cpuwritegpureadbelt branch March 27, 2023 13:50

This was referenced Mar 27, 2023

Add a script that generates a changelog from recent PRs and their labels #1718

Merged

Release 0.4.0 - Outlines, web viewer and performance improvements #1722

Merged
