Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CpuWriteGpuReadBelt producing unaligned gpu buffer offsets #1716

Merged

Conversation

Wumpf
Copy link
Member

@Wumpf Wumpf commented Mar 27, 2023

Turns out our eagerness to acquire aligned pointers for fast copy operation backfired and got us into an impossible situation:
By offsetting staging buffers to ensure cpu pointer alignment, we sometimes choose offsets that aren't allowed for copy operations. E.g. we get back a buffer that has a pointer alignment of 2 (that happens -.-) we therefore offset the pointer by 14 (our min alignment is 16!). We now can copy data into the buffer quickly and safely. But when scheduling e.g. copy_buffer_to_texture we get a wgpu crash! Wgpu requires the offset (we put 14) to be:

  • a multiple of wgpu::COPY_BUFFER_ALIGNMENT==4
  • a multiple of the texel block size

Neither of which is true now! You might be asking why wgpu gives such oddly aligned buffers out to begin with, and the answer is sadly that the WebGL impl has issues + that the spec doesn't guarantee anything, so this is strictly speaking valid (although most other backends will give out 16 byte aligned pointers). See gfx-rs/wgpu#3508

Long story short, I changed (and simplified) the way we go about alignment on CpuWriteGpuReadBelt. The CPU pointer no longer has any alignment guarantees and offsets fullfill now the above guarantees. This is ok since we already wrapped all accesses to the cpu pointer and can do byte writes to them. The huge drawback of this is ofc that copy_from_slice now has to do the heavy lifting of checking for alignment and then doing the right instructions for everything that is worth while doing so (that is, the things memcpy does when it deals with raw byte pointers)

Testing:
Confirmed fix with crashing repro on the Web, then ran just py-run-all for native, renderer samples local and on web. Have not checked if this has any practical perf impact. Luckily our interface makes this very much a "optimize later" problem (copy operations within CpuWriteGpuReadBuffer can be made more clever in the future if need to be; unlikely necessary to be fair though)

Checklist

Turns out our eagerness to acquire aligned pointers for fast copy operation backfired and got us into an impossible situation: By offsetting staging buffers to ensure cpu pointer alignment, we sometimes choose offsets that aren't allowed for copy operations. E.g. we get back a buffer that has a pointer alignment of 2 (that happens -.-) we therefore offset the pointer by 14 (our min alignment is 16!). We now can copy data into the buffer quickly and safely. But when scheduling e.g. `copy_buffer_to_texture` we get a wgpu crash! Wgpu requires the offset (we put 14) to be:
* a multiple of wgpu::COPY_BUFFER_ALIGNMENT
* a multiple of the texel block size
Neither of which is true now! You might be asking why wgpu gives such oddly aligned buffers out to begin with, and the answer is sadly that the WebGL impl has issues + that the spec doesn't guarantee anything, so this is strictly speaking valid (although most other backends will give out 16 byte aligned pointers). See gfx-rs/wgpu#3508

Long story short, I changed (and simplified) the way we go about alignment on `CpuWriteGpuReadBelt`. The CPU pointer no longer has *any* alignment guarantees and offsets fullfill now the above guarantees. This is _ok_ since we already wrapped all accesses to the cpu pointer and can do byte writes to them. The huge drawback of this is ofc that `copy_from_slice` now has to do the heavy lifting of checking for alignment and then doing the right instructions for everything that is worth while doing so (that is, the things `memcpy` does when it deals with raw byte pointers)

Testing: Confirmed fix with crashing repro on the Web, then ran `just py-run-all` for native, renderer samples local and on web.
Have not checked if this has any practical perf impact. Luckily our interface makes this very much a "optimize later" problem (copy operations within `CpuWriteGpuReadBuffer` can be made more clever in the future if need to be; unlikely necessary to be fair though)
@Wumpf Wumpf added 🪳 bug Something isn't working 🔺 re_renderer affects re_renderer itself labels Mar 27, 2023
@Wumpf Wumpf changed the title Fix CpuWriteGpuReadBuffer producing unaligned gpu buffer offsets Fix CpuWriteGpuReadBelt producing unaligned gpu buffer offsets Mar 27, 2023
Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch - good job!

@emilk emilk merged commit f0285c7 into main Mar 27, 2023
@emilk emilk deleted the andreas/re_renderer/fix-unaligned-offsets-on-cpuwritegpureadbelt branch March 27, 2023 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪳 bug Something isn't working 🔺 re_renderer affects re_renderer itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants