Fix CpuWriteGpuReadBuffer producing unaligned gpu buffer offsets (#1716)
Turns out our eagerness to acquire aligned pointers for fast copy operations backfired and got us into an impossible situation: by offsetting staging buffers to ensure cpu pointer alignment, we sometimes choose offsets that aren't allowed for copy operations.

E.g. we get back a buffer that has a pointer alignment of 2 (that happens -.-), so we offset the pointer by 14 (our min alignment is 16!). We can now copy data into the buffer quickly and safely. But when scheduling e.g. `copy_buffer_to_texture` we get a wgpu crash! Wgpu requires the offset (we put 14) to be:
* a multiple of wgpu::COPY_BUFFER_ALIGNMENT
* a multiple of the texel block size

Neither of which is true now!

You might be asking why wgpu gives out such oddly aligned buffers to begin with. The answer is sadly that the WebGL impl has issues and that the spec doesn't guarantee anything, so this is strictly speaking valid (although most other backends will give out 16 byte aligned pointers). See gfx-rs/wgpu#3508

Long story short, I changed (and simplified) the way we go about alignment on `CpuWriteGpuReadBelt`. The CPU pointer no longer has *any* alignment guarantees, and offsets now fulfill the two requirements above. This is _ok_ since we already wrapped all accesses to the cpu pointer and can do byte writes to them. The huge drawback of this is ofc that `copy_from_slice` now has to do the heavy lifting of checking for alignment and then picking the right instructions wherever that is worthwhile (that is, the things `memcpy` does when it deals with raw byte pointers).

Testing: Confirmed the fix with a crashing repro on the Web, then ran `just py-run-all` for native, and the renderer samples locally and on web. Have not checked if this has any practical perf impact. Luckily our interface makes this very much an "optimize later" problem (copy operations within `CpuWriteGpuReadBuffer` can be made more clever in the future if need be; unlikely to be necessary, to be fair).
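To make the two offset requirements and the byte-write fallback concrete, here is a minimal sketch. It is not the code from this commit: the local `COPY_BUFFER_ALIGNMENT` constant only mirrors the value of `wgpu::COPY_BUFFER_ALIGNMENT` (4 bytes) so the example compiles without wgpu, and the helper names `required_offset_alignment`, `align_offset_up` and `write_f32s_unaligned` are hypothetical.

```rust
/// Local stand-in for `wgpu::COPY_BUFFER_ALIGNMENT` (4 bytes).
/// Assumption: defined here only to keep the sketch free of dependencies.
const COPY_BUFFER_ALIGNMENT: u64 = 4;

/// Greatest common divisor (Euclid), used to build a least common multiple.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Hypothetical helper: smallest alignment that is a multiple of both
/// `COPY_BUFFER_ALIGNMENT` and the texel block size of the target texture.
fn required_offset_alignment(texel_block_size: u64) -> u64 {
    assert!(texel_block_size > 0);
    COPY_BUFFER_ALIGNMENT / gcd(COPY_BUFFER_ALIGNMENT, texel_block_size) * texel_block_size
}

/// Hypothetical helper: rounds `offset` up to the next multiple of `alignment`.
fn align_offset_up(offset: u64, alignment: u64) -> u64 {
    (offset + alignment - 1) / alignment * alignment
}

/// Hypothetical helper: writes `src` into `dst` one value at a time as raw bytes.
/// Makes no assumption about the alignment of `dst`.
fn write_f32s_unaligned(dst: &mut [u8], src: &[f32]) {
    assert!(dst.len() >= src.len() * 4);
    for (chunk, value) in dst.chunks_exact_mut(4).zip(src) {
        chunk.copy_from_slice(&value.to_ne_bytes());
    }
}
```

With these helpers, the offset of 14 from the example above is rejected by both rules (14 is neither a multiple of 4 nor of a 16 byte texel block), while `align_offset_up(14, required_offset_alignment(16))` yields 16, which satisfies both. The byte-wise write is the price paid for dropping the CPU pointer alignment guarantee.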
Showing 1 changed file with 54 additions and 82 deletions.
f0285c7
Rust Benchmark
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| `datastore/insert/batch/rects/insert` | `605813` ns/iter (`± 2916`) | `607999` ns/iter (`± 3501`) | `1.00` |
| `datastore/latest_at/batch/rects/query` | `1853` ns/iter (`± 9`) | `1847` ns/iter (`± 7`) | `1.00` |
| `datastore/latest_at/missing_components/primary` | `287` ns/iter (`± 1`) | `289` ns/iter (`± 5`) | `0.99` |
| `datastore/latest_at/missing_components/secondaries` | `438` ns/iter (`± 2`) | `444` ns/iter (`± 2`) | `0.99` |
| `datastore/range/batch/rects/query` | `151846` ns/iter (`± 1511`) | `152928` ns/iter (`± 776`) | `0.99` |
| `mono_points_arrow/generate_message_bundles` | `46513917` ns/iter (`± 622483`) | `43674525` ns/iter (`± 2094613`) | `1.07` |
| `mono_points_arrow/generate_messages` | `123500647` ns/iter (`± 1110759`) | `122673947` ns/iter (`± 1466966`) | `1.01` |
| `mono_points_arrow/encode_log_msg` | `156246217` ns/iter (`± 984562`) | `153248336` ns/iter (`± 1169571`) | `1.02` |
| `mono_points_arrow/encode_total` | `327457744` ns/iter (`± 1935988`) | `322940902` ns/iter (`± 2003700`) | `1.01` |
| `mono_points_arrow/decode_log_msg` | `179138730` ns/iter (`± 1094507`) | `176465542` ns/iter (`± 1591891`) | `1.02` |
| `mono_points_arrow/decode_message_bundles` | `54574594` ns/iter (`± 644866`) | `53000928` ns/iter (`± 931076`) | `1.03` |
| `mono_points_arrow/decode_total` | `231466995` ns/iter (`± 1461577`) | `227298270` ns/iter (`± 1971502`) | `1.02` |
| `batch_points_arrow/generate_message_bundles` | `286488` ns/iter (`± 2428`) | `285537` ns/iter (`± 2423`) | `1.00` |
| `batch_points_arrow/generate_messages` | `6111` ns/iter (`± 43`) | `6018` ns/iter (`± 80`) | `1.02` |
| `batch_points_arrow/encode_log_msg` | `385179` ns/iter (`± 2164`) | `376585` ns/iter (`± 2817`) | `1.02` |
| `batch_points_arrow/encode_total` | `694213` ns/iter (`± 4347`) | `690810` ns/iter (`± 3373`) | `1.00` |
| `batch_points_arrow/decode_log_msg` | `353613` ns/iter (`± 1783`) | `350176` ns/iter (`± 1466`) | `1.01` |
| `batch_points_arrow/decode_message_bundles` | `1567` ns/iter (`± 9`) | `1613` ns/iter (`± 33`) | `0.97` |
| `batch_points_arrow/decode_total` | `357799` ns/iter (`± 1669`) | `358649` ns/iter (`± 2838`) | `1.00` |
| `arrow_mono_points/insert` | `6204320834` ns/iter (`± 17905183`) | `6120830809` ns/iter (`± 19096063`) | `1.01` |
| `arrow_mono_points/query` | `1724079` ns/iter (`± 17922`) | `1721371` ns/iter (`± 18940`) | `1.00` |
| `arrow_batch_points/insert` | `3033644` ns/iter (`± 15959`) | `3013481` ns/iter (`± 30029`) | `1.01` |
| `arrow_batch_points/query` | `15406` ns/iter (`± 99`) | `15257` ns/iter (`± 209`) | `1.01` |
| `arrow_batch_vecs/insert` | `42562` ns/iter (`± 216`) | `43113` ns/iter (`± 440`) | `0.99` |
| `arrow_batch_vecs/query` | `479101` ns/iter (`± 5927`) | `469903` ns/iter (`± 5425`) | `1.02` |
| `tuid/Tuid::random` | `34` ns/iter (`± 0`) | `34` ns/iter (`± 0`) | `1` |
This comment was automatically generated by workflow using github-action-benchmark.