CpuWriteGpuReadBelt for fast frame by frame memory transfers #1382

Conversation
…g of alignment. Fix issues on render context shutdown
…t end of life of resource pool
Marvellous.
fn poll_device(&mut self) {
neat
-    if self.0.user_data.len() < self.0.user_data.len() {
+    if self.0.user_data.len() < self.0.vertices.len() {
nice save
that caused me some reeeaaal weird crash :D
Force-pushed from 6dee277 to e98525c
Force-pushed from e98525c to f193edb
* Add CpuWriteGpuReadBelt and use it for frame uniform buffer and colors as poc
* Limit number of in-flight queue submissions
Rewrite of #594
Introduces a convenient mechanism for writing directly to GPU-readable memory, which the GPU can then copy into its internal GPU-only memory.
The goal is faster (one less memcpy), safer (alignment guarantees), and easier (no need to create buffers manually) transfers.
An unexpected ripple effect: in order to not have much higher memory use on Web, we need to limit the number of frames (technically: queue submissions) in flight. This makes it a lot less memory hungry, since we accumulate a lot less per-frame data! Empirically, it looks like wgpu handles resource cleanup somewhat differently on native than on Web, which might just come down to the present mode it chooses automatically. Given that the introduced limiter & device.poll make sense regardless, I haven't investigated this further.

Only used as a proof of concept in a single uniform buffer as well as point colors.
Nevertheless, some early perf numbers; need to repeat this test once we have it in more places:
(Acquired by selecting a small timeline range and letting it loop, then opening the profiler, waiting a bit, pausing it, and selecting a range of frames where the app was still in focus)
Btw. there is a deeper rabbit hole we ignore here: one could leverage integrated GPUs (or the cache of dedicated ones) where the GPU would use the memory directly, eliminating the need to schedule a GPU memcpy. However, this is behind a wgpu feature flag anyway (MAPPABLE_PRIMARY_BUFFERS).

Checklist
* CHANGELOG.md (if this is a big enough change to warrant it)

First half of #426