Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CpuWriteGpuReadBelt for fast frame by frame memory transfers #1382

Merged
merged 23 commits into from
Feb 23, 2023

Conversation

Wumpf
Copy link
Member

@Wumpf Wumpf commented Feb 23, 2023

Rewrite of #594
Introduces a convenient mechanism to allow writing directly to gpu readable memory which gpu then can copy into its internally gpu-only memory.

Goal is faster (on less memcpy), safer (alignment guarantee) and easier (no need to create buffers manually) transfers.

An unexpected ripple effect we got was that in order to have much higher memory use on Web, we need to limit the number of frames (technically: queue submissions) in flight. This makes it a lot less memory hungry as we accumlate a lot less per-frame data! Empirically, it looks a bit like wgpu handles resource cleanup a bit differently on native than on web which might just come down to the present-mode it chooses automatically. Given that the introduced limiter & device.poll makes sense regardless I haven't investigated this further.

Only used as a proof of concept in a single uniform buffer as well as point colors.

Nevertheless, some early perf numbers, need to repeat this test when we have it it more places:
(Acquired by selecting a small timeline range range and letting it loop, then opening profiler, waiting a bit, going to profiler, pause it, select a range of frames where the app was still in focus)

Nyud

DEV:
                            before   |   after (still in dev)
--------------------------------------------------------------
PointsBuilder::colors         2.280 |  2.281
PointCloudDrawData::new()     2.170 |  1.372
Overall                      12.715 | 11.762

RELEASE:
                            before   |   after (still in dev)
--------------------------------------------------------------
PointsBuilder::colors        1.784 |  2.060
PointCloudDrawData::new()    1.179 |  0.949
Overall                      9.961 |  9.748

===============================================================

colmap with lots of history

DEV:
                            before   |   after
--------------------------------------------------------------
PointsBuilder::colors         10.160 | 10.486
PointCloudDrawData::new()      6.533 |  4.209
Overall                       46.420 | 43.941

RELEASE:
                            before   |   after
--------------------------------------------------------------
PointsBuilder::colors         13.770 |  6.766
PointCloudDrawData::new()      3.682 |  2.921
Overall                       42.257 | 34.285

Btw. there is a deeper rabbit hole we ignore here: One could leverage integrated gpus (or the cache of dedicated ones) where the gpu would use the memory directly, eliminating the need to schedule a gpu memcpy, however this is behind a wgpu feature flag anyways (MAPPABLE_PRIMARY_BUFFERS).

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I've added a line to CHANGELOG.md (if this is a big enough change to warrant it)

First half of #426

@Wumpf Wumpf added the 🔺 re_renderer affects re_renderer itself label Feb 23, 2023
@teh-cmc teh-cmc self-requested a review February 23, 2023 08:44
Copy link
Member

@teh-cmc teh-cmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marvellous.

}
}

fn poll_device(&mut self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat

);
}

if self.0.user_data.len() < self.0.user_data.len() {
if self.0.user_data.len() < self.0.vertices.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice save :trollface:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that caused me some reeeaaal weird crash :D

@Wumpf Wumpf force-pushed the andreas/re_renderer/staging-belt2 branch from 6dee277 to e98525c Compare February 23, 2023 10:26
@Wumpf Wumpf force-pushed the andreas/re_renderer/staging-belt2 branch from e98525c to f193edb Compare February 23, 2023 10:33
@Wumpf Wumpf merged commit 53b2d92 into main Feb 23, 2023
@Wumpf Wumpf deleted the andreas/re_renderer/staging-belt2 branch February 23, 2023 10:45
emilk pushed a commit that referenced this pull request Mar 2, 2023
* Add CpuWriteGpuReadBelt and use it for frameuniform buffer and colors as poc
* Limit number of in-flight queue submissions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔺 re_renderer affects re_renderer itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants