CpuWriteGpuReadBelt for fast frame by frame memory transfers #1382

Conversation
…g of alignment. Fix issues on render context shutdown
…t end of life of resource pool
Marvellous.
fn poll_device(&mut self) {
neat
-    if self.0.user_data.len() < self.0.user_data.len() {
+    if self.0.user_data.len() < self.0.vertices.len() {
nice save
that caused me some reeeaaal weird crash :D
Force-pushed from 6dee277 to e98525c
Force-pushed from e98525c to f193edb
* Add CpuWriteGpuReadBelt and use it for frame uniform buffer and colors as poc
* Limit number of in-flight queue submissions
Rewrite of #594
Introduces a convenient mechanism for writing directly to GPU-readable memory, which the GPU can then copy into its internal GPU-only memory.
The goal is faster (one less memcpy), safer (alignment guarantees), and easier (no need to create buffers manually) transfers.
An unexpected ripple effect: in order to not have much higher memory use on Web, we need to limit the number of frames (technically: queue submissions) in flight. This makes it a lot less memory hungry, since we accumulate a lot less per-frame data! Empirically, it looks like wgpu handles resource cleanup somewhat differently on native than on Web, which might just come down to the present mode it chooses automatically. Given that the introduced limiter & device.poll make sense regardless, I haven't investigated this further.

Only used as a proof of concept in a single uniform buffer as well as point colors.
Nevertheless, some early perf numbers; need to repeat this test once we have it in more places:
(Acquired by selecting a small timeline range and letting it loop, then opening the profiler, waiting a bit, pausing it, and selecting a range of frames where the app was still in focus)
Btw. there is a deeper rabbit hole we ignore here: one could leverage integrated GPUs (or the cache of dedicated ones) where the GPU would use the memory directly, eliminating the need to schedule a GPU memcpy. However, this is behind a wgpu feature flag anyway (MAPPABLE_PRIMARY_BUFFERS).

Checklist
* CHANGELOG.md (if this is a big enough change to warrant it)

First half of #426