Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7

SubiyaCryolite · 2023-06-12T10:01:53Z

Hi Everyone.

This PR is a follow-up to a comment I left a few months ago, visible here: memononen#614 (comment)

I made some optimizations that have doubled the frame-rate of the example_vulkan demo on my machine. These changes do not break compatibility with the older implementations either, specifically with example_vulkan_min_no_glfw.

The changes I made are listed below:

- Support for multiple command buffers, specifically one per swap-chain image/buffer.
- Caching of descriptor sets, removing the need to create new ones per draw function or to call vkResetDescriptorPool every frame.
- In example_vulkan, using vkWaitForFences instead of vkQueueWaitIdle to pace rendering (this seems to be the biggest performance booster). This change also implies "multiple frames in flight", with the number of frames set by the swap-chain image count.
- Using persistently mapped buffers, allocated with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT instead of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT alone. This allows skipping the per-frame vkMapMemory/vkUnmapMemory calls.
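The descriptor-set caching idea can be sketched as follows, under stated assumptions. The thread does not show nanovg_vk.h's actual data structures, so `FrameDescriptorCache`, `acquire_set`, `begin_frame`, `allocate_set`, `destroy_cache` and `FRAMES_IN_FLIGHT` are hypothetical names. Descriptor-set handles are simulated with integers instead of calling vkAllocateDescriptorSets.

Each frame slot keeps the sets it has already allocated. At the start of a frame it only rewinds a usage counter. Once the cache has warmed up, nothing is reallocated and no vkResetDescriptorPool call is needed.

```c
#include <stdint.h>
#include <stdlib.h>

#define FRAMES_IN_FLIGHT 3 /* assumption: one slot per swap-chain image */

/* Hypothetical stand-in for VkDescriptorSet; a real renderer would call
   vkAllocateDescriptorSets where allocate_set() is called below. */
typedef uint32_t DescriptorSetHandle;

typedef struct {
    DescriptorSetHandle *sets; /* sets owned by this frame slot        */
    uint32_t capacity;         /* how many sets have been allocated    */
    uint32_t used;             /* how many were handed out this frame  */
} FrameDescriptorCache;

static uint32_t g_allocations = 0; /* counts simulated allocations */

/* Simulated allocation: returns a fresh, unique handle. */
static DescriptorSetHandle allocate_set(void) {
    return ++g_allocations;
}

/* Hands out the next cached set, growing the cache only when every set
   in this slot is already in use during the current frame. */
static DescriptorSetHandle acquire_set(FrameDescriptorCache *cache) {
    if (cache->used == cache->capacity) {
        uint32_t new_cap = cache->capacity ? cache->capacity * 2 : 4;
        DescriptorSetHandle *grown =
            realloc(cache->sets, new_cap * sizeof *grown);
        if (grown == NULL) abort();
        for (uint32_t i = cache->capacity; i < new_cap; ++i)
            grown[i] = allocate_set();
        cache->sets = grown;
        cache->capacity = new_cap;
    }
    return cache->sets[cache->used++];
}

/* Call once the slot's previous frame has finished on the GPU: the sets
   are kept for reuse and only the usage counter is rewound. */
static void begin_frame(FrameDescriptorCache *cache) {
    cache->used = 0;
}

/* Releases the simulated cache (a real renderer would free the pool). */
static void destroy_cache(FrameDescriptorCache *cache) {
    free(cache->sets);
    cache->sets = NULL;
    cache->capacity = cache->used = 0;
}
```

Rewinding a slot is only safe once the GPU has finished with that slot's previous frame, for example after waiting on its fence. The cached sets can still be rewritten with vkUpdateDescriptorSets before they are used again.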

The best way to observe the performance difference is to run example_vulkan and example_vulkan_min_no_glfw separately. On my system (RX 6800, Ryzen R7 7700), the example_vulkan demo has an average frame time of 0.80 ms (1250 fps), compared with 1.98 ms (505 fps) for example_vulkan_min_no_glfw. For reference, example-gl3 runs at 0.55 ms (1818 fps). This is still a big improvement: example_vulkan now reaches a frame rate about 31% lower than example-gl3, compared with 72% lower for example_vulkan_min_no_glfw.
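The percentages above compare frame rates, not frame times. This small sketch reproduces the arithmetic from the quoted averages; the function names are illustrative only.

```c
/* Converts an average frame time in milliseconds to frames per second. */
static double fps_from_ms(double frame_ms) {
    return 1000.0 / frame_ms;
}

/* Fraction of the baseline's frame rate that is lost, so 0.31 means
   "31% slower" in frame-rate terms. Algebraically equal to
   1 - baseline_ms / ms. */
static double fps_deficit(double ms, double baseline_ms) {
    return 1.0 - fps_from_ms(ms) / fps_from_ms(baseline_ms);
}
```

Measured by frame time instead, example_vulkan takes about 45% longer per frame than example-gl3 (0.80 / 0.55 ≈ 1.45). So it matters which of the two metrics "slower" refers to.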

The main differences in setup are:

- Treating cmd_buffer as an array.
- Setting swapchainImageCount and *currentBuffer when initializing VKNVGCreateInfo.
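Here is a sketch of that setup pattern, using hypothetical names:

- `ExampleCreateInfo` only mirrors the fields mentioned above. It is not the real VKNVGCreateInfo layout.
- `ExampleCmdBuffer` stands in for VkCommandBuffer, so the code compiles without the Vulkan SDK.

The application owns the frame index and the renderer reads it through a pointer, so both sides always agree on which command buffer is active.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VkCommandBuffer so the sketch builds without Vulkan. */
typedef struct ExampleCmdBuffer { int id; } ExampleCmdBuffer;

/* Hypothetical mirror of the fields described above. */
typedef struct {
    ExampleCmdBuffer *cmdBuffer;   /* array: one entry per swap-chain image */
    uint32_t swapchainImageCount;  /* number of entries in cmdBuffer        */
    const uint32_t *currentBuffer; /* owned by the app, read by the renderer */
} ExampleCreateInfo;

/* Renderer side: pick the command buffer for the frame the application
   is currently working on; returns NULL for an out-of-range index. */
static ExampleCmdBuffer *example_active_cmd(const ExampleCreateInfo *ci) {
    uint32_t index = *ci->currentBuffer;
    if (index >= ci->swapchainImageCount) return NULL;
    return &ci->cmdBuffer[index];
}

/* Application side: advance to the next slot after presenting. */
static void example_advance_frame(uint32_t *currentBuffer, uint32_t count) {
    *currentBuffer = (*currentBuffer + 1) % count;
}
```

Passing a pointer rather than a value means the create-info does not have to be rebuilt every frame. The application just advances its own index.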

Open to comments and feedback, particularly around tests on Nvidia, Intel and Apple Silicon hardware.

SubiyaCryolite · 2023-06-12T13:53:59Z

I'm doing additional tests with validation layers enabled. I caught a few bugs and am working on them. However, it's clear that much of the performance problem comes down to synchronization and having multiple frames in flight/processing. Rendering works with no issues, but I feel the validation warnings need to be addressed.
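This toy model shows why per-frame fences allow more work in flight than vkQueueWaitIdle. It is a simulation, not Vulkan code:

- `SimQueue`, `FrameRing`, `MAX_SLOTS` and the helper functions are hypothetical.
- The simulated GPU only retires work when the CPU waits, so the model shows how many frames the CPU can queue ahead, not real timings.

Waiting on the fence of the slot about to be reused blocks only until that slot's previous frame is done. Waiting for the queue to go idle drains everything.

```c
#include <stdint.h>

#define MAX_SLOTS 8 /* assumption: enough for any swap-chain image count here */

typedef struct {
    uint64_t submitted; /* frames handed to the simulated GPU   */
    uint64_t completed; /* frames the simulated GPU has finished */
} SimQueue;

typedef struct {
    uint64_t fence_value[MAX_SLOTS]; /* submission that signals each slot (0 = unused) */
    uint32_t slot_count;             /* frames in flight, 1..MAX_SLOTS */
    uint32_t current;                /* slot used by the next frame   */
} FrameRing;

/* Like vkWaitForFences on the current slot's fence: block only until the
   previous frame recorded in this slot has completed. */
static void wait_slot_fence(SimQueue *q, const FrameRing *r) {
    uint64_t needed = r->fence_value[r->current];
    while (q->completed < needed)
        q->completed++; /* the simulated GPU retires work in order */
}

/* Like vkQueueWaitIdle: block until every submitted frame is done. */
static void wait_queue_idle(SimQueue *q) {
    q->completed = q->submitted;
}

/* Submits a frame in the current slot, advances the ring and returns how
   many frames are in flight right after the submission. */
static uint64_t submit_frame(SimQueue *q, FrameRing *r) {
    r->fence_value[r->current] = ++q->submitted;
    r->current = (r->current + 1) % r->slot_count;
    return q->submitted - q->completed;
}
```

With three slots, the fence-based loop keeps up to three frames in flight, while the idle-wait loop never exceeds one. In real Vulkan, a binary VkFence must be unsignaled when it is passed to vkQueueSubmit. It therefore has to be reset with vkResetFences between the wait and the next submission that uses it.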

SubiyaCryolite · 2023-06-12T18:22:27Z

Happy to say all validation issues and bugs have been addressed. Performance is still maintained at 0.78 to 0.80 ms locally. Feel free to provide feedback, and thanks for laying the foundations :)

danilw · 2023-06-13T11:10:52Z

Absolutely crazy amount of effort! Wow!
Thank you for doing it!

I'll check it later today/tomorrow.

danilw · 2023-06-19T09:42:21Z

Sorry for the delay. I said I would check, but I still haven't; I'll do it this week, probably in the next few days.

danilw · 2023-06-23T11:49:05Z

You wrote: "These changes do not break compatibility with the older implementations either, specifically with example_vulkan_min_no_glfw."

On Linux I cannot build your repo after cloning: there are a few mistakes in example/example_vulkan_min_no_glfw.c. I think you changed it without testing the Linux version.

Can you replace your example/example_vulkan_min_no_glfw.c with the fixed version in the attachment? There are 3 changes around lines 450-480, near VK_USE_PLATFORM_XCB_KHR.
example_vulkan_min_no_glfw.zip

danilw · 2023-06-23T12:41:51Z

About performance:

Nvidia: absolutely no change in performance. Your updated repo performs exactly the same as my original without your changes. I opened multiple windows, made them fullscreen and pressed Space to test; everything is exactly the same.
Nvidia performance with this is "equally bad" as before: about 50% slower than OpenGL. On Nvidia it drops below 60 FPS in fullscreen after pressing Space.
But maybe this is because I use Wayland with Nvidia as the second GPU, and the Nvidia driver being bad simply "downgrades" performance...

AMD: huge improvement in this repository's examples, from ~300 fps with the original to 1200 fps with your improvements.
That is even 2x the fps of the "multiple frames in flight" example.

Now I'll check "multiple frames in flight"...

danilw · 2023-06-23T15:44:36Z

If I understand correctly, you integrated "multiple frames in flight" into nanovg_vk.h, but you did it more correctly than I did in my example, with caching and memory optimizations. Looks good.
You also kept compatibility with the "minimal example that uses a single frame".

I need to rework my external examples to integrate this; I'll do it later.

I'll wait for your reply.

Please confirm whether you will make these fixes, or whether I should make them after accepting your pull request:

1. The Linux fix in Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7 (comment)
2. I also saw g++ complaining about the arguments of vknvg_UpdateBuffer at https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/src/nanovg_vk.h#L1504. The call passes
   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
   while the function is declared as:
   static void vknvg_UpdateBuffer(... VkMemoryPropertyFlagBits memory_type ...)

I think it may work incorrectly because the parameter type is VkMemoryPropertyFlagBits, which is a typedef enum VkMemoryPropertyFlagBits:
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkMemoryPropertyFlagBits.html

The result of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT is not itself one of the enum's values, so it may not be passed correctly.

I think this needs to be fixed by replacing VkMemoryPropertyFlagBits memory_type with a uint32_t or VkFlags type all the way down the call chain to vknvg_memory_type_from_properties.
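Some background on why g++ complains. In C, OR-ing two enumerators yields an int, which converts implicitly to an enum-typed parameter. C++ does not allow that implicit int-to-enum conversion. Vulkan itself provides VkMemoryPropertyFlags (a typedef of VkFlags, i.e. uint32_t) for combinations of these bits.

Below is a minimal sketch of a memory-type lookup that takes a flags integer. The names are hypothetical and this is not nanovg_vk's vknvg_memory_type_from_properties. The constants are stand-ins with the same values as the VK_MEMORY_PROPERTY_*_BIT enumerators, so the code compiles without the Vulkan headers.

```c
#include <stdint.h>

/* Plays the role of VkMemoryPropertyFlags (a VkFlags, i.e. uint32_t). */
typedef uint32_t ExampleMemoryPropertyFlags;

/* Same values as VK_MEMORY_PROPERTY_{DEVICE_LOCAL,HOST_VISIBLE,
   HOST_COHERENT,HOST_CACHED}_BIT. */
enum {
    EX_MEMORY_DEVICE_LOCAL  = 0x1,
    EX_MEMORY_HOST_VISIBLE  = 0x2,
    EX_MEMORY_HOST_COHERENT = 0x4,
    EX_MEMORY_HOST_CACHED   = 0x8
};

/* Returns the index of the first memory type that is allowed by
   type_bits (as in VkMemoryRequirements::memoryTypeBits) and whose
   property flags contain every bit in `required`, or -1 if none does.
   Because `required` is a plain integer, OR-ed combinations are fine in
   both C and C++. */
static int example_find_memory_type(const ExampleMemoryPropertyFlags *type_props,
                                    uint32_t type_count, uint32_t type_bits,
                                    ExampleMemoryPropertyFlags required) {
    for (uint32_t i = 0; i < type_count && i < 32; ++i) {
        if ((type_bits & (1u << i)) && (type_props[i] & required) == required)
            return (int)i;
    }
    return -1;
}
```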

I have not tested it on Windows; I assume it works.
I tested with validation layers on Nvidia and AMD: no errors. (There is 1 error when you close example_vulkan because something is not cleaned up, but it is the same in my original, so I am leaving it as is; it is not important.)

I also saw a bug with resize: a crash on resize is very common, on the AMD GPU only. My minimal "vulkan shader app", which is the source of the "no_glfw" example, does not crash there, so this needs investigation...
There are no validation errors at the crash; the program simply stops with this message:

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:66: prepareFrame: Assertion `res == VK_SUCCESS' failed.

or

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:171: submitFrame: Assertion `res == VK_SUCCESS' failed.

But the same bug happens in my original version, so it is not your bug.
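The thread does not identify the cause of these assertion failures, so what follows is only an assumption. During a resize, vkAcquireNextImageKHR and vkQueuePresentKHR can return VK_SUBOPTIMAL_KHR or VK_ERROR_OUT_OF_DATE_KHR instead of VK_SUCCESS. An `assert(res == VK_SUCCESS)` aborts in that case.

This sketch maps those results to actions. It uses a hypothetical stand-in enum (not the real VkResult values) and hypothetical function names. It also presumes that prepareFrame acquires and submitFrame presents, which the thread does not confirm.

```c
/* Hypothetical stand-ins for the VkResult codes involved; the numeric
   values are NOT the real Vulkan ones. */
typedef enum {
    EX_SUCCESS,
    EX_SUBOPTIMAL_KHR,
    EX_ERROR_OUT_OF_DATE_KHR,
    EX_ERROR_DEVICE_LOST
} ExampleResult;

typedef enum {
    FRAME_CONTINUE,           /* keep rendering normally            */
    FRAME_RECREATE_SWAPCHAIN, /* rebuild swap chain and framebuffers */
    FRAME_FATAL               /* genuine error: report and bail out  */
} FrameAction;

/* Result of acquiring the next swap-chain image. */
static FrameAction example_acquire_action(ExampleResult res) {
    switch (res) {
    case EX_SUCCESS:
    case EX_SUBOPTIMAL_KHR:        /* an image was acquired and can be presented */
        return FRAME_CONTINUE;
    case EX_ERROR_OUT_OF_DATE_KHR: /* no usable image: recreate, skip this frame */
        return FRAME_RECREATE_SWAPCHAIN;
    default:
        return FRAME_FATAL;
    }
}

/* Result of presenting. `window_resized` is a hypothetical flag set by
   the window system's resize callback. */
static FrameAction example_present_action(ExampleResult res, int window_resized) {
    switch (res) {
    case EX_SUCCESS:
        return window_resized ? FRAME_RECREATE_SWAPCHAIN : FRAME_CONTINUE;
    case EX_SUBOPTIMAL_KHR:
    case EX_ERROR_OUT_OF_DATE_KHR:
        return FRAME_RECREATE_SWAPCHAIN;
    default:
        return FRAME_FATAL;
    }
}
```

On FRAME_RECREATE_SWAPCHAIN, a typical render loop waits for the device to go idle, recreates the swap chain and framebuffers, and then retries or skips the frame instead of asserting.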

SubiyaCryolite · 2023-06-24T17:51:22Z

Thanks for the feedback. I'll be sure to look into these issues in the upcoming days.

Unfortunately I only have access to an AMD GPU (Windows) and my M1 MacBook Pro, so I may not be able to test on Nvidia anytime soon.

Thanks again.

danilw · 2023-06-24T18:36:53Z

That's fine, I'll test on Linux and Nvidia.

danilw · 2023-06-25T22:45:08Z

I see your updates, will check tomorrow.

danilw · 2023-06-28T09:24:41Z

@SubiyaCryolite you missed one Linux fix:
https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/example/example_vulkan_min_no_glfw.c#L486
https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/example/example_vulkan_min_no_glfw.c#L506

Change fb->current_frame to fb.current_frame.

I also checked the dynamic UI and it seems to work, so only the resize crash remains; I'll fix it afterwards, since it is a bug in this project.

I will accept this pull request after you fix the 2 lines I mentioned above.

SubiyaCryolite · 2023-06-28T09:52:24Z

Just addressed your last comment, thanks for catching those issues.

danilw · 2023-06-29T09:58:30Z

Thank you again for all these changes: a huge improvement and a huge effort on your part. Much appreciated!

If you need real-time communication in the future, you can join my Discord; the link is in the project description.

SubiyaCryolite added 12 commits June 10, 2023 00:20

Added support for multiple command buffers based on the number of swap-chain images available.

98a6b43

Ensure both examples bootstrap with no issues

a6adacd

Updated nanovg_vk.h to use currentFrame over currentBuffer

e8ea96b

Updated nanovg_vk.h to use currentFrame over currentBuffer

288ccdd

Converted all buffers to support multiple frames in flight

ba6d47d

Completed support for multiple frames in flight, significantly better performance (almost doubled)

8c5ace8

Added support for cached descriptor sets. No need for reset descriptor pool.

3363308

Opted to use persistent mapped buffers for rendering. Perf 70% of OpenGL implementation

d1fb16f

Minor optimizations with stroke offsets

1e243ff

Using implied multiple frames in flight based on the amount of swapchain images. Implementation works with or without semaphores

8d3d6ac

Revert example of not using vulkan to display changes in terms of performance, also addressing weird bug. swap chain image count tied to frames in flight.

004d49b

Got rid of vkResetFence, addressing timeout bug

ac0251f

SubiyaCryolite added 4 commits June 12, 2023 17:16

Optimizing fence behavior

2cd7730

Adding support for mac builds. May opt to use string contacts for older sdks

08904f8

Addressing validation errors in vknvg_fill function

682d3ba

Addressed all validation issues. Restored improved FPS.

e59e8ea

Addressing memory leak on terminate. Addressing dangling use of cmd_buffer[fb->current_buffer] in example-vk

bd4e686

SubiyaCryolite added 4 commits June 13, 2023 13:59

Fixed error on resize

e5d827d

Updated destroyFramebuffers function. Streamlining implementation of `prepareFrame` and `submitFrame`

6c4eab0

Fixed render regression in example_vulkan_min_no_glfw.c

2922705

Fixes validation errors on termination

b1596bb

SubiyaCryolite added 2 commits June 25, 2023 13:36

Addresses feedback in comment #7 (comment)

ae81cfd

Addresses feedback in comment #7 (comment)

b519a31

Addressing #7 (comment)

4eea0ee

danilw merged commit 10d5211 into danilw:master Jun 29, 2023

SubiyaCryolite deleted the feature/use-persistent-mapped-memory branch June 29, 2023 14:42
