Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7
…p-chain images available.
… performance (almost doubled)
…nGL implementation
…ain images. Implementation works with or without semaphores
…formance, also addressing weird bug. swap chain image count tied to frames in flight.
Doing additional tests after enabling validation layers. I caught a few bugs and am working on them; however, it's clear that a lot of the performance issues come down to synchronization and having multiple frames in flight. Rendering works with no issues, but I feel the validation warnings need to be addressed.
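The synchronization pattern discussed here (and in the PR description: waiting on a fence with `vkWaitForFences` instead of calling `vkQueueWaitIdle`, with frames in flight tied to the swap-chain image count) can be modeled without a GPU. This is a CPU-only sketch under stated assumptions: `FrameRing`, `FrameSlot`, `begin_frame`, `acquire_descriptor_set` and `gpu_finished` are hypothetical names, booleans and counters stand in for `VkFence` and descriptor-set handles, and the descriptor cache is assumed to keep sets allocated per slot and rewrite them, rather than calling `vkResetDescriptorPool` every frame. The PR's actual code may differ. It shows that the CPU only blocks when it returns to a slot whose previous submission has not finished, and that a slot's cached descriptor sets are safe to rewrite once its fence has been waited on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: in a real Vulkan backend each slot would own a
   VkFence, its descriptor sets and a VkCommandBuffer. */
#define MAX_FRAMES_IN_FLIGHT 4

typedef struct {
    bool fence_signaled; /* models the slot's VkFence */
    int  sets_in_use;    /* descriptor sets handed out this frame */
    int  sets_cached;    /* descriptor sets allocated once and kept */
} FrameSlot;

typedef struct {
    FrameSlot slots[MAX_FRAMES_IN_FLIGHT];
    int count;     /* frames in flight, tied to the swap-chain image count */
    int cpu_waits; /* how often the CPU had to block on a fence */
} FrameRing;

void ring_init(FrameRing *ring, int swapchain_image_count) {
    assert(swapchain_image_count > 0 &&
           swapchain_image_count <= MAX_FRAMES_IN_FLIGHT);
    ring->count = swapchain_image_count;
    ring->cpu_waits = 0;
    for (int i = 0; i < MAX_FRAMES_IN_FLIGHT; ++i) {
        /* Created signaled (VK_FENCE_CREATE_SIGNALED_BIT in real code) so
           the first use of every slot does not block. */
        ring->slots[i] = (FrameSlot){ .fence_signaled = true };
    }
}

/* Models the GPU finishing the work submitted from a slot. */
void gpu_finished(FrameRing *ring, int current_buffer) {
    ring->slots[current_buffer].fence_signaled = true;
}

/* Start a frame in slot `current_buffer`. Only this slot's fence is waited
   on, instead of draining the whole queue every frame. */
FrameSlot *begin_frame(FrameRing *ring, int current_buffer) {
    assert(current_buffer >= 0 && current_buffer < ring->count);
    FrameSlot *slot = &ring->slots[current_buffer];
    if (!slot->fence_signaled) {
        ring->cpu_waits++;           /* real code: vkWaitForFences(...) */
        slot->fence_signaled = true; /* the wait returns once the GPU is done */
    }
    slot->fence_signaled = false;    /* real code: vkResetFences(...) */
    slot->sets_in_use = 0;           /* rewind the cache; sets stay allocated */
    return slot;
}

/* Hand out a descriptor set index, reusing one cached in an earlier frame
   when available and "allocating" only when the cache is exhausted. */
int acquire_descriptor_set(FrameSlot *slot) {
    if (slot->sets_in_use == slot->sets_cached) {
        slot->sets_cached++;         /* first use: a real allocation */
    }
    return slot->sets_in_use++;      /* index of the set for this draw */
}
```

In a frame loop, the `current_buffer` argument would correspond to something like the PR description's `currentBuffer` index. With `vkQueueWaitIdle` after every submission, by contrast, the CPU would stall on every frame rather than only when it wraps around to a slot that is still busy.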
Happy to say all validation issues and bugs have been addressed. Performance is still maintained at 0.78 to 0.80 ms locally. Feel free to provide feedback, and thanks for laying the foundations :)
…uffer[fb->current_buffer] in example-vk
Absolutely crazy amount of effort! Wow! I'll check it later today/tomorrow.
Sorry for the delay. I said I would check it and I still haven't; I'll do it this week, probably in the next few days.
On Linux I cannot build your repo after cloning; there are a few mistakes in the code. Can you change your
About performance: on Nvidia there is absolutely no change. Your updated repo performs exactly the same as my original without your changes; I opened multiple windows, made them fullscreen and pressed Space to test, and everything is identical. On AMD there is a huge improvement in this repository's examples: from ~300 fps with the original to 1200 fps with your improvements. Now I'll check "multiple frames in flight"...
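The fps figures in these comments and the millisecond figures in the PR description are reciprocals: frame time in ms is 1000 / fps. This small sketch of the conversion (the helper names are made up for illustration) also shows that a percentage "slower" depends on the metric: by frame rate, 0.80 ms versus 0.55 ms is about 31% fewer frames per second, while by frame time it is about 45% longer.

```c
#include <assert.h>

/* Frame time in milliseconds for a frame rate in frames per second. */
double fps_to_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Frame rate in frames per second for a frame time in milliseconds. */
double ms_to_fps(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}

/* How much lower a frame rate is than a reference (0.31 means 31% fewer fps). */
double fps_deficit(double fps, double reference_fps) {
    return 1.0 - fps / reference_fps;
}

/* How much longer a frame time is than a reference (0.45 means 45% longer). */
double frame_time_overhead(double ms, double reference_ms) {
    return ms / reference_ms - 1.0;
}
```

For the AMD numbers above, `fps_to_ms(300.0)` is about 3.33 ms and `fps_to_ms(1200.0)` about 0.83 ms, a 4x throughput gain.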
If I understand correctly, you integrated "multiple frames in flight" to … I need to remake my external examples with this integration; I'll do that later. I'll wait for your reply. Confirm whether you will make these fixes, or I will after accepting your pull request:
I think it may work incorrectly, because the expected value is … and your value results in … I think this needs to be fixed by replacing … I have not tested it on Windows; I assume it works there. I also saw a bug with resizing: a crash on resize is very common, on AMD GPUs only, but my minimal "vulkan shader app", which is the source of the "no_glfw" example, does not crash here, so this needs investigation...
or
But the same bug happens in my original version, so it's not your bug.
Thanks for the feedback. I'll be sure to look into these issues in the coming days. Unfortunately I only have access to an AMD GPU (Windows) and my M1 MacBook Pro, so I may not be able to test on Nvidia anytime soon. Thanks again.
I mean, it's fine; I'll test Linux and Nvidia.
I see your updates; I'll check them tomorrow.
@SubiyaCryolite you missed one Linux fix: change … I also checked the dynamic UI and it seems to work, so only the resize crash remains; I'll fix it afterwards, since it's a bug in this project. I will accept this pull request after you fix the two lines I mentioned above.
I just addressed your last comment; thanks for catching those issues.
Thank you again for all these changes: a huge improvement and a huge effort on your part. Much appreciated! If you need real-time communication in the future, you can join my Discord; the link is in the project description.
Hi everyone.
This PR is a follow-up to a comment I left a few months ago, visible here: memononen#614 (comment)
I made some optimizations that have doubled the frame rate of the `example_vulkan` demo on my machine. These changes do not break compatibility with the older implementations either, specifically with `example_vulkan_min_no_glfw`. The changes I made are listed below:
- … `vkResetDescriptorPool` per frame.
- … `example_vulkan`, using `vkWaitForFences` to control rendering as opposed to `vkQueueWaitIdle` (this seems to be the biggest performance boost). This change also implies "multiple frames in flight", as dictated by the swap-chain image count.
- … `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` instead of `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT`, which allows skipping the per-frame calls to `vkMapMemory`/`vkUnmapMemory`.

The best way to observe the differences in performance is to run `example_vulkan` and `example_vulkan_min_no_glfw` separately. On my system (RX 6800, Ryzen R7 7700) the `example_vulkan` demo has an average frame time of 0.80 ms (1250 fps), as opposed to 1.98 ms (505 fps) for the `example_vulkan_min_no_glfw` demo. For reference, `example-gl3` runs at 0.55 ms (1818 fps), but this is still a big improvement: measured in fps, `example_vulkan` is now just 31% slower than `example-gl3`, as opposed to 72% slower for `example_vulkan_min_no_glfw`.

The main difference in terms of setup is:
- `cmd_buffer` as an array
- `swapchainImageCount` and `*currentBuffer` within the `VKNVGCreateInfo` init.

Open for comments and feedback, particularly around tests on Nvidia, Intel and Apple Silicon hardware.
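The `HOST_VISIBLE | HOST_COHERENT` change from the list of optimizations comes down to which memory type the per-frame host-visible buffers are allocated from. A common way to pick it is to scan the device's memory types for the first one that the buffer's `memoryTypeBits` allows and that has every required property flag. This sketch assumes that approach; `find_memory_type`, `PersistentBuffer`, `frame_region` and the `MEM_*` macros are hypothetical names, with the macros using the same bit values as Vulkan's `VkMemoryPropertyFlagBits` so the code compiles without the Vulkan headers. With coherent memory the buffer can be mapped once and kept mapped, and host writes do not need `vkFlushMappedMemoryRanges`. Giving each frame in flight its own region is an extra assumption here, not something the PR states; it keeps the CPU from overwriting data the GPU may still be reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the bit values match VkMemoryPropertyFlagBits. */
#define MEM_DEVICE_LOCAL  0x1u /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT */
#define MEM_HOST_VISIBLE  0x2u /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT */
#define MEM_HOST_COHERENT 0x4u /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */

/* Index of the first memory type allowed by `type_bits` (as in
   VkMemoryRequirements::memoryTypeBits) whose flags contain every bit of
   `required`, or -1 if none qualifies. `type_flags[i]` plays the role of
   VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags. */
int find_memory_type(uint32_t type_bits, const uint32_t *type_flags,
                     uint32_t type_count, uint32_t required) {
    for (uint32_t i = 0; i < type_count && i < 32; ++i) {
        int allowed = (int)((type_bits >> i) & 1u);
        if (allowed && (type_flags[i] & required) == required) {
            return (int)i;
        }
    }
    return -1;
}

/* A buffer mapped once at creation and never unmapped; `mapped` stands in
   for the pointer vkMapMemory would return. */
typedef struct {
    unsigned char *mapped;
    size_t frame_stride; /* bytes reserved for each frame in flight */
} PersistentBuffer;

/* Region the CPU writes for `frame_index`, while other frames' regions
   may still be read by the GPU. */
unsigned char *frame_region(const PersistentBuffer *buf, int frame_index) {
    assert(frame_index >= 0);
    return buf->mapped + (size_t)frame_index * buf->frame_stride;
}
```

The Vulkan specification requires at least one memory type with both `HOST_VISIBLE` and `HOST_COHERENT` set, so on conforming drivers the coherent lookup is expected to succeed for ordinary buffers.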