Low FPS issue #188
Hi, @okkindel! Generally speaking, OpenGothic != Gothic when it comes to appearance. We have:
What is your hardware then? My HW is a GTX 960M + Core i7, resulting in ~30 fps in debug and 60 fps in release builds.
Can you measure/test what slows down the game on your setup (mine is probably not very representative)? Is it CPU or GPU performance for you? |
I will try to measure it. Another thing is that I have only played debug builds; I will check one of the release versions. But maybe it would be possible, for example, to limit the number of simulated NPCs to some radius around the player? That would improve the collision issues. Likewise, maybe you could render only the elements visible to the camera? There is probably quite a lot of room for optimization here. This would reduce resource consumption and have no negative impact on gameplay. |
Oh, I wish I could, but how? :)
I'd like to keep this method as a last resort, to use it on mobile platforms (or if all else fails). |
* use skip-move strategy to save CPU performance on idle npc's #188
Situation update on top-level:
|
I also had a quick look into this issue. Quick maybe stupid question... |
This is what I was talking about before, when BulletCollision was mentioned. In this function there are rare cases like swim/fly/jump and a common one: straight movement. Straight movement will come down to
There is already a caching-like optimisation inside: the physics model is discrete, with a step of 2 cm.
Animations - sure, draws - no. |
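To illustrate the 2 cm discretisation mentioned above, here is a minimal sketch of the idea, not the engine's actual code. `PhysicsProxy`, `Vec3`, `setPosition` and `kStepCm` are hypothetical names, and it assumes world units are centimetres. The expensive collision update is represented by a counter.

```cpp
#include <cassert>

// Hypothetical types for illustration; not OpenGothic's actual classes.
struct Vec3 { float x, y, z; };

struct PhysicsProxy {
  Vec3 synced{0.f, 0.f, 0.f}; // last position pushed to the physics world
  int  updates = 0;           // how many times the expensive sync ran

  // Threshold in centimetres (assumption: world units are cm).
  static constexpr float kStepCm = 2.f;

  // Returns true if the physics representation was actually updated.
  bool setPosition(const Vec3& logical) {
    const float dx = logical.x - synced.x;
    const float dy = logical.y - synced.y;
    const float dz = logical.z - synced.z;
    const float dist2 = dx*dx + dy*dy + dz*dz;
    if(dist2 <= kStepCm*kStepCm)
      return false; // moved no more than 2 cm: keep the cached collision state
    synced = logical;
    ++updates;      // stand-in for the costly collision-world update
    return true;
  }
};
```

With this shape an NPC that stands still, or only shifts slightly while animating, never touches the collision world, which is where the saving for idle NPCs would come from.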
"full life simulation - in vanilla npc do exist only in 20 proximity of player, but in OpenGothic - they always do exist and do things" That seems needlessly excessive since Gothic npc's daily routines are not that advanced. Shouldn't make any difference if you let them "think" when you are far away. I understand increasing the original radius a bit, but making all the npcs in the world think at all times seems pointless. |
@pseregiet Currently AI is good enough to work in 60fps on cur-gen hardware, so effort shifted to drawcall optimization |
I see, sounds reasonable. |
Maybe it's worth exposing this NPC intelligence radius as a variable, so that the project can be compiled or run with a smaller radius to get more FPS? |
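A rough sketch of what such a configurable radius could look like. All names here (`Npc`, `AiSettings`, `activeRadius`, `tickNpcs`) are made up for illustration and are not part of OpenGothic. A radius of zero or less keeps the current "simulate everyone" behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types; names are illustrative only.
struct Npc {
  float x = 0.f, z = 0.f; // position on the ground plane
  int   ticks = 0;        // number of full AI updates received
};

struct AiSettings {
  // Radius for full AI updates; a value <= 0 means "simulate everyone".
  float activeRadius = 0.f;
};

// Ticks NPCs within the radius of the player; returns how many were ticked.
inline std::size_t tickNpcs(std::vector<Npc>& npcs, float playerX, float playerZ,
                            const AiSettings& cfg) {
  std::size_t ticked = 0;
  const float r2 = cfg.activeRadius*cfg.activeRadius;
  for(auto& n : npcs) {
    const float dx = n.x - playerX, dz = n.z - playerZ;
    if(cfg.activeRadius > 0.f && dx*dx + dz*dz > r2)
      continue; // too far away: skip this frame
    ++n.ticks;  // stand-in for the real per-NPC AI update
    ++ticked;
  }
  return ticked;
}
```

Keeping the limit off by default would preserve the full life simulation and still give low-end machines an escape hatch.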
Minor update from runs on RTX:
00:26.01 : vkQueuePresentKHR = 8
00:26.03 : vkQueuePresentKHR = 5
00:26.03 : vkQueuePresentKHR = 8
00:26.05 : vkQueuePresentKHR = 8
00:26.07 : vkQueuePresentKHR = 3
00:26.07 : vkQueuePresentKHR = 3
00:26.07 : vkQueuePresentKHR = 10
00:26.08 : vkQueuePresentKHR = 5
00:26.10 : vkQueuePresentKHR = 9
00:26.11 : vkQueuePresentKHR = 3
00:26.11 : vkQueuePresentKHR = 3
00:26.12 : vkQueuePresentKHR = 9
00:26.13 : vkQueuePresentKHR = 9
00:26.15 : vkQueuePresentKHR = 3
00:26.15 : vkQueuePresentKHR = 3
00:26.15 : vkQueuePresentKHR = 9
That is quite a ridiculous amount of time for one API call. Similar timings happen on DX12 as well. |
Hi, I'm following this project and I'd like to give some tips to help improve the rendering performance. Maybe you're already following some of the tips, anyway I hope this is useful...
Profile the NPC tick with Intel VTune to identify cache misses. Optimize animation processing for cache performance by making sure alignment fits. When possible, switch over to an ECS style instead of scene objects with components. Use SIMD for matrix-vector multiplications and vector math. I guess you're already doing that. |
Hi, @CoffeeParser, thanks for the feedback! I'll address your proposals in groups:
OpenGothic uses a hybrid approach:
Generally I'm looking forward to compute-driven geometry (and mesh shaders on NV) to handle geometry. That fully solves the visibility-test issue: each meshlet tests itself against the HiZ of the landscape, while the landscape takes advantage of a z-prepass (1 dispatch to fill the z-buffer for opaque pieces of landscape).
I should use it for shadows and raytracing, but it's hard to implement in the engine, since a drawcall is like:
Nope, that won't work: Bullet is used to test NPC-to-landscape collision. So instead of optimizing it by time, OpenGothic (it should already be there since 1.0.1311) optimizes it by space: the physical representation is updated only if the logical position diverges by more than 2 cm. Naturally, it also takes care of NPCs that are standing still.
Probably only on super low-profile devices, not on PC. The Bullet optimizations gave a good deal of improvement, plus the next build is going to have more NPC-related work. I think we can reach a setup where processing all NPCs at once is feasible on a current-gen CPU with no shrinking.
I was experimenting a little, yet there is a problem: when it's pure forward+, you won't have a gbuffer. No gbuffer means no ssao/ssdo and such. And hybrid solution is somewhat pointless - can use regular gbuffer in such case.
I was planning to move it to gpu-compute. |
While frustum culling is the default option and OpenGothic has it, I now have doubts: |
That sounds like the acquisition and preparation of the command buffers plus frustum culling is inefficient. Are you using an octree for frustum culling, or a brute-force loop over every collider that tests whether it intersects the frustum sides?
Anyway, I doubt that having 60k+ draw calls per frame is reasonable and required. Batching static meshes together for instanced rendering would shave off lots of drawcalls and improve performance tremendously. In your example scene I saw lots of grass and trees; those could become 2 drawcalls, instead of one per grass and one per tree, just by using instanced rendering.
Are you sorting opaque geometry by distance to the camera? Draw opaque objects near the camera first, then farther objects, and lastly the terrain and skybox. In the translucent pass the sorting goes in the opposite direction, so that overlapping transparent geometry blends correctly. The best rendering performance can be achieved by combining view frustum culling + HiZ culling + instanced rendering + sorting of GPU commands before submit.
The problem with HiZ culling is that you must use one full framebuffer, drawing everything in the frustum, therefore limiting its use on cards with low fillrate and again issuing lots of drawcalls?? On low-fillrate cards it's better anyway to limit draw distance + fog and disable shadows, or use simple circle shadows for NPCs. Same with SSAO.
Maybe it's worth rethinking the whole "animation drives movement speed" approach, as this seems to use most of the game tick and contributes to immersion only at close distance. A simple solution would be to precalculate walk speed per animation frame when the character is loaded and then just interpolate the walk speed without interpolating the rig, thus getting precise movement speeds without actually having to animate the whole rig. This way it becomes possible to deactivate animation calculation for NPCs out of the player's view while still maintaining animation frames and getting the same movement speed.
The optimal solution would be to provide settings for these optimizations, so that users can set them up according to their machine: full NPC simulation or NPC shrinking mode, draw distance, ambient occlusion, shadow quality. |
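The "precalculate walk speed per animation frame" idea above could be sketched roughly as follows. This is only an illustration under assumptions: `MotionTrack`, `build`, `distanceAt` and `displacement` are hypothetical names, and the root motion is simplified to a 1D distance along the walk direction, sampled once per keyframe at load time.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cumulative root travel distance per keyframe.
struct MotionTrack {
  std::vector<float> cumulative; // cumulative[i] = distance travelled at keyframe i
  float fps = 25.f;

  // Built once at load time from per-keyframe root positions along the walk direction.
  static MotionTrack build(const std::vector<float>& rootPos, float fps) {
    assert(!rootPos.empty() && fps > 0.f);
    MotionTrack t;
    t.fps = fps;
    t.cumulative.reserve(rootPos.size());
    for(float p : rootPos)
      t.cumulative.push_back(p - rootPos.front());
    return t;
  }

  // Distance at time `sec` (clamped to the clip), linearly interpolated between keyframes.
  float distanceAt(float sec) const {
    const float f = std::fmax(0.f, sec*fps);
    const std::size_t last = cumulative.size() - 1;
    const std::size_t i = static_cast<std::size_t>(f);
    if(i >= last)
      return cumulative[last];
    const float a = f - static_cast<float>(i);
    return cumulative[i] + (cumulative[i+1] - cumulative[i])*a;
  }

  // Displacement for one game tick, without touching the skeleton.
  float displacement(float t0, float t1) const {
    return distanceAt(t1) - distanceAt(t0);
  }
};
```

An off-screen NPC would then only advance its animation time and call `displacement(t0, t1)` each tick, while bone matrices are evaluated only for NPCs that are actually drawn. Looping clips would need wrap-around handling, which this sketch leaves out.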
@CoffeeParser you are analyzing an almost year-old trace. Let me do profile runs on the latest build.
CPU side
So in here:
Draw workflow:
Bucket can have up to 255 objects, memory for those is pre-allocated and managed by the bucket
Visibility token is a way to decouple drawing and visibility workload. On each frame engine resets all
If an object is visible: atomic_inc to VisibleSet::size[view] + write the object index.
For drawing:
If hardware supports mesh-shaders
GPU side
RTX 3070, mesh shaders are enabled.
It's 10k API calls. Yet there is a catch: without mesh shaders the API-call count is way worse, due to the lack of HiZ and other stuff. [writing in progress] |
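The "atomic_inc to VisibleSet::size[view] + write obj index" step described above might look something like the sketch below. Only the `VisibleSet::size[view]` name comes from the comment; the rest of the layout (`kViews`, `index`, `push`, `reset`, the fixed capacity) is an assumption made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the layout and names (other than VisibleSet::size) are assumptions.
struct VisibleSet {
  static constexpr std::size_t kViews = 3; // hypothetical: e.g. main view + shadow views

  std::atomic<std::uint32_t> size[kViews];
  std::vector<std::uint32_t> index[kViews];

  explicit VisibleSet(std::size_t capacity) {
    for(std::size_t v = 0; v < kViews; ++v) {
      size[v].store(0);
      index[v].resize(capacity); // pre-allocated, so push() never reallocates
    }
  }

  // Called once per frame before visibility tests run.
  void reset() {
    for(auto& s : size)
      s.store(0, std::memory_order_relaxed);
  }

  // Each caller reserves a unique slot via fetch_add, then writes the object index.
  void push(std::size_t view, std::uint32_t objId) {
    const std::uint32_t slot = size[view].fetch_add(1, std::memory_order_relaxed);
    assert(slot < index[view].size());
    index[view][slot] = objId;
  }
};
```

Because every writer gets its own slot, several visibility workers can fill the same list without a lock. The draw side would read `size[view]` only after the workers have been joined, so that the writes are visible.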
Writing command to vulkan/dx command buffer is a bottleneck here - mostly it's about
No, that's not promising at all: almost all hardware now has tile-based rendering and writes out the result only at the end of the renderpass. Saving fragment shader invocations is not interesting, since shading is trivial anyway.
There is frustum culling on the CPU + HiZ on the GPU for objects + frustum culling for landscape meshlets.
Again, it's super-low profile. Currently on Intel the game runs at 30-40 fps (assuming no SSAO). I assume it's possible to reach 60 only by massaging index buffers.
Yep, at this moment only SSAO checkbox is there, but not much else. |
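For readers unfamiliar with the CPU-side frustum test discussed in the last few comments, a common formulation is a bounding sphere checked against six planes. The sketch below shows that generic technique and is not OpenGothic's implementation. `Plane`, `Sphere`, `Frustum`, `isVisible` and `makeBox` are illustrative names, and `makeBox` builds an axis-aligned box instead of a real camera frustum just to keep the example short.

```cpp
#include <array>
#include <cassert>

struct Plane  { float nx, ny, nz, d; }; // points with n.p + d >= 0 are inside
struct Sphere { float x, y, z, r; };

using Frustum = std::array<Plane, 6>;

// Conservative: returns false only when the sphere lies fully outside one plane.
inline bool isVisible(const Frustum& f, const Sphere& s) {
  for(const Plane& p : f) {
    const float dist = p.nx*s.x + p.ny*s.y + p.nz*s.z + p.d;
    if(dist < -s.r)
      return false;
  }
  return true;
}

// Axis-aligned box "frustum" with half-extent h, used only to exercise the test.
inline Frustum makeBox(float h) {
  return Frustum{{
    { 1.f, 0.f, 0.f, h}, {-1.f, 0.f, 0.f, h},
    { 0.f, 1.f, 0.f, h}, { 0.f,-1.f, 0.f, h},
    { 0.f, 0.f, 1.f, h}, { 0.f, 0.f,-1.f, h},
  }};
}
```

The test is cheap enough to run as a brute-force loop over a few thousand objects. A hierarchy such as an octree mainly pays off when the object count grows well beyond that.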
Thanks for going into great detail on the current implementation. 10k is already much better than the previous 50k+ drawcalls. Please don't remove building your commandbuffer every frame. I doubt that sorting 10k longs is slow. While shading is trivial it can add up even on tiled hardware and sorting is worth it to allow the depth test to skip fragment shader calls and worth comparing frametimes. The real problem is coming up with a nice way to generate the command key. You can find a thorough explanation here: https://blog.molecular-matters.com/2014/11/06/stateless-layered-multi-threaded-rendering-part-1/ |
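As a concrete illustration of the "command key" idea, here is one possible 64-bit layout. It is a sketch under stated assumptions: the bit layout is invented for this example, and `makeKey`, `DrawCmd` and `sortCommands` are hypothetical names not taken from the engine or the linked article. Opaque draws sort front-to-back, translucent draws sort after them and back-to-front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical key layout (an assumption for this example):
//   bit 63     : 1 = translucent pass, 0 = opaque pass
//   bits 32-62 : quantised depth (inverted for translucent, giving back-to-front)
//   bits 0-31  : material/pipeline id, groups state changes at equal depth
inline std::uint64_t makeKey(bool translucent, float depth01, std::uint32_t material) {
  assert(depth01 >= 0.f && depth01 <= 1.f);
  constexpr std::uint32_t kMaxDepth = 0x7FFFFFFFu;
  std::uint32_t d = static_cast<std::uint32_t>(double(depth01) * kMaxDepth);
  if(translucent)
    d = kMaxDepth - d;
  return (std::uint64_t(translucent ? 1u : 0u) << 63) | (std::uint64_t(d) << 32) | material;
}

struct DrawCmd {
  std::uint64_t key;
  int           id; // stand-in for the actual draw payload
};

// A plain integer sort; with ~10k commands this is a small per-frame cost.
inline void sortCommands(std::vector<DrawCmd>& cmds) {
  std::sort(cmds.begin(), cmds.end(),
            [](const DrawCmd& a, const DrawCmd& b) { return a.key < b.key; });
}
```

Swapping the depth and material fields would prioritise state grouping over early depth rejection. Which ordering wins is something only frame-time comparisons on the target hardware can answer.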
Some (almost) good news on performance. The engine now supports mesh-shader emulation for any Vulkan 1.1 hardware. This will enable algorithmic-level solutions to culling, usable not only on RTX but practically everywhere. In the current iteration the emulator works correctly, but uses ~128MB of scratch memory. It also still requires compiler-level optimizations. More on that in: Try/Tempest#38 and Try/Tempest#33 |
A bit of necro-posting. Until now, I didn't bother trying to change anything in Extended Configuration because the game runs somewhat acceptably for me (~22 fps). I just went through the settings there and realized how heavily Cloud Shadows affects game performance: turning that setting off gives roughly a 2x boost (~46 fps). Is it really about shadows cast by clouds? From what I can see, visually it's more like an extra layer of shadowing, but to me it doesn't look like it's caused by clouds (the shadows seem to be static). I wonder if it can be improved. Of course, turning it off in game settings for low-spec devices is also an option; it doesn't improve (?) things enough to justify cutting performance in half. |
Just wanted to "+1" this issue. On my setup (rather up-to-date laptop, i7-6600U, integrated graphic card) I get ~4 fps which makes the game absolutely unplayable. (All settings set to minimum/off, but that did not really make a difference) Anyway, this is a really exciting and awesome project, thank you for your work! |
Hi @simonsample! I've spent most of June/July optimizing the game for Intel UHD (the integrated GPU on my laptop, GEN11). In short: this is a very bad GPU by design :( On my side, I'm now taking a break from optimizations; if the memory-speed issue can be solved here, then it can just work. |
While on the topic: it would be nice to have a good algorithm, similar to persistent-culling, that can work with alpha-tested geometry. Without bindless support... (bindless is semi-broken in Vulkan). And without 64-bit atomics ;) |
For general graphics-optimization stuff we have now: #568 |
I love this project and I always check the commits you are currently pushing to see what new things you are adding. What hurts me is the very low frame rate I'm able to achieve on my hardware, which is not weak though. I have an impression that the graphics engine generates the whole world at once, generally something is wrong if such an old game on a nvidia card runs at less than 20 FPS. Maybe it would be possible to limit the rendering range or use other tricks to overcome this problem?