Add profile-guided optimization (PGO) support for engine and for projects executable that use C++ libraries #2610

Open
Abdelilah-Majid opened this issue Apr 15, 2021 · 36 comments

Comments

@Abdelilah-Majid

Abdelilah-Majid commented Apr 15, 2021

Describe the project you are working on

A game using C++.

Describe the problem or limitation you are having in your project

I need more performance from my C++ game.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Implementing profile-guided optimization (PGO) in the Godot engine for more performance (in the range of 15% to 20%), and in the exported game executable so that it can be used with C++ libraries that are also built with PGO. Note: I can't use libraries built with PGO unless the executable is also built with PGO.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

//quoted from stackoverflow link: https://stackoverflow.com/questions/14492436/g-optimization-beyond-o3-ofast ===========
PGO

GCC has Profile-Guided Optimisations features. There isn't a lot of precise GCC documentation about it, but nevertheless getting it to run is quite straightforward.

1. First, compile your program with -fprofile-generate.
2. Let the program run (execution will be significantly slower, as the code is also writing profile information into .gcda files).
3. Recompile the program with -fprofile-use. If your application is multi-threaded, also add the -fprofile-correction flag.
PGO with GCC can give amazing results and really significantly boost performance (I've seen a 15-20% speed increase on one of the projects I was recently working on). Obviously the issue here is to have some data that is sufficiently representative of your application's execution, which is not always available or easy to obtain.

//quoted from stackoverflow link: https://stackoverflow.com/questions/14492436/g-optimization-beyond-o3-ofast ===========
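The three steps quoted above can be sketched as plain command lines. This is a minimal sketch, not part of the original answer: the helper name `gcc_pgo_commands`, the `-O2` level and the file names `game.cpp` / `game` are assumptions chosen for illustration; only the `-fprofile-*` flags come from the quote.

```python
import shlex


def gcc_pgo_commands(source, binary, workload_args=(), multithreaded=False):
    """Return the three command lines of a GCC PGO cycle as argv lists.

    `source`, `binary` and `workload_args` are placeholders for your own
    project; nothing here is specific to Godot.
    """
    base = ["g++", "-O2", source, "-o", binary]

    # Pass 1: instrumented build. Running the result writes .gcda files.
    instrument = base + ["-fprofile-generate"]

    # Training run: exercise the binary with a representative workload.
    train = ["./" + binary, *workload_args]

    # Pass 2: rebuild with the collected profile.
    use_flags = ["-fprofile-use"]
    if multithreaded:
        # Recommended by the quoted answer for multi-threaded programs.
        use_flags.append("-fprofile-correction")
    optimize = base + use_flags

    return [instrument, train, optimize]


if __name__ == "__main__":
    for argv in gcc_pgo_commands("game.cpp", "game", multithreaded=True):
        print(shlex.join(argv))
```

The commands are only printed, not executed, so the sketch can be inspected before anything is compiled.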

PGO should also be applied to the exported game executables, whether the project uses GDScript or C++, although I would prefer C++.

If this enhancement will not be used often, can it be worked around with a few lines of script?

It is demonstrated above.

Is there a reason why this should be core and not an add-on in the asset library?

The only reason this should be in core is the large performance gain (15-20%).

@Calinou
Member

Calinou commented Apr 15, 2021

Profile-guided optimization, as the name implies, depends on the workload being executed. If the workload matches the profile that has been trained, you can get a significant performance boost. However, if the workload doesn't match the profile that has been trained, performance won't improve at all and may even be slightly degraded. Not to mention that producing PGO-optimized binaries requires compiling twice, which would slow down the official release process significantly. (Remember that dozens of Godot binaries need to be compiled each time an official release is made.)

While it's true that official releases of web browsers make use of PGO, applying this effectively to a game engine which is used in varied ways sounds difficult.

As for libraries, I don't think it's feasible to implement this for GDNative. However, it's certainly feasible for statically-compiled C++ modules since these are technically part of the engine source code (and application binary).

@Calinou Calinou changed the title add Profile-Guided Optimization (PGO) support for engine and for projects executables that uses c++ libraries Add profile-guided optimization (PGO) support for engine and for projects executable that use C++ libraries Apr 15, 2021
@Abdelilah-Majid
Author

Well, you know best, even though I would love to see such an implementation.

@Abdelilah-Majid
Author

So isn't there a way to make a project that uses every piece of functionality in Godot as a training module for PGO,
compile Godot to run that training project to generate .gcda files, and then recompile the Godot engine to use these .gcda files?

@Xrayez
Contributor

Xrayez commented Apr 16, 2021

  • It's possible to compile the engine with custom flags via CCFLAGS or CXXFLAGS (see scons --help), so there's no need to modify the buildsystem at all.
  • Add generate_profile=yes/use_pgo=yes similarly to use_lto=yes build option via SCons. Official releases probably won't be optimized this way, but this should help you use this technique across compiler versions for your own use cases.
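A build option like the one suggested in the second bullet could boil down to a small mapping from option value to compiler flags. The following is a minimal sketch under stated assumptions: the option name `pgo` and its values `none` / `generate` / `use` are hypothetical and are not existing Godot build options; the flags themselves are the GCC ones discussed in this thread.

```python
def pgo_flags(mode, multithreaded=True):
    """Map a hypothetical `pgo` build option to GCC flags.

    mode: "none", "generate" or "use" (assumed names, not Godot options).
    Returns a dict keyed by SCons construction variable name.
    """
    if mode == "none":
        return {"CCFLAGS": [], "LINKFLAGS": []}
    if mode == "generate":
        flags = ["-fprofile-generate"]
    elif mode == "use":
        flags = ["-fprofile-use"]
        if multithreaded:
            # Tolerate inconsistent counters from multi-threaded training runs.
            flags.append("-fprofile-correction")
    else:
        raise ValueError("pgo must be 'none', 'generate' or 'use'")

    # In SCons, CCFLAGS is passed to both the C and the C++ compiler.
    # The instrumented build also needs the flag at link time.
    return {"CCFLAGS": list(flags), "LINKFLAGS": list(flags)}


# Inside an SConstruct this could be applied roughly as follows
# (sketch only, not tested against the Godot build system):
#   env.Append(**pgo_flags(env["pgo"]))
```

Keeping the mapping in one function means the generate and use passes differ only in the option value given on the command line.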

As someone who's interested in performance boosts with Godot, I could possibly work on this if the proposal is approved.

However, if the workload doesn't match the profile that has been trained, performance won't improve at all and may even be slightly degraded.

It depends I guess. If this can help to optimize the GDScript VM further, then it may not be necessarily project-specific anymore.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 16, 2021

Mr. Xrayez, thank you. So if I compile the Godot engine with PGO flags, will my executable also be using PGO or not?
Also, if I compile the Godot engine with PGO, will there be any performance gains in the exported project after I train the Godot editor with my project, or will only the editor gain this performance?

@Abdelilah-Majid
Author

Also, how stable would it be to compile the Godot engine's 3.2 branch and use it?

@Xrayez
Contributor

Xrayez commented Apr 16, 2021

@Abdelilah-Majid I had mostly no issues compiling the stable branch on my host OS for development purposes. But the build complexity depends on how many platforms you're going to target for your project. I'm personally using https://github.com/godotengine/build-containers which makes it easy to compile for all supported platforms once you set the entire thing up.

I personally haven't used PGO myself to be honest... But yeah, you'd have to compile both editor and export templates with PGO if you want the performance boost in both I guess, there would be certainly some differences between debug vs release on byte code level, especially for GDScript.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 16, 2021

Thanks, Mr. Xrayez.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 17, 2021

Mr. Xrayez, I think you build Godot on your host OS, which I think is Linux.
I also use Linux, so I don't think there will be any issues with compiling.
Or did you mean that you need different Godot engine builds with different compile flags in order to export for different operating systems?

@Abdelilah-Majid
Author

And for export templates, is there a repo for them that I can clone and build using PGO, or is there none?

@Xrayez
Contributor

Xrayez commented Apr 17, 2021

Mr. Xrayez, I think you build Godot on your host OS, which I think is Linux.
I also use Linux, so I don't think there will be any issues with compiling.

Yes. 🙂

And for export templates, is there a repo for them that I can clone and build using PGO, or is there none?

An export template is nothing more than a Godot build compiled without the editor (scons tools=no). There's no separate repository for them. But in some cases, export templates contain more than just an executable file, see https://github.com/godotengine/godot/tree/master/misc/dist.

@Abdelilah-Majid
Author

Thanks, Mr. Xrayez.

@Abdelilah-Majid
Author

Okay, my internet connection is very slow right now.
Tomorrow I will clone the Godot engine repo and compile both the editor and the export template with PGO.
I will try to do some testing for both GDScript and C++,
and I will try to come back with some numbers if I can.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 18, 2021

From what you said, Mr. Xrayez, and from what I can guess, when the game is exported the export template acts like a VM for the project's code and scenes: the game runs as if you pressed the launch button in the Godot editor, but without the editor itself. That explains why game.x86_64 is large even if the project is empty.
If this is true, I think the exported game will use the PGO profile generated for the export template. I will have to test.

@Xrayez
Contributor

Xrayez commented Apr 18, 2021

Yes, you can even place export templates (executable) into the project's source code directly (where project.godot resides) and launch it. It will launch the project without exporting.

@Abdelilah-Majid
Author

ah, okay thanks

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 18, 2021

I actually have a question.
What flags do I need so that the export template is compiled for multiple operating systems? (I know this is a silly question and I just need to look at scons --help, but I want to make sure that what I am doing is correct.)
And if it has to be compiled multiple times for multiple OSes, I think all of the builds should be in a single file for the Godot editor to import and use, so how do I do that too?

@Xrayez
Contributor

Xrayez commented Apr 18, 2021

I suggest going through https://docs.godotengine.org/en/latest/development/compiling/index.html, but yes, you'd have to compile for each platform... For instance, look at https://github.com/godotengine/godot-build-scripts/blob/master/build-linux/build.sh which compiles both editor and export templates for Linux in official build scripts. All export templates are then packaged with https://github.com/godotengine/godot-build-scripts/blob/master/build-release.sh. But again, that's how it's done officially. I'm linking those as a reference; those scripts are not really usable by themselves.

But that's not really important or needed for this proposal specifically, so let's keep the discussion on topic. If you have more questions, you can ask them at community channels. 🙂

@Abdelilah-Majid
Author

Okay, thanks, Mr. Xrayez.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 19, 2021

OK, so I have done some testing, and the results were shocking and could be a game changer.

Here are the steps I used to get to this point, for anyone to follow along.

Here is the repo for the testing: https://github.com/Abdelilah-Majid/godot-PGO_test

There are 3 tests: one for the CPU, one for the GPU, and one for CPU & GPU.

I cloned the Godot engine repo and checked out the 3.2 branch,
and for the first compilation I added the following code to the SConstruct file:

env.Append(CCFLAGS=['-fprofile-generate'])
env.Append(CFLAGS=['-fprofile-generate'])
env.Append(CXXFLAGS=['-fprofile-generate'])
env.Append(LINKFLAGS=['-fprofile-generate'])

Then I compiled the Godot export templates using this command:

scons platform=x11 tools=no target=release bits=64 -j2

NOTE: I didn't use 'use_lto' because of the limitations of my laptop (I only have 6 GB of RAM), so there will be a performance gap between running the game on the official Godot build and on the export template that uses PGO. I kept this in mind while doing my calculations.

I created a GLES 2.0 project because my laptop doesn't support GLES 3.0 and above.

After compiling the export templates, I trained them with my test project, and after that I sorted the .gcda files to see which
ones are biggest. This is the result:

this sort is done using: ls -s **/*.gcda **/**/*.gcda **/**/**/*.gcda **/**/**/**/*.gcda **/**/**/**/**/*.gcda **/**/**/**/**/**/*.gcda **/**/**/**/**/**/**/*.gcda **/**/**/**/**/**/**/**/*.gcda **/**/**/**/**/**/**/**/**/*.gcda **/**/**/**/**/**/**/**/**/**/*.gcda

NOTE: the sizes here are in KB
NOTE: these are only the files that are >= 100 KB

332  core/bind/core_bind.x11.opt.64.gcda
308  scene/register_scene_types.x11.opt.64.gcda
200  core/variant_call.x11.opt.64.gcda
216  modules/gdscript/gdscript_parser.x11.opt.64.gcda
216  modules/visual_script/visual_script_nodes.x11.opt.64.gcda
208  scene/gui/text_edit.x11.opt.64.gcda
272  scene/resources/visual_shader_nodes.x11.opt.64.gcda
288  servers/physics_2d/collision_solver_2d_sat.x11.opt.64.gcda
264  servers/visual_server.x11.opt.64.gcda
120  core/object.x11.opt.64.gcda
120  drivers/gles2/rasterizer_storage_gles2.x11.opt.64.gcda
152  drivers/gles3/rasterizer_scene_gles3.x11.opt.64.gcda
164  drivers/gles3/rasterizer_storage_gles3.x11.opt.64.gcda
140  modules/csg/csg_shape.x11.opt.64.gcda
100  modules/gdnative/nativescript/nativescript.x11.opt.64.gcda
104  modules/visual_script/visual_script_func_nodes.x11.opt.64.gcda
156  modules/visual_script/visual_script.x11.opt.64.gcda
116  scene/2d/canvas_item.x11.opt.64.gcda
116  scene/2d/cpu_particles_2d.x11.opt.64.gcda
112  scene/2d/physics_body_2d.x11.opt.64.gcda
120  scene/2d/tile_map.x11.opt.64.gcda
120  scene/3d/baked_lightmap.x11.opt.64.gcda
116  scene/3d/cpu_particles.x11.opt.64.gcda
156  scene/3d/physics_body.x11.opt.64.gcda
100  scene/3d/physics_joint.x11.opt.64.gcda
100  scene/3d/sprite_3d.x11.opt.64.gcda
112  scene/animation/animation_blend_tree.x11.opt.64.gcda
100  scene/animation/animation_player.x11.opt.64.gcda
108  scene/animation/animation_tree_player.x11.opt.64.gcda
100  scene/animation/animation_tree.x11.opt.64.gcda
156  scene/gui/control.x11.opt.64.gcda
112  scene/gui/graph_edit.x11.opt.64.gcda
116  scene/gui/item_list.x11.opt.64.gcda
108  scene/gui/popup_menu.x11.opt.64.gcda
132  scene/gui/rich_text_label.x11.opt.64.gcda
188  scene/gui/tree.x11.opt.64.gcda
120  scene/main/node.x11.opt.64.gcda
116  scene/main/scene_tree.x11.opt.64.gcda
156  scene/main/viewport.x11.opt.64.gcda
120  scene/resources/animation.x11.opt.64.gcda
144  scene/resources/material.x11.opt.64.gcda
108  scene/resources/mesh.x11.opt.64.gcda
100  scene/resources/resource_format_text.x11.opt.64.gcda
188  scene/resources/texture.x11.opt.64.gcda
160  scene/resources/tile_set.x11.opt.64.gcda
180  scene/resources/visual_shader.x11.opt.64.gcda
152  servers/physics_2d_server.x11.opt.64.gcda
156  servers/physics_server.x11.opt.64.gcda
188  servers/visual/visual_server_wrap_mt.x11.opt.64.gcda

As you can see, more of the large .gcda files are related to GLES 3.0 than to GLES 2.0, so if the project were using GLES 3.0 the GPU performance could be better.
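The listing above can also be produced with a recursive search instead of one glob per directory depth. This is a minimal sketch: the function name `large_gcda_files` and the 100 KiB default threshold are assumptions for illustration, and it reports the byte size of each file rather than the allocated blocks that `ls -s` prints, so the numbers may differ slightly from the listing.

```python
from pathlib import Path


def large_gcda_files(root, min_kib=100):
    """Return (size_in_KiB, relative_path) pairs for .gcda files under
    `root` that are at least `min_kib` KiB, largest first."""
    root = Path(root)
    results = []
    for path in root.rglob("*.gcda"):
        size_kib = path.stat().st_size / 1024
        if size_kib >= min_kib:
            results.append((size_kib, path.relative_to(root)))
    # Sort by size only, biggest profile files first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    # Run from the root of the source tree after the training run.
    for size_kib, rel_path in large_gcda_files("."):
        print(f"{size_kib:8.0f}  {rel_path}")
```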

Then I changed the lines I had added to the SConstruct file to:

env.Append(CCFLAGS=['-fprofile-use', '-fprofile-correction'])
env.Append(CFLAGS=['-fprofile-use', '-fprofile-correction'])
env.Append(CXXFLAGS=['-fprofile-use', '-fprofile-correction'])
env.Append(LINKFLAGS=['-fprofile-use', '-fprofile-correction'])

I added '-fprofile-correction' because the Godot engine uses multithreading.

Then I recompiled the Godot export templates

and ran the same test project on the new export templates that use PGO.

Here are the results:

NOTE: all of the testing was done in GDScript

#this is the formula to calculate the percentage:
(float (get_node("second").text)/(float (get_node("first").text) / 100) - 100)

#cpu test in ms
var cpu_test_average_time_one_without_PGO = 0.006035
var cpu_test_average_time_all_without_PGO = 93472

var cpu_test_average_time_one_while_PGO_is_generating_gcda_files = 0.00677
var cpu_test_average_time_all_while_PGO_is_generating_gcda_files = 104023

var cpu_test_average_time_one_using_PGO = 0.005075
var cpu_test_average_time_all_using_PGO = 77485

#cpu percentage calculations:


cpu_test_average_time_one_while_PGO_is_generating_gcda_files % cpu_test_average_time_one_without_PGO = -12.17%
cpu_test_average_time_all_while_PGO_is_generating_gcda_files % cpu_test_average_time_all_without_PGO = -11.28%

cpu_test_average_time_one_using_PGO % cpu_test_average_time_one_without_PGO = 18.91%
cpu_test_average_time_all_using_PGO % cpu_test_average_time_all_without_PGO = 20.63%

#adding percentages because the export template that I use doesn't make use of use_lto

cpu_test_average_time_one_using_PGO_persentage - cpu_test_average_time_one_while_PGO_is_generating_gcda_files = 30%
cpu_test_average_time_all_using_PGO - cpu_test_average_time_all_while_PGO_is_generating_gcda_files = 31%



#gpu test
var gpu_test_average_fps_without_PGO = 31

var gpu_test_average_fps_while_PGO_is_generating_gcda_files = 29

var gpu_test_average_fps_using_pgo = 29

#gpu percentage calculations:
gpu_test_average_fps_without_PGO % gpu_test_average_fps_while_PGO_is_generating_gcda_files = -6.89%

gpu_test_average_fps_without_PGO % gpu_test_average_fps_using_pgo = -6.89%

gpu_test_average_fps_using_pgo - gpu_test_average_fps_while_PGO_is_generating_gcda_files = 0%


#(cpu & gpu) test
var cpu_and_gpu_test_fps_without_PGO = 36

var cpu_and_gpu_test_fps_while_PGO_is_generating_gcda_files = 34

cpu_and_gpu_test_fps_without_PGO % cpu_and_gpu_test_fps_while_PGO_is_generating_gcda_files = -5.88%

#cpu & gpu percentage calculations:

# and this is where the real shock is ===================================

var cpu_and_gpu_test_fps_using_PGO = 59
cpu_and_gpu_test_fps_without_PGO % cpu_and_gpu_test_fps_using_PGO = 63%

cpu_and_gpu_test_fps_using_PGO - cpu_and_gpu_test_fps_while_PGO_is_generating_gcda_files = 69%
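The percentages above can be recomputed from the raw numbers with one consistent convention. This is a minimal sketch, not the author's script: the function names are hypothetical, and "speedup" is defined here as how much faster the new build is relative to the baseline. With this convention the PGO build comes out at roughly 18.9% and 20.6% on the two CPU measurements and roughly 63.9% on the CPU & GPU test, close to the figures reported. The instrumented build comes out at about -10.9% rather than -12.17%, because that row in the comment divides the other way round.

```python
def speedup_from_times(baseline_ms, new_ms):
    """Percent speedup when lower is better (execution time)."""
    return (baseline_ms / new_ms - 1.0) * 100.0


def speedup_from_fps(baseline_fps, new_fps):
    """Percent speedup when higher is better (frames per second)."""
    return (new_fps / baseline_fps - 1.0) * 100.0


if __name__ == "__main__":
    # Raw numbers copied from the comment above.
    print("cpu one, PGO:        ", round(speedup_from_times(0.006035, 0.005075), 2))
    print("cpu all, PGO:        ", round(speedup_from_times(93472, 77485), 2))
    print("cpu one, instrumented:", round(speedup_from_times(0.006035, 0.00677), 2))
    print("cpu & gpu, PGO:      ", round(speedup_from_fps(36, 59), 2))
```

Adding the instrumented build's slowdown to the PGO gain, as done above to stand in for the missing LTO, mixes percentages with different baselines, so those summed figures are best read as rough estimates.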

Dear Godot core devs, if you think these numbers are a waste of your time: I think I am speaking on behalf of myself and the Godot community when I say that we would love to see Godot stand out from the crowd in everything, especially performance. We hope that Godot won't turn into another bloated Unity, and we would really love to see Godot become the first game engine that uses PGO technology.

#peace;

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 20, 2021

OK, so I re-ran the last test (the CPU and GPU test). Before there were 1000 sprites; now there are 10000.

Here are the results:

cpu_and_gpu_test_without_pgo: 3 fps

cpu_and_gpu_test_with_PGO_enabled: 7 fps

cpu_and_gpu_test_without_pgo % cpu_and_gpu_test_with_PGO_enabled = 133%

I think that in the previous run with PGO the bottleneck was the GPU, so I guess this is a more precise number.

@Xrayez
Contributor

Xrayez commented Apr 20, 2021

The results look interesting, but as Calinou said, the performance boost achieved might be only applicable to the workload used. There need to be several largely different test projects which do substantially different things, both on CPU and GPU levels. If all those projects gain significant performance in different domains and the performance doesn't degrade in other cases, then we'll solve the first equation.

Looking at your CPU test projects, the only thing they do is compute sqrt() and call rand(). That's not what a typical game project would use, and the task here is to cater to the most common use cases.

For GPU, yeah, perhaps the performance can be achieved in a more general-purpose way, but it may just depend on the specific hardware/drivers used, so on other machines it may perform worse.

For CPU, as I said earlier, I think that PGO could be applied to optimize the GDScript VM, which means the most common GDScript control paths need to be trained to be able to benefit from this kind of optimization in the most common use cases.

Once all the above concerns are resolved, the next task is to set up the official build toolchain to do this in an automated manner. I personally see this task as quite insurmountable at the moment. First, the buildsystem would have to spend twice the time to compile all export templates, and it would have to run sample projects for each binary to train. It's not always possible to do this in an automated manner from a single host OS which just cross-compiles to other platforms. Those sample projects would also have to be maintained to ensure that they work properly and don't ever regress during development.

Speaking about myself as a user, it currently takes me 12-24 hours to compile Godot for all platforms using the official build scripts with LTO. It means that it would take me more than 48 hours to compile with PGO, unlike official builds which only take like 4 hours on a powerful machine. 🥉

That said, this kind of optimization will be certainly useful for all who are interested in this technology for their (specific) projects, that's why I'm suggesting that at the very least, Godot should provide SCons build options related to PGO, see my previous comment: #2610 (comment).

But I'm not denying the possibility of using PGO for official builds as well. For it to be adopted, though, there should be good evidence that PGO is useful for most use cases and won't make things worse for other use cases.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 20, 2021

I have an idea: instead of going the hard way with general optimization, why don't we add an official export template built with '-fprofile-generate' alongside the one that doesn't use PGO, and give game devs a simple way to download the Godot source code and compile the export template with '-fprofile-use' '-fprofile-correction' from the Godot editor? This way they can use PGO for their own needs, you don't have to train anything yourselves, and PGO will be there for people who need project-specific optimization.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 20, 2021

I have re-run the same CPU and GPU test, this time with 2000 objects,
and here are the numbers:

cpu_and_gpu_test_without_pgo: 15 fps

cpu_and_gpu_test_with_PGO_enabled: 35 fps

cpu_and_gpu_test_without_pgo % cpu_and_gpu_test_with_PGO_enabled = 133.33%

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 20, 2021

Mr. Xrayez, I am sorry for saying this, but I don't think you get the point.
Different technologies have different use cases; it's not about what the average user does, it's about what the technology's use case is.
I can only think of two use cases here.
GDScript has very poor performance, unlike C or C++,
and so the use cases are:
adding more performance to CPU-bottlenecked programs (including games), and lowering CPU usage.

So imagine this:
We have SimpleJony here. He wants a PC for gaming and doesn't know much about hardware, so he orders a gaming PC from dellx.
dellx thinks: hey, SimpleJony needs a gaming PC and he can only pay an x amount of money,
so let's blow most of his money on the most important thing, the GPU.
So they build him a PC with a 3070 GPU, and because there is no money left they buy him a 2-core CPU.

And we have SuperUserMax. He knows a lot about hardware and technology; he even uses Linux.
SuperUserMax built a game where he renders 100000 moving objects. Let's say he is rendering 100000 insects that light up in a big scene for that aesthetic, or that his game does a lot of calculations, which is very common in modern titles, like calculating lots of collisions or lots of object movement. That could be very heavy on the CPU,
and this makes SuperUserMax's game CPU-heavy, so his game needs at least a 4-core CPU.

SimpleJony sees SuperUserMax's game and thinks: oh, this game is very beautiful, and I think my 3070 GPU can handle it.
So he downloads it and runs it, and para boom para baa, the FPS is very low.
Why, you might ask? Because the game is CPU-bottlenecked.

And here is where the PGO hero comes in: as we saw in my calculations, PGO can increase performance in these specific cases by 133%,

and for SimpleJony that's a lot of FPS.

end

So did you get the idea? It's not about what this can do for the average user (and don't get me wrong, it can really decrease CPU load a lot),
but rather about the use case of the PGO technology.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 20, 2021

So if you think that training the export template will take a lot of time, just don't do it;
just give people precompiled templates, both for generating and for using PGO profiles,
and game devs can train their projects for their own use cases.
That way you save yourselves some time and you give people more performance.

With that said, I don't know whether you can compile with -fprofile-use against profile files that haven't been trained, then train those files afterwards, and still get a performance gain from the precompiled PGO-use template. I have never done such a thing and I don't know if it will work.

@Xrayez
Contributor

Xrayez commented Apr 21, 2021

And here is where the PGO hero comes in: as we saw in my calculations, PGO can increase performance in these specific cases by 133%,

and for SimpleJony that's a lot of FPS.

I think you miss an important point: SimpleJony and SuperUserMax will both use PGO if this proposal is implemented. This way, SuperUserMax will take advantage of additional performance gains and push the performance to the max again, while SimpleJony won't be able to keep up in either case. I don't think this problem will ever go away unless SuperUserMax stops being so ambitious and demanding. In fact, SuperUserMax would likely be the one who'd use PGO in the first place. 😛

And your case may be totally different. I don't know exactly what you want to achieve with this in your own project or use case as a developer. If you're already using C++ to develop a game, then this should likely solve 95% of the performance problems (especially when you just want to switch from slow GDScript). Unless you're specifically targeting really low-end hardware/market and audience which cannot afford high-end technologies. This is where I can understand the problem.

Different technologies have different use cases; it's not about what the average user does;

That's what general-purpose software has to do. I mean, it's not necessarily "average Joe" problem, but how many people stumble upon a similar problem to justify addition to the engine.

Again, that's only my opinion; so far I'm the only one who actively participates in the discussion with you. But I'm also someone who's interested in performance gains with Godot. Even then, I haven't really needed anything more from C++ development in Godot. Just being able to use C++ over GDScript for performance-critical tasks resolves quite a lot of limitations already. It might actually be the algorithms and data structures you use that can significantly improve performance, even without resorting to technologies like LTO/PGO.

Yet again, having additional performance gains would be certainly nice (that's why official builds use LTO now), but we also have to think in terms of how this will affect daily Godot development and maintenance.

Also, Godot doesn't really prioritize performance over usability in its development anyway. The fact that Godot uses a tree architecture for everything already creates some performance penalties in contrast with ECS and whatnot.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

Well, I agree with you.
If this can help everyone, then yeah, why not.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

And regarding:

And your case may be totally different. I don't know exactly what you want to achieve with this in your own project or use case as a developer. If you're already using C++ to develop a game, then this should likely solve 95% of the performance problems (especially when you just want to switch from slow GDScript). Unless you're specifically targeting really low-end hardware/market and audience which cannot afford high-end technologies. This is where I can understand the problem.

I am just like Linus Torvalds: I love optimizations for the sake of optimizations.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

As for my game, it has a big grid map and a lot of characters that each need to locate the place they want to go to and calculate the best route there while avoiding objects that are in the way. So for each character they need to loop through the std::vector floor_grid again and again to find the best way to their programmed destination, and I think this will need a lot of CPU power.
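For a workload like this, the choice of search algorithm matters at least as much as compiler flags. The following is a minimal sketch of one common option, breadth-first search on a 4-connected grid. It is not the author's implementation: the representation of `floor_grid` as rows of 0 (free) and 1 (blocked) and the function name `shortest_path` are assumptions made for illustration.

```python
from collections import deque


def shortest_path(floor_grid, start, goal):
    """Breadth-first search on a 4-connected grid.

    floor_grid[y][x] is truthy for blocked cells (assumed layout).
    start and goal are (x, y) tuples. Returns the list of cells from
    start to goal, or None if the goal cannot be reached.
    """
    height = len(floor_grid)
    width = len(floor_grid[0]) if height else 0
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])

    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]

        x, y = current
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and not floor_grid[ny][nx] and (nx, ny) not in came_from:
                came_from[(nx, ny)] = current
                queue.append((nx, ny))
    return None
```

Each cell is visited at most once per search, so the cost per character grows with the grid size rather than with repeated full scans of the grid.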

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

Also, I forgot to mention something about the template file sizes:
godot.x11.opt_PGO_generate.64 size is: 529.3 MB
and
godot.x11.opt_PGO_use.64 size is: 335.8 MB

I think this is because of the PGO optimizations, which make programs built with .gcda files smaller; this could also be useful for reducing the Godot export template size by a lot.

@YuriSizov
Contributor

I suggest @Abdelilah-Majid you don't attach yourself too tightly to your preliminary results. As mentioned by both Calinou and Xrayez, there is likely no universal way to optimize everyone's performance with PGO, or with any other means for that matter. You created an arbitrary project that was successfully optimized, but it's far from a complete game, and it doesn't even do things that most games do in isolation. So while impressive, those results are for the most part irrelevant.

To get some real use out of PGO, every developer would have to generate their own profile on a per-project basis. Some general optimizations in the engine may be possible, such as the ones mentioned by Xrayez, but those would require carefully designed tests to evaluate them.

@Abdelilah-Majid
Author

okay

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

But if the Godot devs don't implement PGO in the Godot engine, I hope that at least they add some options to the SCons build files for optionally using PGO.

@Abdelilah-Majid
Author

Abdelilah-Majid commented Apr 21, 2021

If anyone is interested, here is the project that I use for testing PGO on Godot: https://github.com/Abdelilah-Majid/godot-PGO_test
If you want to add a test, make a pull request so that everyone who wants to test PGO in Godot can find some tests to do so.

Note: I won't run your tests myself, because my laptop is too weak to compile the Godot engine (it takes a long time on my 2-core, 2-thread CPU).

@zamazan4ik

I want to add more materials about PGO for possible future developments in this area.

Regarding gamedev domain, I know the following results about PGO:

  • Unreal Engine supports PGO as a build option since 4.27 (search for "PGO" on the page). According to the official release notes, in some scenarios, PGO makes it possible to achieve +10% in performance for CPU-bound scenarios.
  • According to my tests in Bevy (a Rust-based engine), PGO can improve performance (In these results you need to interpret performance decrease as "Release version is slower than PGOed" and performance increase as "Release version is faster than PGOed").

More results about PGO for other pieces of software, including some low-level libraries like libspng, you can find here. Hope it can help someone in the future.
