
Forceinline improves performance significantly with MSVC #168

Closed
sbsce opened this issue Nov 21, 2022 · 3 comments

Comments

@sbsce

sbsce commented Nov 21, 2022

While testing boost::unordered_flat_set, I noticed my code reliably runs over 10% faster if I __forceinline all the functions that boost::unordered_flat_set calls in my hot path. My hot path only calls .contains(), so I added __forceinline to .contains() itself and to everything it calls. As a result, the disassembly of my own code where I call .contains() shows no call instruction anymore; it's fully inlined. I think I had to add __forceinline to 6 functions inside Boost code.
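For readers unfamiliar with the technique, here is a minimal sketch of what force-inlining a lookup call chain looks like. `ToyIntSet` is a hypothetical linear-probing set written only for illustration; it is not Boost's implementation. The `FORCEINLINE` macro mirrors the one the author mentions in the thread, and its GCC/Clang branch (`inline __attribute__((always_inline))`) is an assumption added for portability. The key point is that `contains()` and every helper it calls carry the annotation, so a call site can end up with no `call` instruction at all.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Mirrors the author's FORCEINLINE macro; the non-MSVC branches are assumptions.
#if defined(_MSC_VER)
#  define FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCEINLINE inline __attribute__((always_inline))
#else
#  define FORCEINLINE inline
#endif

// Hypothetical toy set (NOT Boost's design): fixed power-of-two capacity,
// linear probing, no erase, no resize.
class ToyIntSet {
public:
    explicit ToyIntSet(std::size_t capacity) : slots_(capacity) {
        // Capacity must be a non-zero power of two so "& (size - 1)" wraps correctly.
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Cold path: left to the optimizer. Returns true if value was newly
    // inserted, false if it was already present or the table is full.
    bool insert(int value) {
        std::size_t pos = position_for(value);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::optional<int>& slot = slots_[pos];
            if (!slot) { slot = value; return true; }
            if (*slot == value) return false;
            pos = next(pos);
        }
        return false;
    }

    // Hot path: contains() and every helper it calls are force-inlined.
    FORCEINLINE bool contains(int value) const {
        return find_slot(value) != npos;
    }

private:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    FORCEINLINE std::size_t position_for(int value) const {
        return std::hash<int>{}(value) & (slots_.size() - 1);
    }

    FORCEINLINE std::size_t next(std::size_t pos) const {
        return (pos + 1) & (slots_.size() - 1);
    }

    FORCEINLINE std::size_t find_slot(int value) const {
        std::size_t pos = position_for(value);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::optional<int>& slot = slots_[pos];
            if (!slot) return npos;          // an empty slot ends the probe sequence
            if (*slot == value) return pos;
            pos = next(pos);
        }
        return npos;
    }

    std::vector<std::optional<int>> slots_;
};
```

The annotation has to cover the whole chain: if a single helper is left to the optimizer and MSVC declines to inline it, a call instruction remains in the hot path, which is why the author went through the functions one by one.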

It is a bit inconvenient to add __forceinline to all those functions manually, though. It's definitely worth the 10% performance gain, but I'm quite sure that the next time I update Boost, in a few years, I'll forget to reapply these changes, and then my performance will be worse.

Assuming you don't want to add __forceinline to those functions by default, could there perhaps be a define like BOOST_FORCEINLINE_UNORDERED_SET that automatically enables force-inlining for all the important functions?

I am already compiling with MSVC's maximum optimization level, and it still doesn't inline these functions by default; MSVC often needs to be forced to inline things.
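A switch like the one proposed could be a single macro that library code puts on its hot-path functions, expanding to a forced-inline attribute only when the user opts in. The sketch below is hypothetical: `EXAMPLE_UNORDERED_FORCEINLINE`, `EXAMPLE_UNORDERED_INLINE`, and the two helper functions are illustrative names, not Boost APIs. Boost.Config does provide a real `BOOST_FORCEINLINE` macro that such a switch could build on; it is avoided here only so the sketch compiles standalone.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opt-in switch; none of these names exist in Boost.
#if defined(EXAMPLE_UNORDERED_FORCEINLINE)
#  if defined(_MSC_VER)
#    define EXAMPLE_UNORDERED_INLINE __forceinline
#  elif defined(__GNUC__) || defined(__clang__)
#    define EXAMPLE_UNORDERED_INLINE inline __attribute__((always_inline))
#  else
#    define EXAMPLE_UNORDERED_INLINE inline
#  endif
#else
// Default: plain inline, leaving the decision to the optimizer.
#  define EXAMPLE_UNORDERED_INLINE inline
#endif

// Library code annotates its hot-path helpers once with the switchable macro.
EXAMPLE_UNORDERED_INLINE std::size_t example_mix(std::size_t h) {
    return h ^ (h >> 7);
}

EXAMPLE_UNORDERED_INLINE bool example_bucket_matches(std::size_t h,
                                                     std::size_t mask,
                                                     std::size_t bucket) {
    // mask must have the form 2^k - 1
    assert((mask & (mask + 1)) == 0);
    return (example_mix(h) & mask) == bucket;
}
```

A user would then define `EXAMPLE_UNORDERED_FORCEINLINE` project-wide (for example `/DEXAMPLE_UNORDERED_FORCEINLINE` with MSVC or `-DEXAMPLE_UNORDERED_FORCEINLINE` with GCC/Clang) instead of patching library sources, so the setting would survive a library upgrade.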

@joaquintides
Member

joaquintides commented Nov 21, 2022

Can you list those 6 Boost-internal functions you inlined?

@sbsce
Author

sbsce commented Nov 21, 2022

Sure, here's what I had to change to get it fully inlined. I changed the functions one by one until none of them showed up individually in the VS profiler anymore and the disassembly of my hot path no longer contained any call instruction into Boost:

[Three screenshots of the modified Boost code (images not preserved)]

FORCEINLINE is a globally defined macro of mine that simply expands to __forceinline.

@pdimov
Member

pdimov commented Nov 21, 2022

Force-inlining the destructor and the copy assignment is a bit odd. The rest seem sensible.
