-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate performance regression in 0.4 #510
Conversation
Codecov Report
@@ Coverage Diff @@
## master #510 +/- ##
=======================================
Coverage 99.69% 99.69%
=======================================
Files 54 54
Lines 17183 17183
=======================================
Hits 17131 17131
Misses 52 52 |
const auto ret = fizzy::test::adler32( | ||
bytes_view{*instance.memory}.substr(args[0].as<uint32_t>(), args[1].as<uint32_t>())); | ||
const auto ret = | ||
fizzy::test::adler32(bytes_view{*instance.memory}.substr(args[0].i64, args[1].i64)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have zero effect on the benchmarks, except for micro/host_adler32
.
--- asm.bad 2020-08-25 09:24:08.712147961 +0200
+++ asm.good 2020-08-25 09:23:36.355214339 +0200
@@ -1,4 +1,4 @@
- 3e: 8b 48 08 mov 0x8(%rax),%ecx
- 41: 48 8b 7a 08 mov 0x8(%rdx),%rdi
- 45: 48 8b 32 mov (%rdx),%rsi
- 48: 8b 10 mov (%rax),%edx
+ 3e: 48 8b 48 08 mov 0x8(%rax),%rcx
+ 42: 48 8b 7a 08 mov 0x8(%rdx),%rdi
+ 46: 48 8b 32 mov (%rdx),%rsi
+ 49: 48 8b 10 mov (%rax),%rdx The only function difference is changing |
And these are benchmarks on Skylake CPU, after the "fix":
|
0785a12
to
ed89bcf
Compare
So I tried LTO builds for both Haswell (original source of the fix) and Skylake. This makes the benchmark results before and after the "fix" the same (i.e. stable), but the absolute values comparing with not-LTO build are ~6% slower. I think I will go with LTO builds to compare performance to have more stable environment. |
So the LTO builds make the fix obsolete, but there is still up to 10% performance regression on Haswell. 0.3:
0.4:
This standing out counter is The |
Here are my results on GCC 10.2 without LTO
With LTO
|
I propose we recheck it again before releasing 0.5.0 |
I suppose this was not done? |
ed89bcf
to
9454010
Compare
9454010
to
41e7bbc
Compare
Still has some impact on performance, but falls into "small unrelated code change affects performance somewhere else" basket. |
This partly reverts 67fd4c8#diff-47ed36245217950f22b14dcb2472adbaR60