-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up the fast path for assert_eq! and assert_ne! #57815
Conversation
Currently, the panic!() calls directly borrow the value bindings. This causes those bindings to always be initialized, i.e. they're initialized even before the values are even compared. This causes noticeable overhead in what should be a really cheap operation. By performing a reborrow of the value in the call to panic!(), we allow LLVM to optimize that code, so that the extra borrow only happens in the error case. We could achieve the same result by dereferencing the values passed to panic!(), as the format machinery borrows them anyway, but this causes assertions to fail to compile if one of the values is unsized, i.e. it would be a breaking change.
r? @sfackler (rust_highfive has picked a reviewer for you, use r? to override) |
@bors try Local numbers seem promising, but I'd like to have "official" numbers from prlo as well |
Speed up the fast path for assert_eq! and assert_ne! Currently, the panic!() calls directly borrow the value bindings. This causes those bindings to always be initialized, i.e. they're initialized even before the values are even compared. This causes noticeable overhead in what should be a really cheap operation. By performing a reborrow of the value in the call to panic!(), we allow LLVM to optimize that code, so that the extra borrow only happens in the error case. We could achieve the same result by dereferencing the values passed to panic!(), as the format machinery borrows them anyway, but this causes assertions to fail to compile if one of the values is unsized, i.e. it would be a breaking change.
FWIW, the above numbers are on top of a build with my pending PRs already applied. |
Would it be worth adding a codegen test to make sure these continue to behave nicely? |
Good idea. I'll try to add one tomorrow. I will have to check how to do so, it's been a while and I'm out of time for today. |
Oh, just in case somebody's wondering about it, a good deal of the perf win is probably due to the fact that functions using those macros are now smaller, which allows for them to be inlined, pontentially avoiding even more copies. This was the case for the to_bits function, which I had been investigating, and which wasn't inlined even though it had the inline attribute which even lowers the bar for inlining. With those changes, the function is properly inlined. |
☀️ Test successful - checks-travis |
Maybe my issue will be solved then? #55914 |
For most practical purposes it should be fixed.
There will still be a slight difference because of the stack space needed
for the formatting of the panic message, but that's just the adjustment of
the stack pointer, which will probably happen anyway in many cases because
of the stack needed by function using the assertion and is the price you
pay for getting a more detailed error message.
Am Di., 22. Jan. 2019, 09:50 hat Marcel Hellwig <[email protected]>
geschrieben:
… Maybe my issue will be solved then? #55914
<#55914>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#57815 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAOGMhZhwDaDmZravaSflub6mrzyeuJWks5vFtC4gaJpZM4aLWje>
.
|
Actually, I just checked again, and there's still some extra work happening that's not strictly required. Assembly looks like this: assert_eq! pushq %rax
.cfi_def_cfa_offset 16
cmpl %esi, %edi
jne .LBB10_1
movb $1, %al
popq %rcx
.cfi_def_cfa_offset 8
retq assert_eq! (new) subq $104, %rsp
.cfi_def_cfa_offset 112
movl %edi, (%rsp)
movl %esi, 4(%rsp)
cmpl %esi, %edi
jne .LBB11_1
movb $1, %al
addq $104, %rsp
.cfi_def_cfa_offset 8
retq 😢 |
@rust-timer build 261660d |
Success: Queued 261660d with parent 7164a9f, comparison URL. |
Finished benchmarking try commit 261660d |
Hm, that's interesting, the number's aren't even close to what I got here. I'll check later whether there is some interaction between this and my other PRs that were included on both sides here. |
Seems that my baseline compiler was somehow slower than it should be, after cleaning out the build directory of that one and doing a fresh build ( I'm experimenting with an improved version of this to get (closer to) a result that is more or less equal to the code that Do we want to merge this anyway (possibly with an adapted commit message) or shall I close it for now? |
I did some benchmarks with outlining the panic. It will give me this: example::unlkly:
pushq %rax
cmpl %esi, %edi
jne .LBB2_1
popq %rcx
retq
.LBB2_1:
callq example::moep
ud2 IMHO, this should be even more simplified to example::unlkly:
cmpl %esi, %edi
jne .LBB2_1
retq
.LBB2_1:
# [doing stuff]
callq example::moep
ud2 Either way, I don't understand why rust/LLVM does |
No, these instructions adjust |
So it sounds like there's still a performance improvement from this change, it's just smaller than originally thought? SGTM if that's the case. |
I don't think that this applies here, because you don't need to align the stack at all... Just do a compare, branch (if needed) or return. |
You can skip the alignment step in leaf functions, but this is not a leaf function, there's a call to |
ping from triage @sfackler awaiting your review on this |
@bors r+ rollup |
📌 Commit 5a7cd84 has been approved by |
Speed up the fast path for assert_eq! and assert_ne! Currently, the panic!() calls directly borrow the value bindings. This causes those bindings to always be initialized, i.e. they're initialized even before the values are even compared. This causes noticeable overhead in what should be a really cheap operation. By performing a reborrow of the value in the call to panic!(), we allow LLVM to optimize that code, so that the extra borrow only happens in the error case. We could achieve the same result by dereferencing the values passed to panic!(), as the format machinery borrows them anyway, but this causes assertions to fail to compile if one of the values is unsized, i.e. it would be a breaking change.
Speed up the fast path for assert_eq! and assert_ne! Currently, the panic!() calls directly borrow the value bindings. This causes those bindings to always be initialized, i.e. they're initialized even before the values are even compared. This causes noticeable overhead in what should be a really cheap operation. By performing a reborrow of the value in the call to panic!(), we allow LLVM to optimize that code, so that the extra borrow only happens in the error case. We could achieve the same result by dereferencing the values passed to panic!(), as the format machinery borrows them anyway, but this causes assertions to fail to compile if one of the values is unsized, i.e. it would be a breaking change.
Rollup of 13 pull requests Successful merges: - #57693 (Doc rewording) - #57815 (Speed up the fast path for assert_eq! and assert_ne!) - #58034 (Stabilize the time_checked_add feature) - #58057 (Stabilize linker-plugin based LTO (aka cross-language LTO)) - #58137 (Cleanup: rename node_id_to_type(_opt)) - #58166 (allow shorthand syntax for deprecation reason) - #58196 (Add specific feature gate error for const-unstable features) - #58200 (fix str mutating through a ptr derived from &self) - #58273 (Rename rustc_errors dependency in rust 2018 crates) - #58289 (impl iter() for dyn Error) - #58387 (Disallow `auto` trait alias syntax) - #58404 (use Ubuntu keyserver for CloudABI ports) - #58405 (Remove some dead code from libcore) Failed merges: r? @ghost
Speed up the fast path for assert_eq! and assert_ne! Currently, the panic!() calls directly borrow the value bindings. This causes those bindings to always be initialized, i.e. they're initialized even before the values are even compared. This causes noticeable overhead in what should be a really cheap operation. By performing a reborrow of the value in the call to panic!(), we allow LLVM to optimize that code, so that the extra borrow only happens in the error case. We could achieve the same result by dereferencing the values passed to panic!(), as the format machinery borrows them anyway, but this causes assertions to fail to compile if one of the values is unsized, i.e. it would be a breaking change.
Rollup of 12 pull requests Successful merges: - #57693 (Doc rewording) - #57815 (Speed up the fast path for assert_eq! and assert_ne!) - #58034 (Stabilize the time_checked_add feature) - #58057 (Stabilize linker-plugin based LTO (aka cross-language LTO)) - #58137 (Cleanup: rename node_id_to_type(_opt)) - #58166 (allow shorthand syntax for deprecation reason) - #58200 (fix str mutating through a ptr derived from &self) - #58273 (Rename rustc_errors dependency in rust 2018 crates) - #58289 (impl iter() for dyn Error) - #58387 (Disallow `auto` trait alias syntax) - #58404 (use Ubuntu keyserver for CloudABI ports) - #58405 (Remove some dead code from libcore) Failed merges: r? @ghost
I don't think that's true. Stack alignment only has to be correct when you call into a function. Seems like LLVM needs an optimization to defer aligning the stack if the hot code contains no function calls. |
IIRC some instructions also need an aligned stack. It's a weird scenario after all. |
Some SSE instructions require aligned data addresses. The compiler can easily handle this. |
But then it would require an LLVM update, I'm afraid there's not much rust can do. |
Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.
By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.
We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.