Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc beta (1.28.0) generates unnecessary call instruction #52651

Closed
tspiteri opened this issue Jul 23, 2018 · 10 comments · Fixed by #56358
Closed

rustc beta (1.28.0) generates unnecessary call instruction #52651

tspiteri opened this issue Jul 23, 2018 · 10 comments · Fixed by #56358
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@tspiteri
Copy link
Contributor

For this code:

pub struct A([i32; 4]);
impl A {
    pub fn foo(self) -> Self { self }
    pub fn bar(self) -> Self { self }
}

rustc beta (1.28.0) with opt-level=3 generates:

example::A::bar:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

example::A::foo:
        push    rbx
        mov     rbx, rdi
        call    example::A::bar@PLT
        mov     rax, rbx
        pop     rbx
        ret

rustc 1.27.1 and rustc beta with opt-level=s generates:

example::A::foo:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

example::A::bar:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

The beta with opt-level=3 seems to be anti-inlining the identical functions.

@Mark-Simulacrum Mark-Simulacrum added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Jul 23, 2018
@Mark-Simulacrum Mark-Simulacrum added this to the 1.28 milestone Jul 23, 2018
@Mark-Simulacrum Mark-Simulacrum added I-slow Issue: Problems and improvements with respect to performance of generated code. I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 23, 2018
@Mark-Simulacrum
Copy link
Member

cc @rust-lang/compiler

(I've not attempted a reproduction locally)

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jul 23, 2018

This might be due to #49479, the time frame seems right and the code change observed here is precisely what MergeFunctions does. Some observations:

  • it's counter intuitive that this doesn't happen with opt-level=s
  • it's silly that it doesn't generate a tail call, wonder why that happens
  • if we decided that distinct source-level functions are allowed to have identical addresses and told LLVM about it with the unnamed_addr attribute, MergeFunctions could turn one function into a link-time alias for the other instead of creating a thunk that calls

@hanna-kruppe
Copy link
Contributor

cc @nox

@pnkfelix pnkfelix added the P-medium Medium priority label Jul 26, 2018
@pietroalbini pietroalbini added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed I-nominated regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Jul 26, 2018
@eddyb
Copy link
Member

eddyb commented Jul 31, 2018

@rkruppe Huh, I thought we used unnamed_addr already, and didn't make any promises?

@hanna-kruppe
Copy link
Contributor

@eddyb Evidently we're not using it. I don't know of any reason to make promises here, but since we de facto don't merge addresses, who knows whether anyone in the wild is relying on it for crazy shenanigans 🤷‍♂️

@dshynkev
Copy link
Contributor

dshynkev commented Aug 8, 2018

Interestingly, the LLVM IR the sample code results in appears to be using both unnamed_addr and tail_call, but the assembly is the same as in the parent post.

define void @bar(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16) %self) unnamed_addr #0 {
 ; ...
}
; ...
define void @foo(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
  tail call void @bar(%A* noalias nocapture sret dereferenceable(16) %0, %A* noalias nocapture readonly dereferenceable(16) %1) #0
  ret void
}

@dshynkev
Copy link
Contributor

dshynkev commented Aug 8, 2018

The lack of a tail call is caused by the sret attribute. Relevant LLVM code.

@dshynkev
Copy link
Contributor

dshynkev commented Aug 8, 2018

unnamed_addr doesn't have the expected effect because here the only possible outcomes are:

  1. If bar has local linkage, then it is erased (all uses are replaced with foo).
  2. Otherwise, if creating a thunk is profitable, bar becomes a thunk (tail-)calling foo (which isn't lowered as an actual tail call because of sret, as above). This is what happened in the parent post.
  3. Otherwise, foo and bar remain unchanged (all local uses of bar are replaced with foo, but the definition is kept for external linkage).

There is no attempt to alias bar with foo. Two things should probably be fixed in LLVM code:

  1. isThunkProfitable should be less optimistic (currently it returns true for all functions longer than 2 instructions). This will, at least, avoid the bloat seen in the parent post.
  2. writeThunkOrAlias as described here should replace writeThunk. Then we can be even more space-efficient by making bar an alias for foo.

I will look into submitting a patch to LLVM.

@nikic
Copy link
Contributor

nikic commented Nov 5, 2018

For the record, I've submitted a couple of patches for MergeFuncs to improve handling in some cases and add alias support. They're accepted, but not landed yet:

https://reviews.llvm.org/D53262
https://reviews.llvm.org/D53271
https://reviews.llvm.org/D53285

The alias functionality will be behind a -mergefunc-use-aliases flag, because historically there have been issues here with lacking linker support. We can try enabling this flag for rust and see if there are any issues.

@nikic
Copy link
Contributor

nikic commented Nov 22, 2018

The patches have all landed upstream. After the next LLVM update, we can enable use of aliases by passing -mergefunc-use-aliases in

unsafe fn configure_llvm(sess: &Session) {

nikic added a commit to nikic/rust that referenced this issue Nov 29, 2018
If the Rust LLVM fork is used, enable the -mergefunc-use-aliases
flag, which will create aliases for merged functions, rather than
inserting a call from one to the other.

A number of codegen tests needed to be adjusted, because functions
that previously fell below the thunk limit are now being merged.
Merging is prevented either using -C no-prepopulate-passes, or by
making the functions non-identical.

I expect that this is going to break something, somewhere, because
it isn't able to deal with aliases properly, but we won't find out
until we try :)

This fixes rust-lang#52651.
bors added a commit that referenced this issue Dec 3, 2018
Enable -mergefunc-use-aliases

If the Rust LLVM fork is used, enable the -mergefunc-use-aliases
flag, which will create aliases for merged functions, rather than
inserting a call from one to the other.

A number of codegen tests needed to be adjusted, because functions
that previously fell below the thunk limit are now being merged.
Merging is prevented in various ways now.

I expect that this is going to break something, somewhere, because
it isn't able to deal with aliases properly, but we won't find out
until we try :)

This fixes #52651.

r? @rkruppe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants