Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization: references from pointers aren't treated as noalias #38941

Open
mjbshaw opened this issue Jan 9, 2017 · 27 comments
Open
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@mjbshaw
Copy link
Contributor

mjbshaw commented Jan 9, 2017

The following code results in a being dereferenced again for the return value: (https://godbolt.org/z/YnrzMa3oj)

pub unsafe fn g(a: *mut i32, b: *mut i32) -> i32 {
  *a = 0;
  *b = 1;
  return *a;
}

That is to be expected. In contrast, the same function with references avoids the load: (https://godbolt.org/z/YnrzMa3oj)

pub fn g_ref(a: &mut i32, b: &mut i32) -> i32 {
  *a = 0;
  *b = 1;
  return *a;
}

However, if we change the code to the following: (https://godbolt.org/z/YnrzMa3oj)

pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
  let a = &mut *a;
  let b = &mut *b;
  *a = 10;
  *b = 11;
  return *a;
}

then the extra dereference is not optimized out as it should be, because do not emit anything like noalias inside functions.

@bluss
Copy link
Member

bluss commented Jan 9, 2017

It might be due to /issues/31681

@aidanhs
Copy link
Member

aidanhs commented Jan 9, 2017

Do you have a play link demonstrating noalias taking effect? AFAIK noalias annotations were completely removed after 1.7.0 due to the llvm bugs.

Here is your foo with some bits and pieces to defeat whole-program optimisation:

use std::process;

fn main() {
    let r1 = foo(&mut 1, &mut 2, &3);
    let r2 = foo(&mut 1, &mut 2, &4);
    process::exit(r1 + r2)
}

#[inline(never)]
pub fn foo(a: &mut i32, b: &mut i32, x: &i32) -> i32 {
  *a = *x;
  *b = *x;
  *a
}

https://is.gd/NsngDd

If you compile that in release mode and look at the LLVM IR, you'll see that foo does not have noalias annotations and performs the store twice.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Jan 9, 2017

Here's a demo: https://godbolt.org/g/pSj51i

pub fn foo(a: &mut i32, b: &mut i32, x: &i32) {
  *a = *x;
  *b = *x;
}

compiles into:

example::foo:
        push    rbp
        mov     rbp, rsp
        mov     eax, dword ptr [rdx]
        mov     dword ptr [rdi], eax
        mov     dword ptr [rsi], eax
        pop     rbp
        ret

@aidanhs
Copy link
Member

aidanhs commented Jan 9, 2017

Right, sorry, I was being dim - you're referring to the load only happening once.

This is probably because immutable references are marked as readonly in llvm (take a look at the debug llvm IR of my example). If you make x be &mut in your example, you'll see it does do the load twice despite it theoretically giving more information in this function (if all three were noalias, as implied by &mut, llvm could infer that x is readonly). Given this deoptimisation, noalias annotations must not be being emitted at all which agrees with my understanding of the current situation.

Your last example has all pointer arguments which do not get annotated, even if they're const. I don't know what the current state of annotating local variables is in rust, which is what that would require.

@mjbshaw
Copy link
Contributor Author

mjbshaw commented Jan 9, 2017

I don't think the readonly attribute makes a difference here. Or at least, it shouldn't. LLVM documents it as:

On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.

It clearly states that the memory may be written to (but only through a different pointer, not through the argument marked as readonly). Thus, it should have to load the pointer twice.

AFAIK, this optimization should only be possible if a, b, or x are marked as noalias.

Indeed, if you compile the following C code:

void foo(int *a, int *b, const int* x) {
    *a = *x;
    *b = *x;
}

The generated LLVM IR marks x as readonly but still loads it twice (which is to be expected):

; Function Attrs: norecurse nounwind ssp uwtable
define void @foo(i32* nocapture, i32* nocapture, i32* nocapture readonly) #0 {
  %4 = load i32, i32* %2, align 4, !tbaa !2
  store i32 %4, i32* %0, align 4, !tbaa !2
  %5 = load i32, i32* %2, align 4, !tbaa !2
  store i32 %5, i32* %1, align 4, !tbaa !2
  ret void
}

Actually, your point about making x an &mut hints at the reason why the extra load is optimized out in the first place: if x is an immutable reference, it is marked as noalias; if it's a mutable reference, then it is not marked as noalias (edit: on second thought, is it valid to have two readonly noalias pointers that alias? They're both noalias, but at the same time neither will be used to modify the data... I'm not actually sure how LLVM would respond in this case). I believe this is due to the issue bluss linked to: #31681

However, it doesn't answer the question why safe_x doesn't get marked as noalias, despite being an immutable reference. I believe safe_x should be marked as noalias.

@bluss
Copy link
Member

bluss commented Jan 9, 2017

Good issue. noalias on &mut is unrelated, sorry for the misdirection.

I'd look at

  1. Is rustc emitting noalias where it can? (Or other relevant metadata)
  2. Is this metadata lost in some optimization pass? (Example: Extra null check in slice iterator's .next().is_none() #37945)

I'm not sure there is a place where noalias can be placed

For this code:

#![crate_type="lib"]

pub fn foo(a: &mut i32, b: &mut i32, x: *const i32) {
    unsafe {
        let r = &*x;
        *a = *r;
        *b = *r;
    }
}

This is produced (rustc --emit=llvm-ir noalias.rs). It looks like there is no place where noalias can be used, except if it were back traced to the function argument for this particular function body.

; ModuleID = 'noalias.cgu-0.rs'
source_filename = "noalias.cgu-0.rs"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: uwtable
define void @_ZN7noalias3foo17h1c56d659e6e1ab4bE(i32* dereferenceable(4), i32* dereferenceable(4), i32*) unnamed_addr #0 {
entry-block:
  %_0 = alloca {}
  br label %start

start:                                            ; preds = %entry-block
  %3 = load i32, i32* %2
  store i32 %3, i32* %0
  %4 = load i32, i32* %2
  store i32 %4, i32* %1
  ret void
}

attributes #0 = { uwtable }

@aidanhs
Copy link
Member

aidanhs commented Jan 9, 2017

This may be what you're looking for - #16515

@sanmai-NL
Copy link

Perhaps someone can tag this issue as performance related? How can non-maintainers help to do such routine work? I vaguely remember @TimNN doing this a lot for issues here.

@bluss bluss added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Feb 19, 2017
@steveklabnik
Copy link
Member

How can non-maintainers help to do such routine work?

Unfortunately, GitHub doesn't let anyone do issue triage without having a commit bit, it's kind of annoying.

If you email me a list of issues and changes that should be made, I can manually do it on your behalf, for now. It's not great. I'd like to solve this somehow, but there's so much work to do...

@sanmai-NL
Copy link

Off-topic
@steveklabnik:
It's not so much a request from me, but a general question. Perhaps there is some way using the GitHub API and crowd sourced tagging.

@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 26, 2017
@nox
Copy link
Contributor

nox commented Apr 2, 2018

So in which way is this issue not a duplicate of #16515?

Cc @rust-lang/wg-codegen

@shepmaster
Copy link
Member

So in which way is this issue not a duplicate of #16515?

In the related Stack Overflow question by the OP, an answer was provided:

pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
    (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| {
        *safe_a = *safe_x;
        *safe_b = *safe_x;
    })(&mut *a, &mut *b, &*x)
}

This results in the LLVM IR:

; playground::f1
; Function Attrs: nounwind uwtable
define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 {
start:
  %0 = icmp ne i32* %a, null
  tail call void @llvm.assume(i1 %0)
  %1 = icmp ne i32* %b, null
  tail call void @llvm.assume(i1 %1)
  %2 = icmp ne i32* %x, null
  tail call void @llvm.assume(i1 %2)
  %x.val = load i32, i32* %x, align 4
  store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3
  store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0
  ret void
}

Note that this contains !noalias and assume.

The "normal" way of writing this:

pub unsafe fn f2(a: *mut i32, b: *mut i32, x: *const i32) {
    let safe_a: &mut i32 = &mut *a;
    let safe_b: &mut i32 = &mut *b;
    let safe_x: &i32 = &*x;

    *safe_a = *safe_x;
    *safe_b = *safe_x;
}

Has no such metadata:

define void @_ZN10playground2f217h6945a283c9ef76d3E(i32* nocapture %a, i32* nocapture %b, i32* nocapture readonly %x) unnamed_addr #0 {
start:
  %0 = load i32, i32* %x, align 4
  store i32 %0, i32* %a, align 4
  %1 = load i32, i32* %x, align 4
  store i32 %1, i32* %b, align 4
  ret void
}

It appears that noalias information only applies at function boundaries.

@eddyb
Copy link
Member

eddyb commented Sep 22, 2018

Do we have an issue for !alias.scope and !noalias? cc @rkruppe

@nagisa
Copy link
Member

nagisa commented Sep 22, 2018

@eddyb An issue for !alias.scope was linked above: #16515

@eddyb
Copy link
Member

eddyb commented Sep 22, 2018

@nagisa Oops, the way I read the start of the comment, I didn't think to check, my bad.

@shepmaster
Copy link
Member

#16515 is titled "make use of LLVM's scoped noalias metadata"

We are making use of it, for function arguments. We are not making use of it in every case. If the issues are the same, perhaps one of y'all would be good enough to improve the title of that issue to make it more discoverable, and maybe update it to have a list of places that should use the metadata and don't. Otherwise, it reads a lot like a nebulous "make the compiler make faster code" issue.

@eddyb
Copy link
Member

eddyb commented Sep 23, 2018

"metadata" is not function argument/return attributes, but rather the !foo things on instructions, and we don't ever emit !noalias AFAIK.

@shepmaster
Copy link
Member

and we don't ever emit !noalias AFAIK.

I am missing something, please help me understand what. There's literally !noalias in the LLVM IR I pasted earlier:

pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
    (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| {
        *safe_a = *safe_x;
        *safe_b = *safe_x;
    })(&mut *a, &mut *b, &*x)
}

This results in the LLVM IR:

; playground::f1
; Function Attrs: nounwind uwtable
define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 {
start:
  %0 = icmp ne i32* %a, null
  tail call void @llvm.assume(i1 %0)
  %1 = icmp ne i32* %b, null
  tail call void @llvm.assume(i1 %1)
  %2 = icmp ne i32* %x, null
  tail call void @llvm.assume(i1 %2)
  %x.val = load i32, i32* %x, align 4
  store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3
  store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0
  ret void
}

Zooming and enhancing:

store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3
store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0

@eddyb
Copy link
Member

eddyb commented Sep 23, 2018

@shepmaster LLVM generates that itself after inlining noalias attributes (which are much easier to generate, as !alias.scope is not involved, and we've had them for ages).
The "smoking gun" is llvm.assume being called (it comes from nonnull / dereferenceable) - we try to avoid calling llvm.assume, as it has, historically, slowed down/broken optimizations.

One more recent example is I wanted use to emit align on (pointer) arguments in #45225, but had to give up because some tests don't optimize, due to LLVM's inliner generating llvm.assume's.

@eddyb eddyb added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 23, 2018
@eddyb
Copy link
Member

eddyb commented Sep 23, 2018

Nominated for compiler team discussion - how should we track these instances of #16515?
I edited its description to reflect the current status, but can we do something more organized?
Do we want to close and open a new tracking issue?
(prompted by @shepmaster in #38941 (comment))

@pnkfelix pnkfelix added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. WG-llvm Working group: LLVM backend code generation labels Oct 25, 2018
@pnkfelix
Copy link
Member

So we very briefly discussed this at the T-compiler meeting.

I'm not sure what path we want to use going forward to track work items like this.

One possibility is having separate issues .. and then making #16515 a metabug tracking them?

Another possibility is a separate github project. Not sure if that setup is helpful or harmful for a topic like this.

@eddyb were there any particular thoughts/ideas you had? Or would you simply be disposed to close this and open a fresh tracking issue, as you mentioned above?

@nagisa nagisa added P-medium Medium priority and removed I-nominated labels Nov 1, 2018
@eddyb
Copy link
Member

eddyb commented Nov 5, 2018

@pnkfelix I was specifically wondering if we should close #16515 or keep using it.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@clubby789
Copy link
Contributor

The original example now produces, eliding the second read

f:
        mov     eax, dword ptr [rdx]
        mov     dword ptr [rdi], eax
        mov     dword ptr [rsi], eax
        ret

as of Rust 1.78. Invoking with an aliasing a and x causes Miri to report UB. This is probably due to the LLVM bump in that version

@RalfJung
Copy link
Member

We still don't emit any noalias for code like that though, so it has to optimize assuming the pointers alias.

@Dushistov
Copy link
Contributor

Now (with current nightly)

pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
  *a = *x;
  *b = *x;
}

compiled (-O) to

f:
        mov     eax, dword ptr [rdx]
        mov     dword ptr [rdi], eax
        mov     dword ptr [rsi], eax
        ret

so x deference only once. See https://godbolt.org/z/9WGxGsdzG .

pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
  let safe_a = unsafe { &mut *a };
  let safe_b = unsafe { &mut *b };
  let safe_x = unsafe { &*x };
  *safe_a = *safe_x;
  *safe_b = *safe_x;
}

produces the same assembly https://godbolt.org/z/oa557jqcj ,

so I suppose the issue can be closed?

@RalfJung
Copy link
Member

RalfJung commented Dec 14, 2024

The reason *x is run only once is that due to alignment, *x and *a are either the exact same memory or disjoint. Of they are the same memory, *a = *x is a NOP and we can keep using the old value. If they are disjoint, then obviously we can keep using the old value.

So, what we need here is a better example; the issue almost certainly is still present.

@RalfJung
Copy link
Member

I have updated the example. But I am not sure if it's worth keeping the issue, this is basically a duplicate of #16515.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests