Missed optimization: `_ => 0` generates worse code than `0 => 0, _ => unreachable!()` #118306

mcy · 2023-11-26T04:53:04Z

Consider the following functions (https://godbolt.org/z/a8r3Tc7TE):

pub fn faster(input: u64) -> u64 {
  match input % 4 {
    0 => 0,
    1 | 2 => 1,
    3 => 2,
    _ => unreachable!(),
  }
}

pub fn branchy(input: u64) -> u64 {
  match input % 4 {
    1 | 2 => 1,
    3 => 2,
    _ => 0,
  }
}

These functions have identical behavior: they map `input` to `input % 4 - (input % 4 / 2)`. In the former case, LLVM generates a nice lookup table for us, but in the latter, it emits an extra branch. The only difference is that I've used `_ => ...` to avoid needing to write an unreachable-by-optimization branch.
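As a quick sanity check of the equivalence claim, here is a minimal sketch that compares both functions against the closed form. The helper `closed_form` and the `main` driver are illustrative additions, not part of the original report.

```rust
pub fn faster(input: u64) -> u64 {
    match input % 4 {
        0 => 0,
        1 | 2 => 1,
        3 => 2,
        _ => unreachable!(),
    }
}

pub fn branchy(input: u64) -> u64 {
    match input % 4 {
        1 | 2 => 1,
        3 => 2,
        _ => 0,
    }
}

/// Hypothetical helper: the closed form stated in the text,
/// `input % 4 - (input % 4 / 2)`.
pub fn closed_form(input: u64) -> u64 {
    let r = input % 4;
    r - r / 2
}

fn main() {
    // Small inputs cover every remainder several times; the values near
    // u64::MAX check that nothing depends on the magnitude of the input.
    let edge_cases = [u64::MAX - 3, u64::MAX - 2, u64::MAX - 1, u64::MAX];
    for input in (0..64u64).chain(edge_cases) {
        assert_eq!(faster(input), branchy(input));
        assert_eq!(faster(input), closed_form(input));
    }
    println!("faster, branchy and closed_form agree on all tested inputs");
}
```

This only checks behavior; it says nothing about the generated code, which is the actual subject of the report.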

If we look at the generated IR (after -Cpasses=strip,mem2reg,simplifycfg):

define i64 @faster(i64 %0) unnamed_addr #0 {
  %2 = urem i64 %0, 4
  switch i64 %2, label %.unreachabledefault [
    i64 0, label %5
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

.unreachabledefault:                              ; preds = %1
  unreachable

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}

define i64 @branchy(i64 %0) unnamed_addr #0 {
  %2 = urem i64 %0, 4
  switch i64 %2, label %5 [
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}

The problem is clear: LLVM does not seem to realize that it can trivially transform branchy to faster here, by observing that the default in the switch is only taken when %2 == 0.
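To make that observation concrete: `input % 4` is always in `0..=3`, and the explicit arms cover 1, 2 and 3, so the default arm can only ever be reached with a remainder of 0. The sketch below checks this and writes out by hand the four-entry table that the whole match collapses to. The names `TABLE` and `via_table` are hypothetical, and this is a hand-written equivalent, not the code rustc or LLVM actually emits.

```rust
/// Hypothetical hand-written table: index is `input % 4`.
const TABLE: [u64; 4] = [0, 1, 1, 2];

pub fn via_table(input: u64) -> u64 {
    // `input % 4` is always in 0..=3, so the index is in bounds.
    TABLE[(input % 4) as usize]
}

pub fn branchy(input: u64) -> u64 {
    match input % 4 {
        1 | 2 => 1,
        3 => 2,
        _ => 0,
    }
}

fn main() {
    // Enumerate the possible remainders and keep those that fall through
    // to the default arm of `branchy`.
    let default_hits: Vec<u64> = (0..4u64)
        .filter(|&r| !matches!(r, 1 | 2 | 3))
        .collect();
    assert_eq!(default_hits, vec![0]);

    // The table lookup and the match agree on every tested input.
    for input in 0..16u64 {
        assert_eq!(via_table(input), branchy(input));
    }
    println!("default arm is reached only for remainder 0");
}
```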

I suspect this is more LLVM bug than Rust bug, but it feels fixable by a MIR peephole optimization? Unclear. The _ => 0 code I wrote is an attractive nuisance that I imagine other people writing, too, so perhaps there is value to seeing if this optimization can be made before LLVM.

This bug is also present in Clang, in case someone wants to file an LLVM bug: https://godbolt.org/z/x7rec97E7. It's unclear to me if this is the sort of optimization Clang would do in the frontend instead of in LLVM; could go either way here, tbh.


DianQK · 2023-11-26T12:36:37Z

Upstream issue: llvm/llvm-project#73446.

@rustbot claim

DianQK · 2024-01-03T10:08:34Z

@rustbot label llvm-fixed-upstream

nikic · 2024-02-14T09:07:23Z

It looks like this is fixed since 1.75, but I don't know what fixed it: https://godbolt.org/z/eGnWbxbG4

DianQK · 2024-02-20T06:15:14Z

I don't think it has been fixed. It looks like the function is not emitted: https://godbolt.org/z/YdqWq8hbb.
Bisected to 5d5edf0? cc @saethlin

saethlin · 2024-02-20T06:43:02Z

You need to stick `#[inline(never)]` or `#[no_mangle]` on the function; otherwise, when CE compiles the crate, the function is only lowered to MIR, as if it had the `#[inline]` attribute.
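A minimal sketch of that advice, applied to the two functions from the original report. Only `#[inline(never)]` is shown here; `#[no_mangle]` is the alternative mentioned above. Whether a function appears in the Compiler Explorer output still depends on the rustc version and flags in use, which this sketch does not pin down.

```rust
// `#[inline(never)]` asks rustc not to inline this function, so it is
// code-generated as a standalone function whose assembly can be inspected.
#[inline(never)]
pub fn faster(input: u64) -> u64 {
    match input % 4 {
        0 => 0,
        1 | 2 => 1,
        3 => 2,
        _ => unreachable!(),
    }
}

// Same attribute on the variant with the catch-all arm.
#[inline(never)]
pub fn branchy(input: u64) -> u64 {
    match input % 4 {
        1 | 2 => 1,
        3 => 2,
        _ => 0,
    }
}

fn main() {
    // The attribute does not change behavior, only how the function is
    // lowered; both variants still return the same values.
    for input in 0..8u64 {
        assert_eq!(faster(input), branchy(input));
    }
    println!("ok");
}
```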

DianQK · 2024-02-20T07:05:39Z

> You need to stick #[inline(never)] or #[no_mangle] on the function otherwise when CE compiles the crate the function is only lowered to MIR as if it has the #[inline] attribute.

Normally I would do this. Maybe we should mention this somewhere to avoid submitting invalid code to godbolt?

nikic · 2024-02-20T09:00:31Z

Oops, sorry. I saw that one function was generated and assumed the other one got merged...

blyxyas · 2024-05-23T11:58:51Z

Seems like this has been reverted :/, look at the generated assembly with -Copt-level=3

DianQK · 2024-05-23T13:09:23Z

> Seems like this has been reverted :/, look at the generated assembly with -Copt-level=3

Can you explain why you think that? I don't see any changes: https://godbolt.org/z/rrb5oKbjb.

BTW, I can reland the upstream patch now.

blyxyas · 2024-05-23T15:50:27Z

I think that it's been reverted because the assembly output contains this:

faster:
        and     edi, 3
        lea     rax, [rip + .Lswitch.table.faster]
        mov     rax, qword ptr [rax + 8*rdi]
        ret

branchy:
        and     edi, 3
        lea     rax, [rdi - 1]
        cmp     rax, 2
        ja      .LBB1_1
        lea     rax, [rip + .Lswitch.table.branchy]
        mov     rax, qword ptr [rax + 8*rdi - 8]
        ret

The branchy function still has some branching (this can also be seen in the LLVM IR: branchy has 12 lines and 2 br jumps, while faster has four lines and no branches).

DianQK · 2024-05-24T13:07:14Z

Ah, I think you're saying that this optimization is still in a missing state, right?

blyxyas · 2024-05-24T23:05:16Z

Yep, I thought you meant that you implemented the optimization, is it not implemented? 😅

DianQK · 2024-05-25T02:49:44Z

> Yep, I thought you meant that you implemented the optimization, is it not implemented? 😅

Yes, I have implemented it, but due to the compilation time issue mentioned in llvm/llvm-project#78578, I had to revert the commit. Now I have relanded it: llvm/llvm-project#73446 (comment).

@rustbot label +llvm-fixed-upstream

nikic · 2024-08-01T14:02:27Z

Confirmed fixed by #127513, needs codegen test.

Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 26, 2023

DianQK mentioned this issue Nov 26, 2023

Remove the default branch of a switch llvm/llvm-project#73446

Closed

rustbot assigned DianQK Dec 28, 2023

rustbot added the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade label Jan 3, 2024

nikic added E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. and removed llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade labels Feb 14, 2024

nikic removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Feb 20, 2024

rustbot added the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade label May 25, 2024

nikic added E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. and removed llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade labels Aug 1, 2024

DianQK mentioned this issue Aug 3, 2024

Add a set of tests for LLVM 19 #128584

Merged

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 4, 2024

Auto merge of rust-lang#128584 - DianQK:tests-for-llvm-19, r=nikic

df62a42

Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 7, 2024

Rollup merge of rust-lang#128584 - DianQK:tests-for-llvm-19, r=nikic

1987f15

Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 9, 2024

Auto merge of rust-lang#128584 - DianQK:tests-for-llvm-19, r=nikic

c80d992

Add a set of tests for LLVM 19 Close rust-lang#107681. Close rust-lang#118306. Close rust-lang#126585. r? compiler

bors closed this as completed in 69b380d Aug 10, 2024
