Multiple nested loops taking very long to compile with CPU extensions #115465

Open
Ben-Lichtman opened this issue Sep 2, 2023 · 2 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Ben-Lichtman
Contributor

Ben-Lichtman commented Sep 2, 2023

Code

I tried this code:

use std::collections::HashMap;

pub fn main() {
    let mut table = HashMap::new();
    
    for a in 0..0xff {
        for b in 0..0xff {
            for c in 0..0xff {
                for d in 0..0xff {
                    let hash = 5u64;
                    let trunc = hash & 0xffffffffffffff00;
                    table.insert(trunc, (a, b, c, d));
                }
            }
        }
    }
    for x in 0..0xff {
        for y in 0..0xff {
            for z in 0..0xff {
                let hash = 5u64;
                let trunc = hash & 0xffffffffffffff00;
                if let Some(orig) = table.get(&trunc) {
                    println!("Original {orig:?}");
                    println!("New ({x}, {y}, {z})")
                }
            }
        }
    }
}

This slowdown is only visible with some target-cpu values, not with the default x86 target. For me it was -Ctarget-cpu=znver3, but it also happens with -Ctarget-cpu=native on godbolt:

https://godbolt.org/z/xqnbfdxKb

I expected to see this happen: It compiles in some reasonable amount of time

Instead, this happened: It takes over 7 minutes to compile on my Ryzen 5900X.

My guess is that the compiler is aggressively unrolling the loops and then inlining the formatting code (compilation speeds up quite a bit when I remove the prints), but that is just speculation.
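One way to probe that guess is sketched below. It is an assumption, not a confirmed diagnosis: the formatting is moved behind a function marked #[inline(never)], so the optimizer cannot inline the formatting machinery into the loop nest. The helper name report and the loop-bound parameter n are hypothetical and not part of the original reproduction; they only exist so the sketch runs quickly.

```rust
use std::collections::HashMap;

/// Hypothetical helper: keeps the formatting code out of the loop body.
#[inline(never)]
fn report(orig: (u32, u32, u32, u32), new: (u32, u32, u32)) -> String {
    format!("Original {orig:?}\nNew ({}, {}, {})", new.0, new.1, new.2)
}

/// Same loop shape as the reproduction, with the bound passed in as `n`
/// (an assumption; the original uses the constant 0xff everywhere).
fn lookup_all(n: u32) -> Vec<String> {
    let mut table = HashMap::new();
    for a in 0..n {
        for b in 0..n {
            for c in 0..n {
                for d in 0..n {
                    let hash = 5u64;
                    let trunc = hash & 0xffffffffffffff00;
                    table.insert(trunc, (a, b, c, d));
                }
            }
        }
    }

    let mut out = Vec::new();
    for x in 0..n {
        for y in 0..n {
            for z in 0..n {
                let hash = 5u64;
                let trunc = hash & 0xffffffffffffff00;
                if let Some(orig) = table.get(&trunc) {
                    // The call is opaque to the inliner because of #[inline(never)].
                    out.push(report(*orig, (x, y, z)));
                }
            }
        }
    }
    out
}

fn main() {
    for line in lookup_all(2) {
        println!("{line}");
    }
}
```

A runtime bound changes what LLVM knows about trip counts, so for a faithful comparison the constant 0xff bounds would need to be restored and only the report call kept. If compile time with -Ctarget-cpu=znver3 stays high even then, inlining of the formatting code is probably not the main factor.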

Version it worked on

It's hard to track down exactly where this regression happened, but it seems to work on 1.64 at least (takes ~20s to compile on godbolt); at 1.65 it starts timing out with -Ctarget-cpu=znver3.

Version with regression

It seems that the regression happened somewhere between 1.65 and 1.72; however, I am using nightly.
rustc --version --verbose:

rustc 1.74.0-nightly (2f5df8a94 2023-08-31)
binary: rustc
commit-hash: 2f5df8a94bb3c5fae4e3fcbfc8ef20f1f976cb19
commit-date: 2023-08-31
host: x86_64-unknown-linux-gnu
release: 1.74.0-nightly
LLVM version: 17.0.0

Timings

time:   0.000; rss:   99MB ->   99MB (   +0MB)  module_lints
time:   0.000; rss:   99MB ->   99MB (   +0MB)  lint_checking
time:   0.000; rss:   99MB ->   99MB (   +0MB)  check_lint_expectations
time:   0.000; rss:   98MB ->   99MB (   +1MB)  misc_checking_3
time:   0.000; rss:   99MB ->  100MB (   +0MB)  monomorphization_collector_root_collections
time:   0.001; rss:  100MB ->  102MB (   +2MB)  Inline
time:   0.000; rss:  102MB ->  102MB (   +0MB)  ReferencePropagation
time:   0.001; rss:  102MB ->  103MB (   +1MB)  ConstProp
time:   0.012; rss:  100MB ->  116MB (  +16MB)  monomorphization_collector_graph_walk
time:   0.001; rss:  116MB ->  117MB (   +2MB)  partition_and_assert_distinct_symbols
time:   0.000; rss:  122MB ->  125MB (   +3MB)  write_allocator_module
time:   0.008; rss:  128MB ->  148MB (  +20MB)  codegen_to_LLVM_IR
time:   0.022; rss:   99MB ->  148MB (  +48MB)  codegen_crate
time:   0.002; rss:  148MB ->  119MB (  -28MB)  free_global_ctxt
time:   0.004; rss:  116MB ->  118MB (   +3MB)  LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.2)
time:   0.020; rss:  116MB ->  127MB (  +11MB)  LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.0)
time: 450.155; rss:  116MB ->  135MB (  +19MB)  LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.1)
time: 450.215; rss:  137MB ->  136MB (   -2MB)  LLVM_passes
time:   0.000; rss:  136MB ->  129MB (   -7MB)  join_worker_thread
time: 450.207; rss:  119MB ->  129MB (   +9MB)  finish_ongoing_codegen
time:   0.000; rss:  129MB ->  128MB (   -1MB)  link_binary_check_files_are_writeable
time:   0.042; rss:  120MB ->  117MB (   -3MB)  run_linker
time:   0.043; rss:  129MB ->  117MB (  -12MB)  link_binary
time:   0.043; rss:  129MB ->  117MB (  -12MB)  link_crate
time: 450.250; rss:  119MB ->  117MB (   -3MB)  link
time: 450.292; rss:   33MB ->  100MB (  +67MB)  total
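To make the reading of this log concrete, here is a minimal sketch that parses lines in the shape shown above and computes how much of the total time a single entry accounts for. The function names parse_timing and share_of_total are illustrative only, and the sketch assumes every line follows the "time: <seconds>; rss: ... (<delta>)  <name>" layout seen in this issue.

```rust
/// Three lines copied from the log in this issue.
const SAMPLE: &str = "\
time:   0.020; rss:  116MB ->  127MB (  +11MB)  LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.0)
time: 450.155; rss:  116MB ->  135MB (  +19MB)  LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.1)
time: 450.292; rss:   33MB ->  100MB (  +67MB)  total";

/// Parse one log line into (seconds, entry name).
/// Returns None for lines that do not match the assumed layout.
fn parse_timing(line: &str) -> Option<(f64, &str)> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, tail) = rest.split_once(';')?;
    let secs: f64 = secs.trim().parse().ok()?;
    // The entry name follows the closing parenthesis of the rss delta.
    let (_, name) = tail.split_once(')')?;
    Some((secs, name.trim()))
}

/// Fraction of the `total` entry spent in the entry named `pass`.
fn share_of_total(log: &str, pass: &str) -> Option<f64> {
    let entries: Vec<(f64, &str)> = log.lines().filter_map(parse_timing).collect();
    let total = entries.iter().find(|(_, n)| *n == "total")?.0;
    let secs = entries.iter().find(|(_, n)| *n == pass)?.0;
    Some(secs / total)
}

fn main() {
    let pass = "LLVM_lto_optimize(fnv.fb177d82fe6a19f5-cgu.1)";
    if let Some(share) = share_of_total(SAMPLE, pass) {
        println!("{pass}: {:.2}% of total", share * 100.0);
    }
}
```

On the three sample lines, the cgu.1 entry accounts for more than 99.9% of the total, which matches the impression that essentially all of the time is spent in LLVM optimizing that one codegen unit.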

@Ben-Lichtman Ben-Lichtman added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Sep 2, 2023
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 2, 2023
@matthiaskrgr matthiaskrgr added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Sep 2, 2023
@the8472 the8472 added I-compiletime Issue: Problems and improvements with respect to compile times. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 2, 2023
@Ben-Lichtman Ben-Lichtman changed the title Multiple nested loops taking very long to compile Multiple nested loops taking very long to compile with CPU extensions Sep 4, 2023
@apiraino
Contributor

apiraino commented Sep 4, 2023

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high +T-compiler

@rustbot rustbot added P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 4, 2023
@apiraino apiraino added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Sep 4, 2023
@boulanlo

boulanlo commented Sep 8, 2023

Bisected with cargo-bisect-rustc, found a regression in nightly-2022-08-13. There are no CI artifacts so I couldn't get the cargo-bisect-rustc output, but here are the commits:

- commit[0] 2022-08-11: Auto merge of #100416 - Dylan-DPC:rollup-m344lh1, r=Dylan-DPC
- commit[1] 2022-08-11: Auto merge of #100426 - matthiaskrgr:rollup-0ks4dou, r=matthiaskrgr
- commit[2] 2022-08-12: Auto merge of #100419 - flip1995:clippyup, r=Manishearth
- commit[3] 2022-08-12: Auto merge of #99464 - nikic:llvm-15, r=cuviper
- commit[4] 2022-08-12: Auto merge of #100435 - ehuss:update-cargo, r=ehuss
- commit[5] 2022-08-12: Auto merge of #99624 - vincenzopalazzo:macros/unix_error, r=Amanieu
- commit[6] 2022-08-12: Auto merge of #100328 - davidtwco:perf-implications, r=nnethercote
- commit[7] 2022-08-12: Auto merge of #100456 - Dylan-DPC:rollup-fn17z9f, r=Dylan-DPC

I'm no Rust compiler dev, but it looks like the upgrade to LLVM-15 (#99464) might be the culprit.

@Noratrieb Noratrieb added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc regression-untriaged Untriaged performance or correctness regression. labels Nov 7, 2023
