
static array of zeroes can take minutes to lint check #55795

Closed
kazcw opened this issue Nov 8, 2018 · 8 comments · Fixed by #120069
Labels
A-const-eval  Area: Constant evaluation (MIR interpretation)
A-MIR  Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
I-compiletime  Issue: Problems and improvements with respect to compile times.
T-compiler  Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kazcw

kazcw commented Nov 8, 2018

This program takes an inordinate amount of time and memory to compile (it's worse if the array is actually used, but this is a minimal test case):

const SIZE: usize = 1 << 30;
static SLICE: [u8; SIZE] = [0u8; SIZE];
fn main() {}

I was hoping this would be a viable way to get a big chunk of .bss so I don't have to depend on an mmap crate to get a lot of static zeroes, but the effect on compile time makes that impractical.
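As an aside not in the original report: if allocating at runtime is acceptable, a zeroed boxed slice sidesteps the const-eval cost entirely. A minimal sketch, reusing the `SIZE` constant from the test case (`vec![0u8; n]` typically lowers to a zeroed allocation, so the runtime cost is small):

```rust
// Workaround sketch (assumption: a runtime heap allocation is acceptable).
const SIZE: usize = 1 << 30;

fn main() {
    // `vec![0u8; SIZE]` typically uses a zeroed allocation under the hood,
    // so no compile-time const-eval of a gigabyte array is involved.
    let zeroes: Box<[u8]> = vec![0u8; SIZE].into_boxed_slice();
    assert_eq!(zeroes.len(), SIZE);
}
```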

If I run rustc -Z time-passes, the big offender is:
time: 47.666; rss: 2486MB lint checking

Although the time (and memory) is reportedly spent checking lints, setting --cap-lints allow doesn't make any difference. I'm guessing the "lint checking" pass includes some things that need to be checked even if lints are suppressed? If not, it seems like a separate issue is that a lot of work could be saved with cap-lints set (e.g. when compiling dependencies).

Here are the top results from perf report, for rustc 1.32.0-nightly (25a42b2ce 2018-11-07):

  31.62%  rustc     librustc_mir-714845413a99e6ff.so              [.] <rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::copy_repeatedly
  22.94%  rustc     librustc_mir-714845413a99e6ff.so              [.] <core::iter::Map<I, F> as core::iter::iterator::Iterator>::fold
   7.32%  rustc     librustc-0eb8c117db37850c.so                  [.] rustc::mir::interpret::UndefMask::grow
   7.31%  rustc     librustc-0eb8c117db37850c.so                  [.] rustc::mir::interpret::UndefMask::set_range
   6.94%  rustc     libc-2.27.so                                  [.] __memmove_sse2_unaligned_erms
   5.91%  rustc     librustc_mir-714845413a99e6ff.so              [.] <rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::check_bytes

So it looks like miri is actually creating the array and folding over it. I know it's not going to find any problems, because I have ECC memory 😆.

There are already some bugs relating to slow compilation of large arrays; the most relevant I could find are #37155 and #49330. I think this case is separate from those because:

  • the input in this case is a [0; _] array, whereas the others initialize arrays from sequences of elements
  • those cases seem to refer to superlinear runtime; this issue appears roughly linear in time and space and only becomes noticeable for much larger arrays
  • the bottleneck in this case occurs during lint checking, which I didn't see in any other array performance bugs
@estebank estebank added I-slow Issue: Problems and improvements with respect to performance of generated code. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html labels Nov 8, 2018
@ishitatsuyuki ishitatsuyuki added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 9, 2018
@dotdash dotdash added the A-const-eval Area: Constant evaluation (MIR interpretation) label Jan 6, 2019
@pnkfelix pnkfelix added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 12, 2019
@oli-obk
Contributor

oli-obk commented May 13, 2019

Have you seen any improvement after #58556?

@jonas-schievink
Contributor

rustc 1.34.0-nightly (146aa60f3 2019-02-18)   0:21.37elapsed
rustc 1.36.0-nightly (08bfe1612 2019-05-02)   0:07.22elapsed

@oli-obk
Contributor

oli-obk commented May 14, 2019

I've wondered before whether we should special-case constants whose bits are all defined and zero. I'll open a discussion in the const-eval zulip channel.
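A rough illustration of what such a special case could look like (hypothetical types; this does not mirror rustc's actual allocation or init-mask internals): an allocation known to be all-zero and fully initialized only needs to record its length, so a `[0u8; N]` constant would cost O(1) space instead of N bytes.

```rust
// Hypothetical sketch of an "all zero" fast path for const-eval allocations.
// These types are illustrative only and do not mirror rustc's internals.
enum AllocBytes {
    /// Every byte is zero and fully initialized: only the length is stored.
    ZeroFilled { len: usize },
    /// General case: explicit bytes plus a per-byte initialization flag.
    Explicit { bytes: Vec<u8>, init: Vec<bool> },
}

impl AllocBytes {
    fn zeroed(len: usize) -> Self {
        // O(1) in time and space, no matter how large `len` is.
        AllocBytes::ZeroFilled { len }
    }

    fn get(&self, i: usize) -> Option<u8> {
        match self {
            AllocBytes::ZeroFilled { len } => (i < *len).then_some(0),
            AllocBytes::Explicit { bytes, init } => {
                (init.get(i) == Some(&true)).then(|| bytes[i])
            }
        }
    }
}

fn main() {
    let a = AllocBytes::zeroed(1 << 30); // no 1 GiB buffer is materialized
    assert_eq!(a.get(12_345), Some(0));
    assert_eq!(a.get(1 << 30), None);
}
```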

@dbdr

dbdr commented Nov 14, 2019

I also ran into this problem. @oli-obk, has that discussion happened?

@oli-obk
Contributor

oli-obk commented Nov 14, 2019

Kind of. We will have special treatment for constants that consist solely of undef bytes (see #62655). This scheme can likely be extended to support all bytes being zero without requiring time or space overhead.

@jyn514
Member

jyn514 commented Jan 21, 2021

If not, it seems like a separate issue is that a lot of work could be saved with cap-lints set (e.g. when compiling dependencies).

I opened #74704 for this.

@jyn514
Member

jyn514 commented Jan 21, 2021

Kind of. We will have special treatment for constants that consist solely of undef bytes (see #62655). This scheme can likely be extended to support all bytes being zero without requiring time or space overhead.

#62655 was closed, are there still plans to implement this?

@oli-obk
Contributor

oli-obk commented Jan 21, 2021

are there still plans to implement this?

Not right now... there are other more important construction sites in const eval.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 17, 2024
Optimize large array creation in const-eval

This changes the repeated memcpys to a single memset for the case where we're propagating a single byte into a region of memory. It also tightens the element-by-element copy loop; I'm pretty sure the old code was doing a multiply on each loop iteration.

For an 8 GB array (`static SLICE: [u8; SIZE] = [0u8; 1 << 33];`) this takes us from ~23 seconds to ~6 seconds locally, split roughly 50/50 between (a) the memset to zero and (b) the memcpy of the original place into a new place when popping the stack frame. The latter seems hard to avoid, but it is a single big memcpy (we're copying the value rather than initializing a region, so it's fairly fast), and the former is as good as it's going to get without special-casing constant-valued arrays.

Closes rust-lang#55795. (That issue's references to lint checking no longer appear accurate, but I think this addresses the time aspect of that case pretty fully. An 8 GB array taking only 6 seconds seems reasonable enough not to merit further tracking.)
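The core of the optimization is recognizing when the source element is a single repeated byte and replacing the per-element copies with one fill. A rough sketch of the idea in plain Rust, with hypothetical names and structure that do not mirror rustc's actual interpreter code:

```rust
// Illustrative sketch only; not rustc's code.
// If every byte of the element is the same value, the repeated copy collapses
// into a single fill (a memset); otherwise fall back to a tight chunked copy.
fn copy_repeatedly(src: &[u8], dest: &mut [u8], repeat: usize) {
    assert!(!src.is_empty());
    assert_eq!(dest.len(), src.len() * repeat);

    if let Some((&first, rest)) = src.split_first() {
        if rest.iter().all(|&b| b == first) {
            dest.fill(first); // one memset instead of `repeat` memcpys
            return;
        }
    }
    // General path: no per-iteration offset multiply, just chunked memcpys.
    for chunk in dest.chunks_exact_mut(src.len()) {
        chunk.copy_from_slice(src);
    }
}

fn main() {
    let src = [0u8; 4];
    let mut dest = vec![0xFFu8; 4 * 1024];
    copy_repeatedly(&src, &mut dest, 1024);
    assert!(dest.iter().all(|&b| b == 0));
}
```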
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 19, 2024
Optimize large array creation in const-eval
@bors bors closed this as completed in 16fadb3 Jan 19, 2024
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Optimize large array creation in const-eval
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Optimize large array creation in const-eval