Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rustup uses backtrace regardless of RUST_BACKTRACE #591

Closed
pravic opened this issue Jul 18, 2016 · 21 comments
Closed

Rustup uses backtrace regardless of RUST_BACKTRACE #591

pravic opened this issue Jul 18, 2016 · 21 comments

Comments

@pravic
Copy link

pravic commented Jul 18, 2016

I noticed that rustup update scans my HDD every time it been called.

I found that it always initializes a symbol helper (dbghelp!SymInitialize) at begin.

Setting RUST_BACKTRACE=0 or removing RUST_BACKTRACE does not help.

Any explanation/recomendations about it? I tested cargo also, it does not load .pdb every time like rustup does.

@Diggsey
Copy link
Contributor

Diggsey commented Jul 18, 2016

I suspect this is due to the use of the error_chain crate, which itself depends on the backtrace crate, and is used to capture stack traces within Error objects.

@pravic
Copy link
Author

pravic commented Jul 18, 2016

@Diggsey thanks regardless of I am not so lucky to hear about that.

Any way to disable it?

I've set _NT_SYMBOL_PATH to NULL before calling rustup and it helps partially: rustup scans only root directory of the current drive about PDB files.

@brson
Copy link
Contributor

brson commented Jul 19, 2016

@pravic Does this only happen when rustup ends with an error or does it also happen when rustup exits successfully? (I don't know why any backtrace code would be running on success)

I'd like to know more about what it means to 'scan your HDD'. Why is this a problem for you?

@pravic
Copy link
Author

pravic commented Jul 19, 2016

@brson There are no errors in regular rustup update, so thats why I complaining about it.

The problem is in unnesessary HDD access which is quiet noisy and slowly. And thats despite of incorrect behavior with RUST_BACKTRACE ignoring.

@brson
Copy link
Contributor

brson commented Jul 19, 2016

Here's the strace output of rustup --help for the open and stat syscalls:

open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=38191, ...}) = 0
open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104824, ...}) = 0
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=14608, ...}) = 0
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=138744, ...}) = 0
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=89696, ...}) = 0
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1088952, ...}) = 0
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=31712, ...}) = 0
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
open("/proc/self/maps", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/self/exe", O_RDONLY)        = 3
open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 3
open("/lib64/ld-linux-x86-64.so.2", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY) = 3
stat("/home/brian/.cargo/bin/rustup-init", 0x7ffe018adbf0) = -1 ENOENT (No such file or directory)
stat("/home/brian/.cargo/bin/multirust-setup", 0x7ffe018adbf0) = -1 ENOENT (No such file or directory)

That doesn't look unexpected, though I'm not sure what the fstats are. Mostly the dynamic linker. I bet what's happening is that there are errors triggered internally, that are handled and turned into no-errors. The generation of those errors triggers a backtrace that is never seen.

So the same strace on rustup update foo (which triggers an error):

open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=38191, ...}) = 0
open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104824, ...}) = 0
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=14608, ...}) = 0
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=138744, ...}) = 0
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=89696, ...}) = 0
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1088952, ...}) = 0
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=31712, ...}) = 0
open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
open("/proc/self/maps", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/self/exe", O_RDONLY)        = 3
open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 3
open("/lib64/ld-linux-x86-64.so.2", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY) = 3
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY) = 3
stat("/home/brian/.cargo/bin/rustup-init", 0x7ffcbcd299f0) = -1 ENOENT (No such file or directory)
stat("/home/brian/.cargo/bin/multirust-setup", 0x7ffcbcd299f0) = -1 ENOENT (No such file or directory)
stat("/home/brian/.terminfo", 0x7ffcbcd28c10) = -1 ENOENT (No such file or directory)
stat("/etc/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/etc/terminfo/s/screen", 0x7ffcbcd28c10) = -1 ENOENT (No such file or directory)
stat("/etc/terminfo/73/screen", 0x7ffcbcd28c10) = -1 ENOENT (No such file or directory)
stat("/lib/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/lib/terminfo/s/screen", {st_mode=S_IFREG|0644, st_size=1587, ...}) = 0
open("/lib/terminfo/s/screen", O_RDONLY|O_CLOEXEC) = 3
stat("/home/brian/.multirust", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/brian/.multirust/version", 0x7ffcbcd27e50) = -1 ENOENT (No such file or directory)
stat("/home/brian/.multirust", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/brian/.multirust/settings.toml", {st_mode=S_IFREG|0664, st_size=169, ...}) = 0
open("/home/brian/.multirust/settings.toml", O_RDONLY|O_CLOEXEC) = 3
stat("/home/brian/.multirust/toolchains/foo", 0x7ffcbcd28d70) = -1 ENOENT (No such file or directory)
stat("/home/brian/.terminfo", 0x7ffcbcd32870) = -1 ENOENT (No such file or directory)
stat("/etc/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/etc/terminfo/s/screen", 0x7ffcbcd32870) = -1 ENOENT (No such file or directory)
stat("/etc/terminfo/73/screen", 0x7ffcbcd32870) = -1 ENOENT (No such file or directory)
stat("/lib/terminfo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/lib/terminfo/s/screen", {st_mode=S_IFREG|0644, st_size=1587, ...}) = 0
open("/lib/terminfo/s/screen", O_RDONLY|O_CLOEXEC) = 3

Suddenly some code is very interested in terminfo, but still nothing that looks particularly out of the ordinary.

But I guess since you mentioned .pdb files you are on Windows. Maybe there are different performance characteristics for obtaining backtraces there.

@alexcrichton
Copy link
Member

@pravic when you mention that SymInitialize is on the stack, is that the symbol which seems like it's causing all of the I/O access? The backtrace crate does indeed call that and it's pretty liberal about its usage of it, there's no global cache or anything it just ends up calling it once per stack trace and once per symbol. If it's a slow function, then it should likely definitely cache that!

@brson I think that error-chain captures a stack trace when any error happens, right? That means that a backtrace would be captured for an error which is later ignored I believe? It may be worth optimizing some stuff in backtrace-rs for this use case if it's causing perf problems.

@brson
Copy link
Contributor

brson commented Jul 19, 2016

Yes, backtraces are generated for all errors regardless of whether they are displayed to the user.

alexcrichton added a commit to rust-lang/backtrace-rs that referenced this issue Jul 19, 2016
Looks like this can be an expensive operation [1] so we shouldn't be calling
it... once per symbol and stack trace!

[1]: rust-lang/rustup#591
@alexcrichton
Copy link
Member

Ok, I've pushed a commit to backtrace-rs which should greatly reduce the number of times SymInitialize is called. @brson do you want to try to get a windows build of rustup with that commit and see if @pravic can test it out?

@brson
Copy link
Contributor

brson commented Jul 20, 2016

I'm having some issues with my windows build at the moment, but when I've resolved them I'll post a testing build.

@brson
Copy link
Contributor

brson commented Jul 20, 2016

@pravic Here's a build that contains @alexcrichton's fixes. Maybe you can test it and see if it performs better for you: https://s3-us-west-1.amazonaws.com/static-rust-lang-org/rustup/temp-testing/rustup-init.exe

@pravic
Copy link
Author

pravic commented Jul 20, 2016

Just checked (rustup 0.3.0):
rustup --help (of course, without errors) produces 3977 filesystem events to query and read symbol files (pdb) which takes around 4 seconds. And this is without online symbol server access — it would take minutes in that case.

Well, I checked new build. Behavior the same, since all symbols were loaded on first SymInitialize call, because it enumerates all process modules and tries to load (or download) pdb file for each.

But yes, backtrace calls SymInitialize many times before patch:

$ rustup --help
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
dbghelp_init!
rustup 0.3.0
The Rust toolchain installer

The only solution I found, it is disable "dbghelp" feature of backtrace completely. I definitely do not need a flying error chains with linked backtraces, RUST_BACKTRACE is enough for me when it needed.

@brson
Copy link
Contributor

brson commented Jul 20, 2016

I wonder why running rustup --help takes 4 seconds on your machine but not mine. How are you determining the filesystem event count? I'd like to try here.

@pravic
Copy link
Author

pravic commented Jul 21, 2016

I'm amenable to making this behavior an option at least, but I also want to figure out why you are seeing so much disk thrashing on windows

Guess, thats how windows symbol handler works.

rustup --help loads ~50 dlls in process. Symbol handler enumerates all of them and tries to find an appropriate PDB file for each module. If module not found and online symbol server is specified in _NT_SYMBOL_PATH, dbghelp will try to lookup and download PDB file from the server.

@brson
Copy link
Contributor

brson commented Jul 26, 2016

Hm, @pravic if the simple rustup --help command triggers this behavior then I don't see how only doing the backtrace on RUST_BACKTRACE would help - rustup --help never generates a backtrace.

Do you see the same behavior with rustc --help? Perhaps the in-tree libbacktrace is configured differently than the out-of-tree version.

@pravic
Copy link
Author

pravic commented Jul 26, 2016

cargo --help and rustc --help does not gather symbols as well.

@brson
Copy link
Contributor

brson commented Jul 29, 2016

Thanks @pravic. @alexcrichton can you think of any reason rustc and rustup would differ in this behavior? Apparently rustup --help does this disk thrashing for backtraces but rustc --help does not.

@alexcrichton
Copy link
Member

alexcrichton commented Jul 29, 2016

The standard library doesn't hit backtrace logic until a panic happens, which rustc --help does not trigger, but if rustup hits backtrace logic on every error then it may be the case that rustup --help triggers a normal warning somewhere (e.g. a file not existing, a directory not existing, etc). Maybe?

@alexcrichton
Copy link
Member

For example when I run strace -f rustup --help I see:

stat("/home/alex/.cargo/bin/multirust", 0x7fff237e74c0) = -1 ENOENT (No such file or directory)
getcwd("/home/alex", 512)               = 11
stat("/home/alex/.cargo/bin/rustup-init", 0x7fff237e1780) = -1 ENOENT (No such file or directory)
stat("/home/alex/.cargo/bin/multirust-setup", 0x7fff237e1780) = -1 ENOENT (No such file or directory)

Presumably a backtrace is getting generated each time that happens?

@brson
Copy link
Contributor

brson commented Jul 29, 2016

Oh duh. There is indeed some code that runs even before argument parsing.

Argh, I wish I knew why this disk access is so horrible on @pravic's system but doesn't seem to be on others.

It really isn't that important to carry the backtraces around most of the time so maybe it is worth just converting error-chain to obey RUST_BACKTRACE unconditionally. rust-lang-deprecated/error-chain#7 (comment)

@brson
Copy link
Contributor

brson commented Jul 29, 2016

Next step is to make the modification to error-chain then upgrade it in rustup. The rustup error code tries to show backtraces on --verbose, so that will need to be reexamined. Probably the error-chain modification will turn the backtrace into an Option so rustup can use that to determine whether to show it.

brson added a commit that referenced this issue Sep 30, 2016
Update to error-chain 0.5.0 to allow optional backtrace. #591
nodakai pushed a commit to nodakai/rustup.rs that referenced this issue Apr 23, 2017
@Diggsey
Copy link
Contributor

Diggsey commented May 4, 2017

A fix for this was merged.

@Diggsey Diggsey closed this as completed May 4, 2017
alexcrichton added a commit to rust-lang/backtrace-rs that referenced this issue Jul 31, 2019
This commit updates the behavior of backtraces on Windows to execute
`SymInitializeW` globally once-per-process and ignore the return value
as to whether it succeeded or not. This undoes previous work in this
crate to specifically check the return value, and this is a behavior
update for the standard library when this goes into the standard
library.

The `SymInitializeW` function is required to be called on MSVC before
any backtraces can be captured or before any addresses can be
symbolized. This function is quite slow. It can only be called
once-per-process and there's a corresponding `SymCleanup` to undo the
work of `SymInitializeW`.

The behavior of what to do with `SymInitializeW` has changed a lot over
time in this crate. The very first implementation for Windows paired
with `SymCleanup`. Then reports of slowness at rust-lang/rustup#591
led to ac8c6d2 where `SymCleanup` was removed. This led to #165 where
because the standard library still checked `SymInitializeW`'s return
value and called `SymCleanup` generating a backtrace with this crate
would break `RUST_BACKTRACE=1`. Finally (and expectedly) the performance
report has come back as #229 for this crate.

I'm proposing that the final nail is put in this coffin at this point
where this crate will ignore the return value of `SymInitializeW`,
initializing symbols once per process globally. This update will go into
the standard library and get updated on the stable channel eventually,
meaning both libstd and this crate will be able to work with one another
and only initialize the process's symbols once globally. This does mean
that there will be a period of "breakage" where we will continue to make
`RUST_BACKTRACE=1` not useful if this crate is used on MSVC, but that's
sort of the extension of the status quo as of a month or so ago. This
will eventually be fixed once the stable channel of Rust is updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants