Set RUST_BACKTRACE=1 for production services: create a crate and use it #5360

Open
5 tasks
sunshowers opened this issue Mar 30, 2024 · 8 comments

Comments

@sunshowers
Contributor

sunshowers commented Mar 30, 2024

While debugging an instance of #2416, I saw at gc08's /pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711665000:

thread 'tokio-runtime-worker' panicked at nexus/db-queries/src/db/sec_store.rs:65:60:
called `Result::unwrap()` on an `Err` value: InternalError { internal_message: "database error (kind = Unknown): result is ambiguous: error=rpc error: code = Unavailable desc = error reading from server: read tcp [fd00:1122:3344:109::3]:56722->[fd00:1122:3344:105::3]:32221: read: connection reset by peer [exhausted]\n" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Mar 28 22:01:58 Stopping because all processes in service exited. ]
[ Mar 28 22:01:58 Executing stop method (:kill). ]

In this case the issue is pretty clear, but I'm wondering if we've considered setting RUST_BACKTRACE=1 in our production environment. Having backtraces is something that can definitely aid in debugging, but maybe it isn't a big deal because the core file can show what's going on. (But see #5359.)

According to https://stackoverflow.com/questions/29421727/how-much-overhead-does-rust-backtrace-1-have, there seems to be some performance cost, so we'd have to measure it carefully.

Wonder if @hawkw has thoughts here.


@davepacheco
Collaborator

I definitely think this is worthwhile. Even in environments where I've had easy access to core files and easy ways to get stuff like a stack trace out, it was still very valuable to have the runtime-printed stack trace available.

@jclulow
Collaborator

jclulow commented Apr 1, 2024

Is it possible to build the binary so it defaults to this behaviour, instead of requiring the environment be set?

@hawkw
Member

hawkw commented Apr 1, 2024

> Wonder if @hawkw has thoughts here.

I don't have a strong opinion --- IMO, we should have some way of collecting backtraces from production crashes; if we can get this data from the core, it's maybe less pressing, but I would err on the side of including them.

@jgallagher
Contributor

> Is it possible to build the binary so it defaults to this behaviour, instead of requiring the environment be set?

It doesn't look like it. The default panic hook calls get_backtrace_style, which reads from RUST_BACKTRACE unless set_backtrace_style has been called. set_backtrace_style is unstable (rust-lang/rust#93346), so we can't call that even if we wanted to.

We could set RUST_BACKTRACE from inside the program, presumably as one of the first things we do in main? 😬 Gross, but it works, so I thought I'd mention it.

@sunshowers
Contributor Author

I like the idea of setting RUST_BACKTRACE=1 within main. I'd do something like: if RUST_BACKTRACE isn't set, set it to 1.
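
A minimal sketch of that idea, assuming a hypothetical helper named `default_backtrace_env` (not an existing crate or API) that is called at the very top of `main`. `std::env::set_var` is safe in the 2021 edition and earlier but `unsafe` in the 2024 edition, because changing the environment can race with other threads reading it, so the call is wrapped in `unsafe` here to compile either way.

```rust
use std::env;

/// Hypothetical helper: default RUST_BACKTRACE to "1" when the operator
/// has not set it. Returns true if this call set the variable.
#[allow(unused_unsafe)]
fn default_backtrace_env() -> bool {
    if env::var_os("RUST_BACKTRACE").is_some() {
        // Respect an explicit setting, including RUST_BACKTRACE=0.
        return false;
    }
    // Only sound while the process is still single-threaded, so this must
    // run before the async runtime or any other thread is started.
    unsafe { env::set_var("RUST_BACKTRACE", "1") };
    true
}

fn main() {
    // First thing in main, before anything else reads the environment.
    let defaulted = default_backtrace_env();
    eprintln!("RUST_BACKTRACE defaulted by program: {defaulted}");
}
```

Because the helper leaves an existing value alone, an operator can still opt out with `RUST_BACKTRACE=0` or ask for `RUST_BACKTRACE=full`.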

@sunshowers
Contributor Author

(Note to people in the future wondering -- setting a panic hook isn't enough. For example, anyhow reads RUST_BACKTRACE.)
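
For comparison, here is a rough sketch of what the panic-hook approach could look like, assuming a hypothetical `install_backtrace_hook` helper. It uses `std::backtrace::Backtrace::force_capture`, which ignores the environment variables, so it covers panics only; libraries that consult RUST_BACKTRACE themselves are unaffected by it.

```rust
use std::backtrace::Backtrace;
use std::panic;

/// Hypothetical helper: wrap the existing panic hook so every panic also
/// prints a backtrace, regardless of RUST_BACKTRACE.
fn install_backtrace_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Print the usual "thread ... panicked at ..." message first.
        previous(info);
        // force_capture ignores RUST_BACKTRACE and RUST_LIB_BACKTRACE.
        let backtrace = Backtrace::force_capture();
        eprintln!("stack backtrace (forced by hook):\n{backtrace}");
    }));
}

fn main() {
    install_backtrace_hook();
    // Trigger and catch a panic to exercise the hook.
    let result = panic::catch_unwind(|| {
        panic!("example panic");
    });
    assert!(result.is_err());
}
```

If RUST_BACKTRACE is also set, the previous hook may print its own backtrace as well, so a real version would probably want to avoid the duplicate.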

@sunshowers
Contributor Author

Another consideration here is whether we want anyhow to also capture backtraces. That is controlled with both RUST_BACKTRACE and another env var, RUST_LIB_BACKTRACE. anyhow documents the logic.
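
As I understand the anyhow and `std::backtrace` docs, RUST_LIB_BACKTRACE takes precedence for error backtraces and RUST_BACKTRACE is the fallback, while panics look only at RUST_BACKTRACE. The sketch below models that rule with a hypothetical `backtrace_policy` function; it is an illustration of the documented behaviour, not code from anyhow or std.

```rust
/// Hypothetical summary of which backtraces are enabled.
#[derive(Debug, PartialEq)]
struct BacktracePolicy {
    /// Backtraces printed by the default panic hook.
    panics: bool,
    /// Backtraces captured for errors (e.g. by anyhow).
    errors: bool,
}

/// A variable that is set enables backtraces unless its value is "0".
fn enabled(value: &str) -> bool {
    value != "0"
}

/// Model of the documented rule: panics follow RUST_BACKTRACE; errors
/// follow RUST_LIB_BACKTRACE if it is set, else RUST_BACKTRACE.
fn backtrace_policy(
    rust_backtrace: Option<&str>,
    rust_lib_backtrace: Option<&str>,
) -> BacktracePolicy {
    let panics = rust_backtrace.map_or(false, enabled);
    let errors = match rust_lib_backtrace {
        Some(value) => enabled(value),
        None => panics,
    };
    BacktracePolicy { panics, errors }
}

fn main() {
    // RUST_BACKTRACE=1 alone: both panics and errors.
    println!("{:?}", backtrace_policy(Some("1"), None));
    // RUST_LIB_BACKTRACE=1 alone: errors only.
    println!("{:?}", backtrace_policy(None, Some("1")));
    // RUST_BACKTRACE=1 with RUST_LIB_BACKTRACE=0: panics only.
    println!("{:?}", backtrace_policy(Some("1"), Some("0")));
}
```

So defaulting only RUST_BACKTRACE=1 would also turn on error backtraces unless RUST_LIB_BACKTRACE=0 is set alongside it.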

@sunshowers changed the title from "Should we run our production services with RUST_BACKTRACE=1?" to "Set RUST_BACKTRACE=1 for production services: create a crate and use it" on Apr 2, 2024