Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime/debug: add SetCrashHeader function #64590

Closed
EtiennePerot opened this issue Dec 6, 2023 · 16 comments
Closed

proposal: runtime/debug: add SetCrashHeader function #64590

EtiennePerot opened this issue Dec 6, 2023 · 16 comments
Labels
Milestone

Comments

@EtiennePerot
Copy link

EtiennePerot commented Dec 6, 2023

Proposal Details

Add a function like SetCrashHeader(header []byte) to runtime/debug. When called, this copies the first K bytes of header into a region of memory owned by the Go runtime. If the program later panics, the contents of this buffer are printed out before any of the usual panic information.

This buffer would have a tight maximum byte size smaller or equal to the size of a page of memory. This ensures the crashing procedure remains fast and that this feature doesn't introduce more unpredictability when a process is already crashing. This header is global, not per-goroutine. The SetCrashHeader function may be called with nil as argument in order to clear out the runtime-owned buffer.

The idea is to capture useful information about a program that's crashing if crashing does happen, and to print it in a convenient place where an out-of-program system can easily consume it. This information could include basic program information (start time, version number, build options, architecture, etc.) or structured debugging information that isn't always easily obtainable from means outside of the Go program itself.

To disambiguate between this header and the rest of the panic message, the panic output also prints an extra \n--------------- backtrace ---------------\n after the header. This is not printed when the header isn't set (or has been reset by calling SetCrashHeader(nil)).

This pairs well with proposal #42888 (runtime/debug: add SetCrashOutput(file *os.File)). With these two proposals, this allows an out-of-program crash logging system consuming the output of panic logs (which are separate from other logs when using SetCrashOutput(file *os.File)) to obtain the information in the header from the same file handle as the one it already has to receive panic data.

Example

package main
import "runtime/debug"

func init() {
	debug.SetCrashHeader([]byte("MyCrashyProgram version 0.1\nfoo bar"))
}

func main() {
	panic("oops I crashed")
}

... would output to stderr:

MyCrashyProgram version 0.1
foo bar
--------------- backtrace ---------------
panic: oops I crashed

goroutine 1 [running]:
main.main()
	/path/to/prog.go:8 +0x...
@gopherbot gopherbot added this to the Proposal milestone Dec 6, 2023
@apparentlymart
Copy link

I would love to replace some hacky stuff here with something like what's proposed:

https://github.com/hashicorp/terraform/blob/58a51bffd8565b1fe8b9e7a19227ae8eeb85c62d/internal/logging/panic.go#L36-L64
https://github.com/hashicorp/terraform/blob/58a51bffd8565b1fe8b9e7a19227ae8eeb85c62d/main.go#L68
https://github.com/hashicorp/terraform/blob/58a51bffd8565b1fe8b9e7a19227ae8eeb85c62d/internal/terraform/graph.go#L52

Currently our extra messaging for panics appears only if we've remembered to write defer logging.PanicHandler() near the start of each new goroutine, which is impossible to do if a goroutine is started by a library that isn't aware of our application.

I'm assuming that the intention of the proposal is for debug.SetCrashHeader to change the panic reporting behavior for all goroutines in the current program, rather than just one which called debug.SetCrashHeader, in which case we could avoid this special extra work for each goroutine.

Our current approach also allows us to change which exit code gets reported when the application panics, which would be nice to preserve since the runtime's default (2) accidentally collides with a different meaning of that exit code in this program, which we chose before knowing that the runtime was using it. However, I expect we'd be willing to live with the inability to change the exit code if it meant having this more robust way to add additional messaging to guide a user toward reporting the panic as a bug.

@adonovan
Copy link
Member

adonovan commented Dec 7, 2023

The output should properly frame the two parts (header, backtrace) so they can be unambiguously parsed by the consumer.

@EtiennePerot
Copy link
Author

EtiennePerot commented Dec 8, 2023

I'm assuming that the intention of the proposal is for debug.SetCrashHeader to change the panic reporting behavior for all goroutines in the current program, rather than just one which called debug.SetCrashHeader

Correct. Clarified in proposal.

The output should properly frame the two parts (header, backtrace) so they can be unambiguously parsed by the consumer.

OK, updated proposal, though I'm not sure about how to best do that. Maybe just a --------------- backtrace --------------- between the header and the panic? Feel free to suggest alternatives.

@apparentlymart
Copy link

I would personally prefer that this not introduce any more involuntary content or delimiters into the output. In my case, the additional content is for the benefit of a human reader, not for machine consumption.

If I wanted to delimit the leading messages from the backtrace, I could presumably include a delimiter at the start and end of the buffer passed to debug.SetCrashHeader. Perhaps there's justification for an additional debug.SetCrashFooter to allow also delimiting the end of the backtrace; I have no use for that myself, so I'm only suggesting it to see if it would help satisfy the framing requirement without imposing any mandatory additional content on all callers.

@adonovan
Copy link
Member

adonovan commented Dec 8, 2023

A comment such as "--- backtrace ---" does not constitute reliable framing because this string can potentially appear within the string form of the panic value (and, in theory, within the name of a file in the stack trace, though that's very unlikely).

This could potentially be a security vulnerability: for privacy reasons, the Go telemetry system reports the stack but not the panic value, as the stack consists only of strings from the executable, whereas the panic value may contain arbitrary user data. (We disregard the potential for encoding information in the sequence of stack frames as a side channel, but it is real.) An attacker that is able to influence the panic value could prepend the string --- backtrace ---, causing the crash reporting system that discards the portion before that string to leak the rest of it.

For me, the primary motivation to add the SetCrashOutput function is to enable automated crash reporting, for which it is desirable to record the panic value and the stack dump with as much structure as possible. Indeed, I wonder whether we should add an option to record stacks in some form (e.g. JSON) that makes them easier to parse.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Dec 14, 2023
@adonovan
Copy link
Member

It dawned on me this morning that, so long as all the information we get out of the stack trace in the crash report is corroborated with the executable's symbol table, there's no way for the panic value to inject arbitrary strings into the telemetry counter name.

@rsc
Copy link
Contributor

rsc commented Feb 7, 2024

An alternative to make the crash dumps unambiguous would be to make the panic value printer insert a tab after every newline, like t.Log does. Then a "goroutine stack" inside a panic value won't look like an ordinary goroutine stack, because it will be indented.

@seankhliao
Copy link
Member

I've wanted a fixed header/footer for go panic / crash(fault?) messages.
It would make writing log parsers less ambiguous since most give us the option for matching multiline output against start/end sequences.

@rsc
Copy link
Contributor

rsc commented Feb 7, 2024

What if something prints the header in another context?

@seankhliao
Copy link
Member

Usually this is in the context of server applications with (semi) structured logging as the only expected output, so it's easy to match against a fixed string from the start of a line.
But if something else prints it i would also expect it to be captured as a panic, e.g. net/http automatically recovers panics and prints a stacktrace, i also want that to be captured.

@rsc rsc changed the title proposal: runtime/debug: add SetCrashHeader(header []byte) proposal: runtime/debug: add SetCrashHeader function Feb 8, 2024
@rsc
Copy link
Contributor

rsc commented Feb 8, 2024

The prints from net/http are separate from anything done by debug.SetCrashHeader, which would only be about crashes. So I think we still can just make the final crash messages unambiguous and not introduce what amounts to MIME framing around Go crash messages.

@rsc
Copy link
Contributor

rsc commented Feb 9, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Feb 9, 2024
@aclements
Copy link
Member

Given SetCrashOutput(file *os.File), why can't programs that want to provide extra metadata to a crash handler write to the crash output os.File ahead of time to provide such context?

@rsc
Copy link
Contributor

rsc commented Mar 1, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals Mar 1, 2024
@rsc
Copy link
Contributor

rsc commented Mar 8, 2024

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals Mar 8, 2024
@rsc rsc closed this as completed Mar 8, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/581215 mentions this issue: runtime: properly frame panic values in tracebacks

gopherbot pushed a commit that referenced this issue May 8, 2024
This CL causes the printing of panic values to ensure that all
newlines in the output are immediately followed by a tab, so
that there is no way for a maliciously crafted panic value to
fool a program attempting to parse the traceback into thinking
that the panic value is in fact a goroutine stack.

See #64590 (comment)

+ release note

Updates #64590
Updates #63455

Change-Id: I5142acb777383c0c122779d984e73879567dc627
Reviewed-on: https://go-review.googlesource.com/c/go/+/581215
Auto-Submit: Alan Donovan <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Declined
Development

No branches or pull requests

7 participants