Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gofailpoints should simulate proper process crash instead of panic #18142

Open
serathius opened this issue Jun 7, 2024 · 4 comments
Open

Gofailpoints should simulate proper process crash instead of panic #18142

serathius opened this issue Jun 7, 2024 · 4 comments

Comments

@serathius
Copy link
Member

What would you like to be added?

Using panic in failpoints results in process exiting, however the exit is handled through Golang runtime. It would be preferred to do a proper hard crash. This should be achievable by calling os.Exist(1).

cc @MadhavJivrajani @fuweid @henrybear327

Why is this needed?

More realistic failpoints.

@henrybear327
Copy link
Contributor

/assign @henrybear327

Hey @serathius, if I understand correctly, we would like to add a new failpoint type that leverages os.Exit(1) instead of panic(), is it? :)

@serathius
Copy link
Member Author

serathius commented Jun 7, 2024

Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.

ALso, to not lose functionality from panic, before we call os.Exit(1) we would dump stacktrace to stderr.

@henrybear327
Copy link
Contributor

Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.

ALso, to not lose functionality from panic, before we call os.Exit(1) we would dump stacktrace to stderr.

From my initial research, panic() will stop the ordinary flow of control, execute the functions defined with defer, and return to its caller. It's recoverable with recover() within the defer function if you want to handle it there.

On the other hand, os.Exit(1) will terminate the program, without executing the defer functions, with no chance to recover, it's a hard stop of the program.

References:

@henrybear327
Copy link
Contributor

henrybear327 commented Jun 7, 2024

For simulating a failing node, os.Exit(1) would be a more realistic approach as it directly terminates the program. A quick code search (if I am not mistaken) shows that we are not handling the panic() with recover() within the codebase at the top level, so the effect of terminating a program seems to be the same to me so far (maybe there might be other shutdown mechanisms that I am not aware of so my conclusion might be wrong). But panic() will contain more debugging traces for future analysis though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants