Gofailpoints should simulate proper process crash instead of panic #18142

serathius · 2024-06-07T12:55:22Z

What would you like to be added?

Using panic in failpoints results in process exiting, however the exit is handled through Golang runtime. It would be preferred to do a proper hard crash. This should be achievable by calling os.Exist(1).

cc @MadhavJivrajani @fuweid @henrybear327

Why is this needed?

More realistic failpoints.

The text was updated successfully, but these errors were encountered:

henrybear327 · 2024-06-07T13:43:13Z

/assign @henrybear327

Hey @serathius, if I understand correctly, we would like to add a new failpoint type that leverages os.Exit(1) instead of panic(), is it? :)

serathius · 2024-06-07T13:56:03Z

Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.

ALso, to not lose functionality from panic, before we call os.Exit(1) we would dump stacktrace to stderr.

henrybear327 · 2024-06-07T15:41:28Z

Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.

ALso, to not lose functionality from panic, before we call os.Exit(1) we would dump stacktrace to stderr.

From my initial research, panic() will stop the ordinary flow of control, execute the functions defined with defer, and return to its caller. It's recoverable with recover() within the defer function if you want to handle it there.

On the other hand, os.Exit(1) will terminate the program, without executing the defer functions, with no chance to recover, it's a hard stop of the program.

References:

henrybear327 · 2024-06-07T15:47:20Z

For simulating a failing node, os.Exit(1) would be a more realistic approach as it directly terminates the program. A quick code search (if I am not mistaken) shows that we are not handling the panic() with recover() within the codebase at the top level, so the effect of terminating a program seems to be the same to me so far (maybe there might be other shutdown mechanisms that I am not aware of so my conclusion might be wrong). But panic() will contain more debugging traces for future analysis though.

serathius added help wanted type/feature area/robustness-testing labels Jun 7, 2024

k8s-ci-robot assigned henrybear327 Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gofailpoints should simulate proper process crash instead of panic #18142

Gofailpoints should simulate proper process crash instead of panic #18142

serathius commented Jun 7, 2024

henrybear327 commented Jun 7, 2024

serathius commented Jun 7, 2024 •

edited

Loading

henrybear327 commented Jun 7, 2024

henrybear327 commented Jun 7, 2024 •

edited

Loading

Gofailpoints should simulate proper process crash instead of panic #18142

Gofailpoints should simulate proper process crash instead of panic #18142

Comments

serathius commented Jun 7, 2024

What would you like to be added?

Why is this needed?

henrybear327 commented Jun 7, 2024

serathius commented Jun 7, 2024 • edited Loading

henrybear327 commented Jun 7, 2024

henrybear327 commented Jun 7, 2024 • edited Loading

serathius commented Jun 7, 2024 •

edited

Loading

henrybear327 commented Jun 7, 2024 •

edited

Loading