tetragon: Teach file writer to exit gracefully #185

jrfastab · 2022-06-22T23:54:58Z

We observed that rarely if we SIGTERM Tetragon (e.g. what pod restart and destroy do before SIGKILL it) that a corrupted json file might get left behind. The theory is this is do to writers being killed in the middle of an operation.

To fix we do two things. First propagate a single context through the processManager and Server objects instead of just using context.Backgroun(). This allows the cancel() call from our SIGTERM handler to be handled by ctx.Done() in the respective go routines. Then we add a waitGroup to the signal handler to ensure it waits until the writer has exited.

I only fix the go routine responsible for writes here. We should audit the other routines as well.

And second I started thinking about how we nest the various objects, at the moment we pass a lot of input types through the New() constructors. But this looked bigger than a fix so held off for now on this.

The context 'cancel()' done on closing does not propagate into the ProcessManager and Server objects, which currently use the basic context.Background(). The problem with this is on exit we hook SIGTERM, etc. and call cancel() but, because this doesn't get pushed back to the JSON writer object its possible to have partial writes completed. First to clean this lets ensure we use the ctx from main so cancel will be pushed into these objects. Signed-off-by: John Fastabend <[email protected]>

When we receive a signal -- sigterm for example -- we tear down the BPF state and event handlers in go routines with cancel(). But, we do not wait to ensure they have received the signal and done any cleanup needed. This adds a waitgroup to use for this. So objects increment it on New() create op and then call the Done() when they are complete. Ensuring we don't exit in the middle of sensitive operations, such as writes to gRPC streams and JSON files where we can generate corrupted events when we are unlucky. Signed-off-by: John Fastabend <[email protected]>

pkg/server/server.go

jrfastab added 2 commits June 22, 2022 12:16

jrfastab requested a review from a team as a code owner June 22, 2022 23:54

jrfastab requested a review from tixxdz June 22, 2022 23:54

jrfastab force-pushed the pr/jrfastab/ctxprop branch from 7b8a132 to 7b4ea4d Compare June 23, 2022 00:17

kkourt self-requested a review June 23, 2022 05:22

michi-covalent reviewed Jun 23, 2022

View reviewed changes

pkg/server/server.go Outdated Show resolved Hide resolved

kkourt reviewed Jun 23, 2022

View reviewed changes

pkg/server/server.go Show resolved Hide resolved

pkg/server/server.go Outdated Show resolved Hide resolved

jrfastab force-pushed the pr/jrfastab/ctxprop branch from 7b4ea4d to 4b7c8d1 Compare June 24, 2022 05:41

kkourt approved these changes Jun 24, 2022

View reviewed changes

kkourt requested a review from willfindlay June 24, 2022 05:59

willfindlay approved these changes Jun 24, 2022

View reviewed changes

michi-covalent approved these changes Jun 24, 2022

View reviewed changes

jrfastab merged commit d2de3fa into main Jun 28, 2022

jrfastab deleted the pr/jrfastab/ctxprop branch June 28, 2022 06:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tetragon: Teach file writer to exit gracefully #185

tetragon: Teach file writer to exit gracefully #185

jrfastab commented Jun 22, 2022

tetragon: Teach file writer to exit gracefully #185

tetragon: Teach file writer to exit gracefully #185

Conversation

jrfastab commented Jun 22, 2022