tetragon: handling clone events and exec events more correctly #325

jrfastab · 2022-08-12T23:22:15Z

Currently, we leak clone events into the procCache. The procCache is an LRU so not free'ing events correctly means after some uptime we have a bunch of unused memory around and LRU will free things. Nothing breaks but its unnecessary memory usage.

To fix ensure clone and execve events work together correctly see patch1. This issue was introduced with the clone event relatively recently.

Currently, clone events have unique execIDs. So when we get the simple flow clone event => execId=1 execve event => execId=2 Both entries are added to the procCache. But then on the original clone() event with execId=1 does not have a matching exit event because the exit comes with execId=2 associated with the execve event. From simple application point of view this makes sense with the flow, clone() id=1 exec() id=2 exit() id=2 The result though is the procCache is slowly filling up with processes that have no references to any process or future events. To fix this lets use the same execID for the exec() event when its attached to the clone so that we get, clone() id=1 exec() id=1 exit() id=1 then the exit event triggers the procCache delete and the process is removed from the process cache after the grace period. Now we might be curious about a few other flows. The clone was original built to fix the case where a process doesn't do an exec. clone() id=1 kprobe() id=1 exit() id=1 In this case we are still in good shape because any kprobe events in between the clone and exit will pick up the same id and when we do process association we get the clone info. Now subsequent clones also work as expected, clone() id=1 clone() id=2 kprobe() id=2 exit() id=2 exit() id=1 multiple execs should also be followed, clone() id=1 exec() id=1 exec() id=1 exit() id=1 In this case we will get distinct exec events for each new exec but we track it as a single ID so that when we get the exit we know the entire chain is under the exit. Signed-off-by: John Fastabend <[email protected]>

kkourt · 2022-08-16T08:38:32Z

2022-08-12T23:29:51.6448826Z --- FAIL: TestGrpcExecOutOfOrder (1.90s)

Failures seem legit.

jrfastab · 2022-08-17T19:32:26Z

Dropped the 2nd patch as I ponder if ths is a good idea. The 2nd patch theory was in the case where we have a clone event followed by an exec event we currently check if the clone bit is set in the exec event and then inc the parent. If its not set we do a bit of a dance with an inc followed by a dec. But, receiving the clone event indicates we are in the clone path so we could do the inc there without any if/else logic and then exec could drop the inc/dec dance.

Although we hit the problem that the exec tests don't currently do clone events even though they set the clone bit. So we need a bit more work to make above correct from testing side. Basically we need to add clone events to the tests and then also test the other case where we get exec() calls without clones(). This happens in libs and such when they smash their own stack with a new binary. Often used by launcher codes to avoid creating more stacks.

So TLDR I think its still correct just didn't want to hold the first patch up to get all the tests in place.

jrfastab · 2022-08-17T19:34:40Z

Further there are some interesting cases to consider from an ordering side. Some that come to mind is what happens if we get OOO exec and clone events? In this way having the bit set is nice because the exec event is mostly self contained. And what happens if the parent event is missing when we get the clone event. At the moment the clone event just folds and gives up. But current code will recover reasonably well because the exec event will go through the cache.

jrfastab · 2022-08-17T19:35:01Z

@tpapagian some things to consider ^^^

jrfastab requested a review from a team as a code owner August 12, 2022 23:22

jrfastab requested a review from kaworu August 12, 2022 23:22

jrfastab force-pushed the pr/jrfastab/cloneRefCnt branch from 2284cef to 26683cc Compare August 12, 2022 23:24

jrfastab requested review from kkourt and tpapagian August 12, 2022 23:28

jrfastab force-pushed the pr/jrfastab/cloneRefCnt branch from 26683cc to 788923c Compare August 17, 2022 19:28

jrfastab merged commit ed2dc24 into main Aug 17, 2022

jrfastab deleted the pr/jrfastab/cloneRefCnt branch August 17, 2022 20:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tetragon: handling clone events and exec events more correctly #325

tetragon: handling clone events and exec events more correctly #325

jrfastab commented Aug 12, 2022

kkourt commented Aug 16, 2022

jrfastab commented Aug 17, 2022

jrfastab commented Aug 17, 2022

jrfastab commented Aug 17, 2022

tetragon: handling clone events and exec events more correctly #325

tetragon: handling clone events and exec events more correctly #325

Conversation

jrfastab commented Aug 12, 2022

kkourt commented Aug 16, 2022

jrfastab commented Aug 17, 2022

jrfastab commented Aug 17, 2022

jrfastab commented Aug 17, 2022