Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tetragon: Allow retry logic for clone events #362

Merged
merged 2 commits into from
Aug 26, 2022

Conversation

olsajiri
Copy link
Contributor

The AddCloneEvent fails to add process the the cache if
the parent does not exist - there's no record of it.

This can happen for example in TestFork test where fork events
can be delievered out of order and tetragon sees the child fork
event before the parent's one.

Adding retry logic for clone events, which will retry the fork
event processing and adding the hild process entry after the
timeout. During the timeout the parent is updated and following
retry logic will add the child's entry properly.

Signed-off-by: Jiri Olsa [email protected]

Adding Notify callback to Message interface, so we can prevent
final NotifyListeners call. It will be used for clone events
which are not actually delivered, but we add retry logic for them
in following change.

Signed-off-by: Jiri Olsa <[email protected]>
The AddCloneEvent fails to add process the the cache if
the parent does not exist - there's no record of it.

This can happen for example in TestFork test where fork events
can be delivered out of order and tetragon sees the child fork
event before the parent's one.

Adding retry logic for clone events, which will retry the fork
event processing and adding the child process entry after the
timeout. During the timeout the parent is updated and following
retry logic will add the child's entry properly.

Signed-off-by: Jiri Olsa <[email protected]>
@olsajiri olsajiri marked this pull request as ready for review August 26, 2022 08:44
@olsajiri olsajiri requested a review from a team as a code owner August 26, 2022 08:44
@olsajiri olsajiri requested a review from tpapagian August 26, 2022 08:44
Copy link
Member

@tpapagian tpapagian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, thanks!

it would be great if we can test that this PR solves all TestFork flakes. @kkourt had a way to repro the flake in TestFork (I tried to reproduce that locally but didn't manage to achieve that).

@olsajiri
Copy link
Contributor Author

This seems reasonable, thanks!

it would be great if we can test that this PR solves all TestFork flakes. @kkourt had a way to repro the flake in TestFork (I tried to reproduce that locally but didn't manage to achieve that).

TestFork was failing on 4.19 for me, with this I can no longer reproduce the fail, so I'm hopeful ;-)

@jrfastab jrfastab self-requested a review August 26, 2022 18:20
@jrfastab jrfastab merged commit a34d1ff into cilium:main Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants