Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for pkg/grpc/exec #218

Merged
merged 9 commits into from
Aug 2, 2022
Merged

Add tests for pkg/grpc/exec #218

merged 9 commits into from
Aug 2, 2022

Conversation

tpapagian
Copy link
Member

@tpapagian tpapagian commented Jul 6, 2022

An exec and exit event can be delivered out of order to the agent. For this reason, we have a cache that keeps the un populated events for a small period of time to wait for the exec event.

There are also other caches that deal with this situations.

This commit adds tests for pkg/grpc/exec. For now it only contains 2 different cases:

  1. exec and exit events in order (TestGrpcExecInOrder)
  2. exec and exit events out of order (TestGrpcExecOutOfOrder)

Signed-off-by: Anastasios Papagiannis [email protected]

@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch from e837d4a to 8716128 Compare July 6, 2022 12:22
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Your last commit mentions:

This commit tries to fix the issues with the previous tests.

Are those issues described somewhere? That is, where there bugs uncovered with the test? If so, I think it's worth describing them in the commit message. E.g., did the test fail? How?

pkg/eventcache/eventcache.go Outdated Show resolved Hide resolved
pkg/grpc/exec/exec_test.go Outdated Show resolved Hide resolved
pkg/grpc/exec/exec_test.go Outdated Show resolved Hide resolved
pkg/grpc/exec/exec_test.go Outdated Show resolved Hide resolved
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch from 8716128 to e8bcadc Compare July 15, 2022 10:00
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch 3 times, most recently from 93848b2 to e5ceb65 Compare August 1, 2022 12:03
@tpapagian tpapagian changed the base branch from main to pr/willfindlay/unify-execcache August 1, 2022 12:04
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch 4 times, most recently from acf93f5 to 08c9b73 Compare August 1, 2022 12:33
Base automatically changed from pr/willfindlay/unify-execcache to main August 1, 2022 19:26
This will be used to create unit tests that run in a small period of time.

Signed-off-by: Anastasios Papagiannis <[email protected]>
An exec and exit event can be delivered out of order to the agent.
For this reason, we have a cache that keeps the un-populated events
for a small period of time to wait for the exec event.

There are also other caches that deal with this situations.

This commit adds tests for pkg/grpc/exec. For now it only contains
2 different cases:
1. 2 events in order (TestGrpcExecInOrder)
2. 2 events out of order (TestGrpcExecOutOfOrder)

Now the TestGrpcExecOutOfOrder fails while the TestGrpcExecInOrder passes.

This can be uses as a skeleton to implement tests for other caches as well.

ps. Trying to understand if the failure is due to the exec.go or the test
itself.

Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch from 08c9b73 to faf4cf2 Compare August 2, 2022 07:00
Resolve obvious cases for Needed() where we have a process pointer, but
are missing binary and other information. This happens through common
paths for example when exit events are received before their exec event.

To fix just extend the Needed() check to search for Binary name. I
introduced this while preparing initial release and doing some
refactoring.

Fixes: 931b70f "Tetragon: Cilium developers happily contribute Tetragon"
Signed-off-by: John Fastabend <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch 5 times, most recently from ad13ba6 to b4bcc35 Compare August 2, 2022 08:18
If event.internal was nil here we would then dereference a nil pointer
and promptly panic. So, lets delete this if/else condition. I believe
at some point we didn't always set the internal field, but now if we
track back into pkg/grpc code we see that if the internal field can
not be looked up we create an empty one and set the time/pid so we
at least have some reference to the process.

Signed-off-by: John Fastabend <[email protected]>
There will be no k8s info if the k8s is disabled and so lets not try
to query it. This can cause events to remain in cache for the full
timeout when k8s is disabled, but underlying system does have docker
IDs.

At the same time remove the blanket check to skip handleEvent logic
when k8s is disabled because we will still want to handle the case
where the process info is missing due to out of order event. For
example when the tracing event is processed before the execve event.

Signed-off-by: John Fastabend <[email protected]>
The eventcache event handler (not execve handlers) does not actually
wait for the internal process pointer to be populated. Because we
are just looking at a pointer and waiting for the execve event to
get handled we need some extra checks.

Signed-off-by: John Fastabend <[email protected]>
Its possible we have a reference to a process that is missing its K8s
podInfo. If this is the case simply wait for it to get flushed through
the cache. This happens when both the pkg/grpc/* event is in the cache
and the associated execve event. In this case we just have to wait for
the pod watcher to catch up with us.

Signed-off-by: John Fastabend <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/cache_tests branch from 15deb00 to 28e16eb Compare August 2, 2022 09:03
@tpapagian tpapagian marked this pull request as ready for review August 2, 2022 09:19
@tpapagian tpapagian requested a review from a team as a code owner August 2, 2022 09:19
@tpapagian tpapagian requested review from olsajiri and jrfastab August 2, 2022 09:19
tpapagian and others added 2 commits August 2, 2022 13:42
Eventcache tries in each iteration to find if the process info is
in the cache. In order to do so, we use pid and the time that the process
started. We used timestamppb which is actually normalized with
ktime.DecodeKtime() and is not what we need.

This resulted that we always wait for all iterations to to end
before we produce the event.

This commit used the raw ktime (uint64) and if we need a timestamppb.Timestamp
we simply generate that on-demand.

Signed-off-by: Anastasios Papagiannis <[email protected]>
Collecting endpoint does not do anything here so lets drop it for now.
Presumably, we lost the code that used this in one of our refactors so
we can add it back if we have a need for it later.

We also need to update the skeleton e2e test since we are now receiving more events than
initially expected which would cause the test to cross the event limit. To fix this, bump
the limit up to a conservative 100 events.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: William Findlay <[email protected]>
@jrfastab jrfastab merged commit ac1c3e3 into main Aug 2, 2022
@jrfastab jrfastab deleted the pr/apapag/cache_tests branch August 2, 2022 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants