-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test improvements, and avoiding retries in event tests #103
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
kkourt
changed the title
test improvements, and avoiding retries
test improvements, and avoiding retries in event tests
May 30, 2022
kkourt
force-pushed
the
pr/kkourt/test-improvements
branch
from
May 30, 2022 14:10
aa70392
to
e265817
Compare
willfindlay
approved these changes
May 30, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome. Just left a couple comments/nits.
looks good, there are tests like |
Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
The typical structure of our tests is: 1. start observer 2. do some things on user-space 3. check that we get the expected events from tetragon The above approach requires retries for step 3 if the events we are looking for are not there. These retry counts are large so that we can deal with worst case scenarios, and, subsequently, they induce a significant time cost in failing tests. Furthermore, for some tests we also need to check for the absence of events (e.g., when doing filtering), which also requires waiting on timeouts. This patch introduces the first steps to avoid having to wait on timeouts. We use pkg/sensors/test, which is a simpler sensor that hooks into the lseek call, and generates a MSG_TEST event when lseek is called with bogus arguments (fd=-1, whence=4444). The idea is simple: after step 2 in our tests, can we trigger events from test sensor on all CPUs. Once we have seen all the test events (on all CPUs), then we know that if we expect any events, they are not there, and, subsequently, there is no need for retries. This patch: * introduces the user-space component for above functionality: TestChecker * modifies the BPF hook to also record the current CPU * adds a trigger-test-events program that triggers the test events on all CPUs TestChecker works by wrapping another MultiEventChecker and looking for TEST messages. Once it has seen one such message in all CPUs, it will terminate the test by calling FinalCheck of the underlying MultiEventChecker. Signed-off-by: Kornilios Kourtis <[email protected]>
Change the value from 4444 to 4729 to avoid conflicts with tracepoint tests. Signed-off-by: Kornilios Kourtis <[email protected]>
This patch introduces a simple test to test the TestChecker. The code uses a dummy checker, which wraps to use the TestChecker. It can be used as example for other tests to remove the need for retries. Signed-off-by: Kornilios Kourtis <[email protected]>
When our json test check fails, it marks the export file containing the events so that it will not be deleted. In some cases (e.g., in TestTestChecker) , however, we expect the test to fail. Add a DontKeepExportFile for this case. Also, use DontKeepExportFile in TestTestChecker. Signed-off-by: Kornilios Kourtis <[email protected]>
doTestGenericTracepointPidFilter calls JsonTestCheck which will call KeepExportFile if the test fails. Remove unnecessary code. Signed-off-by: Kornilios Kourtis <[email protected]>
- use t.Logf instead of fmt.Printf - add comment to doTestGenericTracepointPidFilter Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
We seem to be leaving /sys/fs/bpf/testObserver directories around after we finish the tests. Make sure they are removed. Also, use defaults.DefaultMapRoot to define the above directory. Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
Add a LoadSensor function in testutils. Currently, this is only used by tests in pkg/sensors/test, but other tests might find it useful as well. Signed-off-by: Kornilios Kourtis <[email protected]>
This commit adds a helper sensor manager than can be used in testing. Signed-off-by: Kornilios Kourtis <[email protected]>
This patch adds the sensor/test TestChecker in the generic tracepoint tests. The immediate result is that failing the test can be failed quicker. For example, if we modify the code so that the ArgFilterLseek test fails: op := func() { t.Logf("Calling lseek...\n") - unix.Seek(fd, 0, whence) + // unix.Seek(fd, 0, whence) unix.Seek(fd, 0, whence+1) } Then, with this patch the test fails in 4.5s [1], while without in 46s [2]. Additionally, the idea here is to try and use this approach in the GH actions environment and see how well it works. If it does, we can use it to make our testing more reliable and predicatable. 1: observer_test_json.go:153: time="2022-05-30T14:28:47+02:00" level=info msg="seen events on all CPUs, finalizing test" ... --- FAIL: TestGenericTracepointArgFilterLseek (4.46s) 2: tracepoint_test.go:199: error: JsonTestCheck failed after 20 retries: JsonEOF: failed to match after 167 events: err:Got 0 events while expecting at least 1 ... --- FAIL: TestGenericTracepointArgFilterLseek (45.84s) Signed-off-by: Kornilios Kourtis <[email protected]>
kkourt
force-pushed
the
pr/kkourt/test-improvements
branch
from
May 31, 2022 07:24
e265817
to
0fd9845
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Please review commit-by-commit
This PR contains a number of improvements for testing. Most of them are small, but there is one which is notable: allowing testing without retries.
The idea is the following:
The typical structure of our tests is:
The above approach requires retries for step 3 if the events we are looking
for are not there. These retry counts are large so that we can deal with
worst case scenarios, and, subsequently, they induce a significant time cost
in failing tests. Furthermore, for some tests we also need to check for the
absence of events (e.g., when doing filtering), which also requires waiting
on timeouts.
This patch introduces the first steps to avoid having to wait on
timeouts.
We use pkg/sensors/test, which is a simpler sensor that hooks into the
lseek call, and generates a MSG_TEST event when lseek is called with
bogus arguments (fd=-1, whence=4444).
The idea is simple: after step 2 in our tests, can we trigger events
from test sensor on all CPUs. Once we have seen all the test events (on
all CPUs), then we know that if we expect any events, they are not
there, and, subsequently, there is no need for retries.
This patch:
TestChecker
all CPUs
TestChecker works by wrapping another MultiEventChecker and looking for
TEST messages. Once it has seen one such message in all CPUs, it will
terminate the test by calling FinalCheck of the underlying
MultiEventChecker.