Fix: Optimize single_process_type execution #2110
ff9e3a1 to 2df2ce9
Nice change, optimizations for this test are very welcome.
A few additional points:
- Maybe we could use "crictl" for the k8s introspection change, to make it easier to support "[Feature] Support multiple container engines" (#2103) in case it gets implemented?
- The operation of workload_resource_test and cnf_workload_resources is complicated, and it feels like the design of workload_resource_test could be the cause of multiple slowdowns in our tests. We should probably look into fixing this on a larger scale. That is not in the scope of this PR, though.
Yes, after some discussions we have come to the conclusion that
Edit: The spec tests are passing.
3d06250 to b161a6d
Refs: cnti-testcatalog#2084
- Reduces resource iteration by only checking the container processes.
- Also introduced changes to the cluster_tools and k8s_kernel_introspection libraries.
Signed-off-by: svteb <[email protected]>
lgtm
lgtm
Log.for(t.name).info { "multiple proc types detected verified: #{verified}" }
fail_msg = "resource: #{resource} has more than one process type (#{container_proctree_statuses.map { |x| x["cmdline"]? }.compact.uniq.join(", ")})"
unless fail_msgs.find { |x| x == fail_msg }
  puts fail_msg.colorize(:red)
That could be changed to stdout_error, but since CI is now fully successful, we can probably do that for the whole testsuite in another PR.
Description
The single_process_type / process_check spec tests took on average 1:30:00 to execute (in GitHub Actions); this was reduced to around 10-15 minutes. The slowdown was caused by incorrect use of libraries, which checked unnecessary processes multiple times.
These two changes need to be merged first:
cluster_tools change: #27
k8s_kernel_introspection change: #4
Because this pull request makes changes in two libraries, it is rather difficult to verify it through actions. Currently there are changes to the `shards.yaml` and `shards.lock` files which will be removed once the reviewers approve the change and the individual library pull requests are merged.
For reviewers:
There are two big changes. The first is the replacement of `workload_resource_test` by `cnf_workload_resources`. Both yield resource names, but `workload_resource_test` also yields all the container names per resource, thus we could get resources like this (it also did not return all the containers for some reason?):
optimized version:
With this change, we do not iterate over every resource multiple times.
The second big change is the addition of pid filtering by cgroups. The previous version was incorrectly checking processes on the node instead of in a container. This led to something like this (which makes no sense considering we want to verify the processes of a specific container):
optimized version:
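As a rough illustration of the cgroup-based pid filtering idea (this is not the code from this pull request or from the cluster_tools / k8s_kernel_introspection libraries), the Crystal sketch below keeps only the pids whose cgroup entry mentions a given container ID and then collects the distinct command lines among them. The helper names `in_container?`, `container_pids` and `process_types`, the cgroup path layout, and the sample data are all assumptions made up for the example; on a real node the cgroup text would come from `/proc/<pid>/cgroup`.

```crystal
# Hypothetical helper: true if the text of /proc/<pid>/cgroup mentions the container ID.
def in_container?(cgroup_text : String, container_id : String) : Bool
  cgroup_text.each_line.any? { |line| line.includes?(container_id) }
end

# Hypothetical helper: keep only the pids whose cgroup entry belongs to the container,
# instead of looking at every process on the node.
def container_pids(cgroups_by_pid : Hash(Int32, String), container_id : String) : Array(Int32)
  cgroups_by_pid.select { |_pid, text| in_container?(text, container_id) }.keys
end

# Hypothetical helper: distinct command lines among the given pids.
def process_types(cmdlines_by_pid : Hash(Int32, String), pids : Array(Int32)) : Array(String)
  pids.compact_map { |pid| cmdlines_by_pid[pid]? }.uniq
end

# Made-up sample data standing in for what would be read from /proc on a node.
cgroups = {
     1 => "0::/init.scope",
  4021 => "0::/kubepods/burstable/pod1234/abc123",
  4050 => "0::/kubepods/burstable/pod1234/abc123",
  5100 => "0::/kubepods/burstable/pod9999/def456",
}
cmdlines = {1 => "/sbin/init", 4021 => "nginx", 4050 => "nginx", 5100 => "coredns"}

pids = container_pids(cgroups, "abc123")
types = process_types(cmdlines, pids)

if types.size > 1
  puts "container abc123 has more than one process type (#{types.join(", ")})"
else
  puts "container abc123 has a single process type"
end
```

Filtering by cgroup first means the process-type comparison only ever sees the processes of one container, so unrelated node processes such as `/sbin/init` in the sample data cannot trigger a false failure.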
The cgroup filtering currently depends on `ctr` purely because it was easier to write than `runc` filtering. Considering the fact that the current testsuite depends on the containerd runtime (`ctr`), it should not be too problematic and can be changed in the future.
Finally, I noticed that the cluster_tools library provided a function that was a copy of the code in `single_process_check`. That same function is also shared by the zombie task, which has some unique quirks I will not get too deep into here. That shared function (`ClusterTools.all_containers_by_resource?`) had to be changed slightly so it would not break the zombie task. A better refactor can be considered in the future but it is outside the scope of this pull request.
Issues:
Refs: #2084
How has this been tested:
Types of changes:
Checklist:
Documentation
Code Review
Issue