Add e2e test for Regular Container image change #126794
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @dshebib. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Containers: []v1.Container{
    {
        Name:  regular1,
        Image: busyboxImage,
nit: prefer agnhost to other images:
https://github.com/kubernetes/kubernetes/blob/master/test/images/agnhost/README.md
@aojea @pohly @dims we could probably do more to highlight this?
Do what? Communicate this to devs (like a mail to the dev list), or something technical (like a linter)?
speaking as a new contributor, if most of the tests use agnhost then I'd continue to use it for most tests without thinking twice. I'm working on a PR for this test file to reuse the Context and will also update the images to agnhost.
The problem is that we need to inform new and existing contributors about such best practices. That's hard, in particular when existing code and tests don't follow them for historic reasons 😢
> The problem is that we need to inform new and existing contributors about such best practices. That's hard, in particular when existing code and tests don't follow them for historic reasons 😢
Right. I think just an email would be a good start. We have #122751, but that doesn't handle additional usage of other images that are still permitted for historic reasons.
/ok-to-test

/retest
},
Spec: v1.PodSpec{
    RestartPolicy: v1.RestartPolicyAlways,
    Containers: []v1.Container{
Could you update the PodSpec to have two or more regular containers and then check to see if only the one with the updated image restarts?
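For illustration, a minimal sketch of such a PodSpec, reusing names that appear elsewhere in this test file (regular1, regular2, busyboxImage) and assuming the usual core/v1 and metav1 imports; the commands are placeholders, not the PR's final code:

```go
podSpec := &v1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "regular-container-image-update"},
	Spec: v1.PodSpec{
		RestartPolicy: v1.RestartPolicyAlways,
		Containers: []v1.Container{
			{
				// The image of this container is updated during the test,
				// so only this container is expected to restart.
				Name:    regular1,
				Image:   busyboxImage,
				Command: []string{"sleep", "10000"},
			},
			{
				// This container is left untouched and should keep running
				// without a restart.
				Name:    regular2,
				Image:   busyboxImage,
				Command: []string{"sleep", "10000"},
			},
		},
	},
}
```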
updated test
| ginkgo.By("updating the image", func() { | ||
| client.Update(ctx, podSpec.Name, func(pod *v1.Pod) { | ||
| pod.Spec.Containers[0].Image = "fakeimage" |
We also want to add a test case for when the container image is updated to one that starts correctly.
@BenTheElder @pohly @aojea
What would be the preferred approach from the SIG-Testing perspective?
- Update to a different image tag with agnhost
- Update to busybox or another image
We have a curated list of images which are okay to use in tests: https://github.com/kubernetes/kubernetes/blob/master/test/images/.permitted-images
We don't have different tags for the same image. So for this test, I think switching to registry.k8s.io/pause is fine. There's also registry.k8s.io/e2e-test-images/busybox, but it might get (be?) superseded by agnhost and I wouldn't add more usages of it.
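As a sketch of that suggestion (assuming the test file imports the helpers as imageutils "k8s.io/kubernetes/test/utils/image"), the images could be resolved through the curated list rather than hard-coded:

```go
// Both constants come from the permitted-images list referenced above.
originalImage := imageutils.GetE2EImage(imageutils.Agnhost) // registry.k8s.io/e2e-test-images/agnhost
updatedImage := imageutils.GetE2EImage(imageutils.Pause)    // registry.k8s.io/pause

// Start the container with agnhost, then switch to pause to exercise the
// "image updated to a valid image" path without adding new busybox usages.
podSpec.Spec.Containers[0].Image = originalImage
// ...later, inside the update step of the test:
// pod.Spec.Containers[0].Image = updatedImage
```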
Instead of fakeimage, use invalid.registry.k8s.io/invalid/alpine?
Pause is probably suitable for this; the main difference in recent versions is Windows support.
However, we want to keep it minimal. So maybe the one pause image and the one agnhost image, instead of multiple tags for those images, which would also mean we get the latest Windows support for both.
Sorry, I lost this thread in all the other GitHub notifications.
preparePod(originalPodSpec)
| ginkgo.It("should not restart when the image is updated", func(ctx context.Context) { |
This looks inconsistent with the actual code:
kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
Lines 991 to 995 in 4aeaf1e
if _, _, changed := containerChanged(&container, containerStatus); changed {
    message = fmt.Sprintf("Container %s definition changed", container.Name)
    // Restart regardless of the restart policy because the container
    // spec changed.
    restart = true
| ginkgo.By("updating the image", func() { | ||
| client.Update(ctx, podSpec.Name, func(pod *v1.Pod) { | ||
| podSpec.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage) |
This should be:

-       podSpec.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)
+       pod.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)
Then, the pod will be restarted.
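In other words, a sketch of the suggested fix using the same helpers that already appear in this diff: the update closure receives the freshly fetched pod, and only mutations of that argument are sent back to the API server, so the captured podSpec must not be modified here.

```go
ginkgo.By("updating the image", func() {
	client.Update(ctx, podSpec.Name, func(pod *v1.Pod) {
		// Mutate the pod passed into the closure, not the captured podSpec,
		// so the image change is included in the object sent to the API server.
		pod.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)
	})
})
```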
podSpec, err = client.Get(ctx, podSpec.Name, metav1.GetOptions{})
framework.ExpectNoError(err)
results := parseOutput(ctx, f, podSpec)
framework.ExpectNoError(results.HasNotRestarted(regular1))
By the way, we cannot use HasNotRestarted() with an invalid image because this function depends on log messages emitted by valid containers.
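One hedged alternative for that case: assert on the kubelet-reported container status instead of parsed log output, since RestartCount does not depend on the new image being runnable. A sketch reusing the client, podSpec, and regular1 names from this test, plus the gomega matchers the e2e framework already uses:

```go
pod, err := client.Get(ctx, podSpec.Name, metav1.GetOptions{})
framework.ExpectNoError(err)
for _, status := range pod.Status.ContainerStatuses {
	if status.Name == regular1 {
		// RestartCount comes straight from the container status, not from
		// log messages, so it works even when the new image cannot be pulled.
		gomega.Expect(status.RestartCount).To(gomega.BeZero(), "container %s should not have restarted", regular1)
	}
}
```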
I'm afraid a previous comment (#126794 (comment)) was based on the behavior with an invalid image. If the new image is valid, the pod will be running after the container is restarted, because the newer container will be observed.
err := e2epod.WaitForPodCondition(ctx, f.ClientSet, podSpec.Namespace, podSpec.Name, "wait for container to fail due to image",
    time.Duration(bufferSeconds)*time.Second, func(pod *v1.Pod) (bool, error) {
        containerStatus := pod.Status.ContainerStatuses[0]
        if containerStatus.State.Waiting != nil && containerStatus.State.Waiting.Reason == "ErrImagePull" {
The container that hit ErrImagePull when restarting doesn't get into Waiting, because ShouldContainerBeRestarted() returns false based on RestartPolicy:
kubernetes/pkg/kubelet/kubelet_pods.go
Lines 2330 to 2332 in 7dd03c1

if !kubecontainer.ShouldContainerBeRestarted(&container, pod, podStatus) {
    continue
}
In addition, this container will not be restarted:

continue

If there is no other container in the pod, the behavior is a little different; the pod will be terminated as mentioned in #126794 (comment).
Anyway, a container whose image was updated to an invalid image will never restart. This may be a bug, although it looks like a corner case.
It seems like a very specific bug, because it only occurs when restartPolicy=OnFailure but not when restartPolicy=Never. So far I could not figure out the reason. For now I just put a FIXME, but the test is still failing.
framework.ExpectNoError(results.ExitsBefore(prefixedName(PostStartPrefix, regular1), regular2))
})
| ginkgo.When("A pod is running a regular container with restartPolicy=Never", func() { |
A test case with restartPolicy=OnFailure is rendered like:
[sig-node] [NodeConformance] Containers Lifecycle when A pod is running a regular container with restartPolicy=Never [It] should restart when the image is updated with a bad image and restartPolicy=OnFailure
Both restartPolicy=Never and restartPolicy=OnFailure appear in the name. It would be better to move restartPolicy into each It.
})
})
| ginkgo.When("A pod is running a regular container with restartPolicy=Always", func() { |
The initialization here looks almost the same as the one above. Could we put all test cases under a single When?
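For illustration, a hedged sketch of the structure both of these comments point at: a single When holding the shared initialization, with the restart policy set inside each It so it appears only once in the rendered test name. The It titles and helper names follow the ones used in this PR.

```go
ginkgo.When("A pod is running a regular container", func() {
	// shared podSpec / preparePod initialization goes here

	ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=OnFailure", func(ctx context.Context) {
		podSpec.Spec.RestartPolicy = v1.RestartPolicyOnFailure
		regularContainerInvalidImgUpdateTest(ctx)
	})

	ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=Never", func(ctx context.Context) {
		podSpec.Spec.RestartPolicy = v1.RestartPolicyNever
		regularContainerInvalidImgUpdateTest(ctx)
	})
})
```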
Force-pushed from e3c81ac to e91689e.
@dshebib do you think you will continue with this PR?
Force-pushed from e91689e to e9e46f2.
regularContainerInvalidImgUpdateTest(ctx)
})
| ginkgo.It("should successfully restart when the image is updated and restartPolicy=OnFailure", func(ctx context.Context) { |
| ginkgo.It("should successfully restart when the image is updated and restartPolicy=OnFailure", func(ctx context.Context) { | |
| ginkgo.It("should successfully restart when the image is updated and restartPolicy=Never", func(ctx context.Context) { |
regularContainerImgUpdateTest(ctx)
})
| ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=OnFailure", func(ctx context.Context) { |
| ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=OnFailure", func(ctx context.Context) { | |
| ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=Never", func(ctx context.Context) { |
/retest

/assign @haircommander
})
}
regularContainerInvalidImgUpdateTest := func(ctx context.Context) {
It's not immediately clear to me what the difference between these two functions is, but could you possibly compress them? I think it'll be easier to reason about.
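One possible way to compress them, sketched with a hypothetical signature: the image to switch to and the post-update expectation become parameters, while the surrounding steps mirror the existing helpers in this file.

```go
imageUpdateTest := func(ctx context.Context, newImage string, expectRunningAfterUpdate bool) {
	preparePod(podSpec)

	client := e2epod.NewPodClient(f)
	podSpec = client.Create(ctx, podSpec)

	ginkgo.By("waiting for the pod to be running", func() {
		framework.ExpectNoError(e2epod.WaitForPodRunningInNamespace(ctx, f.ClientSet, podSpec))
	})

	ginkgo.By("updating the image", func() {
		client.Update(ctx, podSpec.Name, func(pod *v1.Pod) {
			pod.Spec.Containers[0].Image = newImage
		})
	})

	if expectRunningAfterUpdate {
		// Valid image: the container should be restarted and come back up.
		framework.ExpectNoError(e2epod.WaitForPodRunningInNamespace(ctx, f.ClientSet, podSpec))
	}
	// The invalid-image expectations from regularContainerInvalidImgUpdateTest
	// (waiting for the image pull failure) would branch off here instead.
}
```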
This is a great update! I have one nit to make it a bit easier to read, but otherwise LGTM.
Force-pushed from 3553293 to 21649fd.
/retest
Force-pushed from 21649fd to 1ee7d94.
/retest
SergeyKanzhelev left a comment:
/lgtm
/approve
LGTM label has been added.

Git tree hash: 36a1a3ad7c9e686076058c979454bb2990156f7b
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dshebib, SergeyKanzhelev

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
See #126525 for context.
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Demonstrates current behaviour before this use case is implemented (if it is) #122926