Skip to content

Conversation

@dshebib
Copy link
Contributor

@dshebib dshebib commented Aug 19, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

See #126525 for context.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Demonstrates current behaviour before this use case is implemented (if it is) #122926

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 19, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 19, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @dshebib. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 19, 2024
Containers: []v1.Container{
{
Name: regular1,
Image: busyboxImage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer agnhost to other images:
https://github.com/kubernetes/kubernetes/blob/master/test/images/agnhost/README.md

@aojea @pohly @dims we could probably do more to highlight this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do what? Communicating this to devs (like a mail to dev) or something technical (like a linter?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaking as a new contributor, if most of the tests use agnhost then I'd continue to use it for most tests without thinking twice. I'm working on a PR for this test file to reuse the Context and will also update the images to agnhost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we need to inform new and existing contributors about such best practices. That's hard, in particular when existing code and tests don't follow them for historic reasons 😢

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we need to inform new and existing contributors about such best practices. That's hard, in particular when existing code and tests don't follow them for historic reasons 😢

Right. I think just an email would be a good start. We have #122751 but that doesn't handle additional usage of other currently permitted images from historic reasons.

@aojea
Copy link
Member

aojea commented Aug 26, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 26, 2024
@dshebib
Copy link
Contributor Author

dshebib commented Aug 27, 2024

/retest

},
Spec: v1.PodSpec{
RestartPolicy: v1.RestartPolicyAlways,
Containers: []v1.Container{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the PodSpec to have two or more regular containers and then check to see if only the one with the updated image restarts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated test


ginkgo.By("updating the image", func() {
client.Update(ctx, podSpec.Name, func(pod *v1.Pod) {
pod.Spec.Containers[0].Image = "fakeimage"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also want to add a test case for when the container image is updated to one that starts correctly.

@BenTheElder @pohly @aojea
What would be the preferred approach from the SIG-Testing perspective?

  • Update to a different image tag with agnhost
  • Update to busybox or another image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a curated list of images which are okay to use in tests: https://github.com/kubernetes/kubernetes/blob/master/test/images/.permitted-images

We don't have different tags for the same image. So for this test, I think switching to registry.k8s.io/pause is fine. There's also registry.k8s.io/e2e-test-images/busybox, but it might get (be?) superseded by agnhost and I wouldn't add more usages of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of fakeimage, use invalid.registry.k8s.io/invalid/alpine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pause is probably suitable for this, the main difference in recent versions is windows support.

However we want to keep it minimal. So maybe the one pause image and the one agnhost image, instead of multiple tags for those images. Which would also mean we are at the latest windows support for both.

Sorry I lost this thread in all the other github notifications.


preparePod(originalPodSpec)

ginkgo.It("should not restart when the image is updated", func(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks inconsistent with the actual code:

if _, _, changed := containerChanged(&container, containerStatus); changed {
message = fmt.Sprintf("Container %s definition changed", container.Name)
// Restart regardless of the restart policy because the container
// spec changed.
restart = true


ginkgo.By("updating the image", func() {
client.Update(ctx, podSpec.Name, func(pod *v1.Pod) {
podSpec.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

Suggested change
podSpec.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)
pod.Spec.Containers[0].Image = imageutils.GetE2EImage(imageutils.InvalidRegistryImage)

Then, the pod will be restarted.

@hshiina

This comment was marked as outdated.

podSpec, err = client.Get(ctx, podSpec.Name, metav1.GetOptions{})
framework.ExpectNoError(err)
results := parseOutput(ctx, f, podSpec)
framework.ExpectNoError(results.HasNotRestarted(regular1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, we cannot use HasNotRestarted() with an invalid image because this function depends on log messages emitted by valid containers.

@hshiina
Copy link
Contributor

hshiina commented Sep 30, 2024

I'm afraid a previous comment (#126794 (comment)) was based on a behavior with an invalid image. If a new image is valid, the pod will be running after the container is restarted because the newer container will be observed.

err := e2epod.WaitForPodCondition(ctx, f.ClientSet, podSpec.Namespace, podSpec.Name, "wait for container to fail due to image",
time.Duration(bufferSeconds)*time.Second, func(pod *v1.Pod) (bool, error) {
containerStatus := pod.Status.ContainerStatuses[0]
if containerStatus.State.Waiting != nil && containerStatus.State.Waiting.Reason == "ErrImagePull" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container that hit ErrImagePull at restarting doesn't get into Waiting because ShouldContainerBeRestarted() returns false based on RestartPolicy:

if !kubecontainer.ShouldContainerBeRestarted(&container, pod, podStatus) {
continue
}

In addition, this container will not restarted:

If there is no other container in the pod, the behavior is a little different. The pod will be terminated as mentioned in #126794 (comment).

Anyway, a container whose image was updated to an invalid image will never restart. This may be a bug. It looks a corner case, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a very specific bug, because it only occurs when restartPolicy=onFailure but not when restartPolicy=Never. So far I could not figure out the reason. For now I just put a fixme but the test is still failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framework.ExpectNoError(results.ExitsBefore(prefixedName(PostStartPrefix, regular1), regular2))
})

ginkgo.When("A pod is running a regular container with restartPolicy=Never", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test case with restartPolicy=OnFailure is rendered like:
[sig-node] [NodeConformance] Containers Lifecycle when A pod is running a regular container with restartPolicy=Never [It] should restart when the image is updated with a bad image and restartPolicy=OnFailure
There are restartPolicy=Never and restartPolicy=OnFailure. It would be better to move restartPolicy to each It.

})
})

ginkgo.When("A pod is running a regular container with restartPolicy=Always", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialization here looks almost the same as one above. Could we put all test cases under a single When?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2024
@dshebib dshebib force-pushed the addRegularContainerImgChangeE2E branch from e3c81ac to e91689e Compare October 14, 2024 00:15
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 14, 2024
@SergeyKanzhelev
Copy link
Member

@dshebib do you think you will continue with this PR?

@dshebib dshebib force-pushed the addRegularContainerImgChangeE2E branch from e91689e to e9e46f2 Compare October 22, 2024 05:07
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 22, 2024
regularContainerInvalidImgUpdateTest(ctx)
})

ginkgo.It("should successfully restart when the image is updated and restartPolicy=OnFailure", func(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ginkgo.It("should successfully restart when the image is updated and restartPolicy=OnFailure", func(ctx context.Context) {
ginkgo.It("should successfully restart when the image is updated and restartPolicy=Never", func(ctx context.Context) {

regularContainerImgUpdateTest(ctx)
})

ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=OnFailure", func(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=OnFailure", func(ctx context.Context) {
ginkgo.It("should restart when the image is updated with a bad image and restartPolicy=Never", func(ctx context.Context) {

@dshebib
Copy link
Contributor Author

dshebib commented Nov 22, 2024

/retest

@SergeyKanzhelev
Copy link
Member

/assign @haircommander

})
}

regularContainerInvalidImgUpdateTest := func(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not immediately clear to me the difference between these two funcitons, but can you possibly compress them? I think it'll be easier to reason about

@haircommander
Copy link
Contributor

this is a great update! I have one nit to make it a bit easier to read but otherwise LGTM

@dshebib dshebib force-pushed the addRegularContainerImgChangeE2E branch from 3553293 to 21649fd Compare January 20, 2025 03:26
@dshebib
Copy link
Contributor Author

dshebib commented Jan 28, 2025

/retest

@dshebib dshebib force-pushed the addRegularContainerImgChangeE2E branch from 21649fd to 1ee7d94 Compare March 20, 2025 00:38
@dshebib
Copy link
Contributor Author

dshebib commented Mar 20, 2025

/retest

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

DetailsGit tree hash: 36a1a3ad7c9e686076058c979454bb2990156f7b

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dshebib, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit a9d0f39 into kubernetes:master Mar 20, 2025
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 20, 2025
@github-project-automation github-project-automation bot moved this from PRs - Needs Reviewer to Done in SIG Node CI/Test Board Mar 20, 2025
@github-project-automation github-project-automation bot moved this from Needs Reviewer to Done in SIG Node: code and documentation PRs Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

9 participants