
Conversation

@openshift-cherrypick-robot

This is an automated cherry-pick of #904

/assign Miciah

rfredette and others added 3 commits July 20, 2023 19:56
…eteWithOldPodTermination

Also:
- Rename pods to podList
- When checking for old pod termination, only count the currently ready
  pods, instead of all pods
Follow-up to commit 20e4e38.

* test/e2e/operator_test.go
(waitForDeploymentCompleteWithOldPodTermination): Correct the function name
in the godoc.  Use "k8s.io/utils/pointer".Int32Deref, and respect the value
in spec.replicas even if it is set explicitly to 0.
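
For reference, here is a minimal sketch of how a helper in the spirit of waitForDeploymentCompleteWithOldPodTermination could apply the two changes described above: it derives the expected replica count with "k8s.io/utils/pointer".Int32Deref so an explicit spec.replicas of 0 is respected, and it counts only the currently ready pods (rather than all pods) when deciding whether the rollout is done. The function signature, the controller-runtime client, and the 2-second poll interval are assumptions for illustration, not the actual code in test/e2e/operator_test.go.

package operatorsketch // hypothetical package, not the real test package

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/utils/pointer"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForDeploymentCompleteWithOldPodTermination (sketch) waits until the
// deployment's rollout is complete and pods from the old ReplicaSet are gone.
func waitForDeploymentCompleteWithOldPodTermination(ctx context.Context, cl client.Client, deployment *appsv1.Deployment, timeout time.Duration) error {
	// Respect spec.replicas even when it is explicitly set to 0; only fall
	// back to the default of 1 when the field is nil.
	expected := pointer.Int32Deref(deployment.Spec.Replicas, 1)

	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		podList := &corev1.PodList{}
		if err := cl.List(ctx, podList,
			client.InNamespace(deployment.Namespace),
			client.MatchingLabels(deployment.Spec.Selector.MatchLabels),
		); err != nil {
			return false, err
		}
		// Count only the currently ready pods instead of all pods, so old
		// pods that are still terminating are not mistaken for part of the
		// new, complete rollout.
		var ready int32
		for i := range podList.Items {
			pod := &podList.Items[i]
			if pod.DeletionTimestamp != nil {
				continue
			}
			for _, cond := range pod.Status.Conditions {
				if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
					ready++
					break
				}
			}
		}
		return ready == expected, nil
	})
}

Dereferencing spec.replicas with Int32Deref, rather than hand-rolling a nil check, keeps an explicit zero-replica deployment from being silently treated as "unset".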
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-10846 has been cloned as Jira Issue OCPBUGS-16621. Will retitle bug to link to clone.
/retitle [release-4.12] OCPBUGS-16621: Fix TestClientTLS flakes

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot changed the title [release-4.12] OCPBUGS-10846: Fix TestClientTLS flakes [release-4.12] OCPBUGS-16621: Fix TestClientTLS flakes Jul 20, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Jul 20, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-16621, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jul 20, 2023
@openshift-ci openshift-ci bot requested review from candita and frobware July 20, 2023 19:58
@Miciah
Contributor

Miciah commented Jul 20, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 20, 2023
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-16621, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-13071 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Jira Issue OCPBUGS-13071 targets the "4.13.0" version, which is one of the valid target versions: 4.13.0, 4.13.z
  • bug has dependents

Requesting review from QA contact:
/cc @lihongan

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Contributor

Miciah commented Jul 20, 2023

/approve
/lgtm

The PR is low-risk as it only changes E2E tests.
/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Jul 20, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jul 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2023
@Miciah
Contributor

Miciah commented Jul 21, 2023

e2e-aws-operator failed because must-gather failed.

e2e-aws-ovn-upgrade failed because [sig-network] pods should successfully create sandboxes by other failed:

{  10 failures to create the sandbox

ns/openshift-etcd pod/revision-pruner-8-ip-10-0-145-213.us-west-2.compute.internal node/ip-10-0-145-213.us-west-2.compute.internal - 251.30 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_revision-pruner-8-ip-10-0-145-213.us-west-2.compute.internal_openshift-etcd_cd3bbe51-d3bf-4b5a-a727-df93ec82a5a3_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53356->52.206.202.27:443: read: connection reset by peer
ns/openshift-cluster-csi-drivers pod/aws-ebs-csi-driver-node-k7d4n node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_aws-ebs-csi-driver-node-k7d4n_openshift-cluster-csi-drivers_168ffb33-d770-498e-9e2b-5aa057009c2c_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53432->52.206.202.27:443: read: connection reset by peer
ns/openshift-dns pod/dns-default-qrbsg node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_dns-default-qrbsg_openshift-dns_dfd185a8-a53e-4098-97d5-22cb752c9b92_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53422->52.206.202.27:443: read: connection reset by peer
ns/e2e-k8s-sig-apps-daemonset-upgrade-9126 pod/ds1-967l9 node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_ds1-967l9_e2e-k8s-sig-apps-daemonset-upgrade-9126_e8421e43-8bca-4d8a-910e-7e9f21452e1b_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53436->52.206.202.27:443: read: connection reset by peer
ns/openshift-multus pod/multus-l8dsg node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_multus-l8dsg_openshift-multus_0513d2f8-235d-4d09-841a-3889108f5534_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53420->52.206.202.27:443: read: connection reset by peer
ns/openshift-network-diagnostics pod/network-check-target-blhxn node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_network-check-target-blhxn_openshift-network-diagnostics_25fab3aa-2aaa-4ce5-8989-b459c2eeaaf5_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53396->52.206.202.27:443: read: connection reset by peer
ns/openshift-multus pod/network-metrics-daemon-tkcbz node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_network-metrics-daemon-tkcbz_openshift-multus_f0d31f9a-06cb-4a69-b245-c1fc046fec46_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53448->52.206.202.27:443: read: connection reset by peer
ns/openshift-image-registry pod/node-ca-tgh6c node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_node-ca-tgh6c_openshift-image-registry_ddf9850f-3556-4b6a-93a2-ab9bf3602819_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53394->52.206.202.27:443: read: connection reset by peer
ns/openshift-monitoring pod/node-exporter-trzxd node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_node-exporter-trzxd_openshift-monitoring_c58e8fb6-2b5e-41e7-90c9-37706e682fa0_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53404->52.206.202.27:443: read: connection reset by peer
ns/openshift-ovn-kubernetes pod/ovnkube-master-6kfdv node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_ovnkube-master-6kfdv_openshift-ovn-kubernetes_70f70854-a81a-404a-8b03-f156a2fa82a2_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53378->52.206.202.27:443: read: connection reset by peer}

e2e-gcp-ovn-serial failed because [sig-arch] events should not repeat pathologically failed:

{  3 events happened too frequently

event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2 - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2"]
event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm"]
event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw"]}

This is possibly the same issue as OCPBUGS-10841, which was fixed in 4.14.

I'll rerun tests after #959 merges.

@Miciah
Contributor

Miciah commented Jul 21, 2023

/test all
now that #959 has merged.

@Miciah
Contributor

Miciah commented Jul 21, 2023

e2e-aws-ovn-upgrade failed because [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade failed:

{  fail [github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:142]: Jul 21 18:39:11.711: too many pods were waiting: ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-6pktm,ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-c5h2s,ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-dwkqb
Ginkgo exit error 1: exit with code 1}

I haven't seen that one before. Let's see whether it happens again.
/test e2e-aws-ovn-upgrade

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

@Miciah
Contributor

Miciah commented Jul 24, 2023

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

e2e-azure-ovn failed because the installer could not connect to the API:

time="2023-07-21T19:03:20Z" level=info msg="Waiting up to 40m0s (until 7:43PM) for the cluster at https://api.ci-op-p5zg2hbl-48740.ci.azure.devcluster.openshift.com:6443 to initialize..."
time="2023-07-21T19:43:50Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ci-op-p5zg2hbl-48740.ci.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 40.77.95.146:6443: i/o timeout"
time="2023-07-21T19:43:50Z" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2023-07-21T19:43:50Z" level=error msg="failed to initialize the cluster: timed out waiting for the condition"

/test e2e-azure-ovn

e2e-aws-ovn-upgrade failed because [sig-network-edge] Verify DNS availability during and after upgrade success failed:

{Jul 21 22:15:14.618: too many pods were waiting: ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-2b9fp,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-hdk9q,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-shd4b Failure Jul 21 22:15:14.618: too many pods were waiting: ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-2b9fp,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-hdk9q,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-shd4b

github.com/openshift/origin/test/e2e/upgrade/dns.(*UpgradeTest).validateDNSResults(0x878452c?, 0xc005d71760)
	github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:142 +0x2f4
github.com/openshift/origin/test/e2e/upgrade/dns.(*UpgradeTest).Test(0xc005d71760?, 0x93f4af8?, 0xcb36830?, 0x400000008?)
	github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:48 +0x4e
github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc000aff040, 0xc001a7abd0)
	github.com/openshift/origin/test/extended/util/disruption/disruption.go:197 +0x315
k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
	k8s.io/kubernetes@v1.25.0/test/e2e/chaosmonkey/chaosmonkey.go:94 +0x6a
created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
	k8s.io/kubernetes@v1.25.0/test/e2e/chaosmonkey/chaosmonkey.go:91 +0x8b}

This looks like the same failure that was being tracked for 4.13 with OCPBUGS-6902 and fixed for 4.13 with openshift/origin#27715. I've initiated a backport: OCPBUGS-16696 / openshift/origin#28083.
/test e2e-aws-ovn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Jul 24, 2023

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit   Details  Required  Rerun command
ci/prow/e2e-azure-ovn   2e8f3bb  link     false     /test e2e-azure-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@lihongan
Contributor

lihongan commented Aug 3, 2023

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 3, 2023
@openshift-merge-robot openshift-merge-robot merged commit e56a18d into openshift:release-4.12 Aug 3, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-16621: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-16621 has been moved to the MODIFIED state.

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.12.0-0.nightly-2023-08-03-070107
