
Conversation

@openshift-cherrypick-robot

This is an automated cherry-pick of #904

/assign Miciah

rfredette and others added 3 commits July 20, 2023 19:56
…eteWithOldPodTermination

Also:
- Rename pods to podList
- When checking for old pod termination, only count the currently ready
  pods, instead of all pods
Follow-up to commit 20e4e38.

* test/e2e/operator_test.go
(waitForDeploymentCompleteWithOldPodTermination): Correct the function name
in the godoc.  Use "k8s.io/utils/pointer".Int32Deref, and respect the value
in spec.replicas even if it is set explicitly to 0.
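
For reference, here is a minimal sketch of how a helper in the spirit of waitForDeploymentCompleteWithOldPodTermination could apply the two changes described above: it derives the expected replica count with "k8s.io/utils/pointer".Int32Deref so an explicit spec.replicas of 0 is respected, and it counts only the currently ready pods (rather than all pods) when deciding whether the rollout is done. The function signature, the controller-runtime client, and the 2-second poll interval are assumptions for illustration, not the actual code in test/e2e/operator_test.go.

package operatorsketch // hypothetical package, not the real test package

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/utils/pointer"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForDeploymentCompleteWithOldPodTermination (sketch) waits until the
// deployment's rollout is complete and pods from the old ReplicaSet are gone.
func waitForDeploymentCompleteWithOldPodTermination(ctx context.Context, cl client.Client, deployment *appsv1.Deployment, timeout time.Duration) error {
	// Respect spec.replicas even when it is explicitly set to 0; only fall
	// back to the default of 1 when the field is nil.
	expected := pointer.Int32Deref(deployment.Spec.Replicas, 1)

	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		podList := &corev1.PodList{}
		if err := cl.List(ctx, podList,
			client.InNamespace(deployment.Namespace),
			client.MatchingLabels(deployment.Spec.Selector.MatchLabels),
		); err != nil {
			return false, err
		}
		// Count only the currently ready pods instead of all pods, so old
		// pods that are still terminating are not mistaken for part of the
		// new, complete rollout.
		var ready int32
		for i := range podList.Items {
			pod := &podList.Items[i]
			if pod.DeletionTimestamp != nil {
				continue
			}
			for _, cond := range pod.Status.Conditions {
				if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
					ready++
					break
				}
			}
		}
		return ready == expected, nil
	})
}

Dereferencing spec.replicas with Int32Deref, rather than hand-rolling a nil check, keeps an explicit zero-replica deployment from being silently treated as "unset".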
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-10846 has been cloned as Jira Issue OCPBUGS-16621. Will retitle bug to link to clone.
/retitle [release-4.12] OCPBUGS-16621: Fix TestClientTLS flakes

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot changed the title [release-4.12] OCPBUGS-10846: Fix TestClientTLS flakes [release-4.12] OCPBUGS-16621: Fix TestClientTLS flakes Jul 20, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Jul 20, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-16621, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jul 20, 2023
@openshift-ci openshift-ci bot requested review from candita and frobware July 20, 2023 19:58
@Miciah
Contributor

Miciah commented Jul 20, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 20, 2023
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-16621, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.z) matches configured target version for branch (4.12.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-13071 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Jira Issue OCPBUGS-13071 targets the "4.13.0" version, which is one of the valid target versions: 4.13.0, 4.13.z
  • bug has dependents

Requesting review from QA contact:
/cc @lihongan

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Contributor

Miciah commented Jul 20, 2023

/approve
/lgtm

The PR is low-risk as it only changes E2E tests.
/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Jul 20, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jul 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2023
@Miciah
Contributor

Miciah commented Jul 21, 2023

e2e-aws-operator failed because must-gather failed.

e2e-aws-ovn-upgrade failed because [sig-network] pods should successfully create sandboxes by other failed:

{  10 failures to create the sandbox

ns/openshift-etcd pod/revision-pruner-8-ip-10-0-145-213.us-west-2.compute.internal node/ip-10-0-145-213.us-west-2.compute.internal - 251.30 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_revision-pruner-8-ip-10-0-145-213.us-west-2.compute.internal_openshift-etcd_cd3bbe51-d3bf-4b5a-a727-df93ec82a5a3_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53356->52.206.202.27:443: read: connection reset by peer
ns/openshift-cluster-csi-drivers pod/aws-ebs-csi-driver-node-k7d4n node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_aws-ebs-csi-driver-node-k7d4n_openshift-cluster-csi-drivers_168ffb33-d770-498e-9e2b-5aa057009c2c_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53432->52.206.202.27:443: read: connection reset by peer
ns/openshift-dns pod/dns-default-qrbsg node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_dns-default-qrbsg_openshift-dns_dfd185a8-a53e-4098-97d5-22cb752c9b92_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53422->52.206.202.27:443: read: connection reset by peer
ns/e2e-k8s-sig-apps-daemonset-upgrade-9126 pod/ds1-967l9 node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_ds1-967l9_e2e-k8s-sig-apps-daemonset-upgrade-9126_e8421e43-8bca-4d8a-910e-7e9f21452e1b_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53436->52.206.202.27:443: read: connection reset by peer
ns/openshift-multus pod/multus-l8dsg node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_multus-l8dsg_openshift-multus_0513d2f8-235d-4d09-841a-3889108f5534_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53420->52.206.202.27:443: read: connection reset by peer
ns/openshift-network-diagnostics pod/network-check-target-blhxn node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_network-check-target-blhxn_openshift-network-diagnostics_25fab3aa-2aaa-4ce5-8989-b459c2eeaaf5_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53396->52.206.202.27:443: read: connection reset by peer
ns/openshift-multus pod/network-metrics-daemon-tkcbz node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_network-metrics-daemon-tkcbz_openshift-multus_f0d31f9a-06cb-4a69-b245-c1fc046fec46_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53448->52.206.202.27:443: read: connection reset by peer
ns/openshift-image-registry pod/node-ca-tgh6c node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_node-ca-tgh6c_openshift-image-registry_ddf9850f-3556-4b6a-93a2-ab9bf3602819_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53394->52.206.202.27:443: read: connection reset by peer
ns/openshift-monitoring pod/node-exporter-trzxd node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_node-exporter-trzxd_openshift-monitoring_c58e8fb6-2b5e-41e7-90c9-37706e682fa0_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53404->52.206.202.27:443: read: connection reset by peer
ns/openshift-ovn-kubernetes pod/ovnkube-master-6kfdv node/ip-10-0-145-213.us-west-2.compute.internal - never deleted - network rollout - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = creating pod sandbox with name "k8s_ovnkube-master-6kfdv_openshift-ovn-kubernetes_70f70854-a81a-404a-8b03-f156a2fa82a2_0": initializing source docker://registry.build01.ci.openshift.org/ci-op-ljljq3q9/stable@sha256:db31c8023112831badd11b77d889ce504ad9d5de3c0855d6c877d887b746967c: Get "https://registry.build01.ci.openshift.org/openshift/token?scope=repository%3Aci-op-ljljq3q9%2Fstable%3Apull": read tcp 10.0.145.213:53378->52.206.202.27:443: read: connection reset by peer}

e2e-gcp-ovn-serial failed because [sig-arch] events should not repeat pathologically failed:

{  3 events happened too frequently

event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2 - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-f-x4vz2"]
event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-b-hzbqm"]
event happened 21 times, something is wrong: node/ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw - reason/ErrorReconcilingNode roles/worker [k8s.ovn.org/node-chassis-id annotation not found for node ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw, macAddress annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw" , k8s.ovn.org/l3-gateway-config annotation not found for node "ci-op-ljljq3q9-7b4ae-ckdxj-worker-c-d2vvw"]}

This is possibly the same issue as OCPBUGS-10841, which was fixed in 4.14.

I'll rerun tests after #959 merges.

@Miciah
Contributor

Miciah commented Jul 21, 2023

/test all
now that #959 has merged.

@Miciah
Contributor

Miciah commented Jul 21, 2023

e2e-aws-ovn-upgrade failed because [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade failed:

{  fail [github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:142]: Jul 21 18:39:11.711: too many pods were waiting: ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-6pktm,ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-c5h2s,ns/e2e-check-for-dns-availability-8150 pod/dns-test-12ebeaa6-609f-4101-a9c2-145790da6a18-dwkqb
Ginkgo exit error 1: exit with code 1}

I haven't seen that one before. Let's see whether it happens again.
/test e2e-aws-ovn-upgrade

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

@Miciah
Contributor

Miciah commented Jul 24, 2023

e2e-aws-operator failed because must-gather failed.
/test e2e-aws-operator

e2e-azure-ovn failed because the installer could not connect to the API:

time="2023-07-21T19:03:20Z" level=info msg="Waiting up to 40m0s (until 7:43PM) for the cluster at https://api.ci-op-p5zg2hbl-48740.ci.azure.devcluster.openshift.com:6443 to initialize..."
time="2023-07-21T19:43:50Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ci-op-p5zg2hbl-48740.ci.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 40.77.95.146:6443: i/o timeout"
time="2023-07-21T19:43:50Z" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2023-07-21T19:43:50Z" level=error msg="failed to initialize the cluster: timed out waiting for the condition"

/test e2e-azure-ovn

e2e-aws-ovn-upgrade failed because [sig-network-edge] Verify DNS availability during and after upgrade success failed:

{Jul 21 22:15:14.618: too many pods were waiting: ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-2b9fp,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-hdk9q,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-shd4b Failure Jul 21 22:15:14.618: too many pods were waiting: ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-2b9fp,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-hdk9q,ns/e2e-check-for-dns-availability-999 pod/dns-test-b82219ad-1648-4d16-982a-a2e40e269e5e-shd4b

github.com/openshift/origin/test/e2e/upgrade/dns.(*UpgradeTest).validateDNSResults(0x878452c?, 0xc005d71760)
	github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:142 +0x2f4
github.com/openshift/origin/test/e2e/upgrade/dns.(*UpgradeTest).Test(0xc005d71760?, 0x93f4af8?, 0xcb36830?, 0x400000008?)
	github.com/openshift/origin/test/e2e/upgrade/dns/dns.go:48 +0x4e
github.com/openshift/origin/test/extended/util/disruption.(*chaosMonkeyAdapter).Test(0xc000aff040, 0xc001a7abd0)
	github.com/openshift/origin/test/extended/util/disruption/disruption.go:197 +0x315
k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do.func1()
	k8s.io/kubernetes@v1.25.0/test/e2e/chaosmonkey/chaosmonkey.go:94 +0x6a
created by k8s.io/kubernetes/test/e2e/chaosmonkey.(*Chaosmonkey).Do
	k8s.io/kubernetes@v1.25.0/test/e2e/chaosmonkey/chaosmonkey.go:91 +0x8b}

This looks like the same failure that was being tracked for 4.13 with OCPBUGS-6902 and fixed for 4.13 with openshift/origin#27715. I've initiated a backport: OCPBUGS-16696 / openshift/origin#28083.
/test e2e-aws-ovn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Jul 24, 2023

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit   Details  Required  Rerun command
ci/prow/e2e-azure-ovn   2e8f3bb  link     false     /test e2e-azure-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@lihongan
Contributor

lihongan commented Aug 3, 2023

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 3, 2023
@openshift-merge-robot openshift-merge-robot merged commit e56a18d into openshift:release-4.12 Aug 3, 2023
@openshift-ci-robot
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-16621: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-16621 has been moved to the MODIFIED state.

In response to this:

This is an automated cherry-pick of #904

/assign Miciah

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.12.0-0.nightly-2023-08-03-070107
