
OCPBUGS-2435: TestRouterCompressionOperation: Nil-pointer fix#843

Merged
openshift-merge-robot merged 1 commit into openshift:master from
Miciah:OCPBUGS-2435-TestRouterCompressionOperation-nil-pointer-fix
Oct 21, 2022

Conversation

@Miciah
Contributor

@Miciah Miciah commented Oct 15, 2022

Fix a nil-pointer dereference in the getHttpHeaders helper for TestRouterCompressionOperation.

Follow-up to #679.

  • test/e2e/router_compression_test.go (getHttpHeaders): Don't dereference the response value from client.Do if it returned a non-nil error value.

Fix a nil-pointer dereference in the getHttpHeaders helper for
TestRouterCompressionOperation.

Follow-up to commit 211b9c1.

This commit fixes OCPBUGS-2435.

https://issues.redhat.com/browse/OCPBUGS-2435

* test/e2e/router_compression_test.go (getHttpHeaders): Don't dereference
the response value from client.Do if it returned a non-nil error value.
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Oct 15, 2022
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-2435, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

The bug has been updated to refer to the pull request using the external bug tracker.


In response to this:

Fix a nil-pointer dereference in the getHttpHeaders helper for TestRouterCompressionOperation.

Follow-up to #679.

  • test/e2e/router_compression_test.go (getHttpHeaders): Don't dereference the response value from client.Do if it returned a non-nil error value.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Contributor Author

Miciah commented Oct 15, 2022

/test e2e-azure-operator
/test e2e-gcp-operator

@Miciah
Contributor Author

Miciah commented Oct 15, 2022

/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 15, 2022

e2e-gcp-operator and e2e-azure-operator both failed because TestUnmanagedDNSToManagedDNSInternalIngressController, which is known to be flaky, failed in both runs. e2e-aws-operator failed because of API issues.

/retest

@Miciah
Contributor Author

Miciah commented Oct 16, 2022

e2e-aws-operator failed because of kube-apiserver problems. This job failure shows an interesting new failure mode that results from #837: The operator could not fetch the ingresses.config.openshift.io/cluster object, and so the operator did not even try to create the default ingresscontroller.
/test e2e-aws-operator

e2e-azure-operator failed and e2e-gcp-operator failed because kube-apiserver and TestUnmanagedDNSToManagedDNSInternalIngressController flaked.
/test e2e-azure-operator
/test e2e-gcp-operator

@Miciah
Contributor Author

Miciah commented Oct 16, 2022

The operator could not fetch the ingresses.config.openshift.io/cluster object

Following up on that here: openshift/installer#6478 (comment)

@frobware
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2022
@openshift-ci
Contributor

openshift-ci bot commented Oct 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2022
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 6d63b3c and 2 for PR HEAD a658ba2 in total

@frobware
Contributor

/retest

@Miciah
Contributor Author

Miciah commented Oct 17, 2022

/hold
No point in retesting when we know AWS installs are broken. We can resume testing once openshift/installer#6490 is merged.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2022
@candita
Contributor

candita commented Oct 17, 2022

/retest-required

@Miciah
Contributor Author

Miciah commented Oct 17, 2022

/hold cancel
now that openshift/installer#6490 is merged.
/test all
/test e2e-azure-operator
/test e2e-gcp-operator

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2022
@Miciah
Contributor Author

Miciah commented Oct 17, 2022

e2e-gcp-operator failed on TestUnmanagedDNSToManagedDNSInternalIngressController, which is known to be flaky.
/test e2e-gcp-operator

@Miciah
Contributor Author

Miciah commented Oct 17, 2022

/test all
because #779 merged.
/test e2e-azure-operator
/test e2e-gcp-operator
These keep failing on TestUnmanagedDNSToManagedDNSInternalIngressController and, very often, on kube-apiserver NodeInstallerProgressing failures.

@Miciah
Contributor Author

Miciah commented Oct 18, 2022

/hold
because I don't want this PR to merge without e2e-azure-operator and e2e-gcp-operator passing.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2022
@Miciah
Contributor Author

Miciah commented Oct 18, 2022

Installer error:

 level=error msg=Error: error reading Route Table (rtb-0841b02723ba42394): couldn't find resource 

/test e2e-aws-ovn

@Miciah
Contributor Author

Miciah commented Oct 18, 2022

Same installer error for e2e-aws-ovn-upgrade.
/test e2e-aws-ovn-upgrade

@frobware
Contributor

/retest-required

@gcs278
Contributor

gcs278 commented Oct 19, 2022

FYI #845 is mostly failing on this E2E error now.

@gcs278
Contributor

gcs278 commented Oct 19, 2022

/retest
Looks like disruption-type issues, OVN issues, and DNSUnmanagedToManagedInternal-type issues fixed by #845, but nothing related to TestRouterCompressionOperation

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

/test all
now that #845 has merged.
/test e2e-azure-operator
/test e2e-gcp-operator

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

Tests passed in e2e-gcp-operator:

INFO[2022-10-20T18:44:03Z] Running step e2e-gcp-operator-test.
INFO[2022-10-20T19:08:13Z] Step e2e-gcp-operator-test succeeded after 24m10s.

Tests passed in e2e-azure-operator:

INFO[2022-10-20T19:07:00Z] Running step e2e-azure-operator-test.
INFO[2022-10-20T19:28:00Z] Step e2e-azure-operator-test succeeded after 21m0s.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 20, 2022
@Miciah
Contributor Author

Miciah commented Oct 20, 2022

e2e-azure-ovn failed because must-gather failed.
/test e2e-azure-ovn

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

e2e-aws-operator failed because must-gather failed and because kube-apiserver and kube-scheduler were reporting NodeInstallerProgressing.
/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

e2e-gcp-ovn-serial failed because disruption/ingress-to-oauth-server connection/new and disruption/ingress-to-console connection/new failed.
/test e2e-tcp-ovn-serial

@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2022

@Miciah: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-operator
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upgrade
  • /test e2e-gcp-ovn-serial
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test e2e-aws-ovn-single-node
  • /test e2e-azure-operator
  • /test e2e-azure-ovn
  • /test e2e-gcp-operator

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-single-node
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-ovn
  • pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-ovn-serial
  • pull-ci-openshift-cluster-ingress-operator-master-images
  • pull-ci-openshift-cluster-ingress-operator-master-unit
  • pull-ci-openshift-cluster-ingress-operator-master-verify

In response to this:

e2e-gcp-ovn-serial failed because disruption/ingress-to-oauth-server connection/new and disruption/ingress-to-console connection/new failed.
/test e2e-tcp-ovn-serial

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

/test e2e-gcp-ovn-serial

@gcs278
Contributor

gcs278 commented Oct 20, 2022

/test e2e-gcp-ovn-serial
Disruption failures again.

@Miciah
Contributor Author

Miciah commented Oct 20, 2022

e2e-aws-operator failed because TestUnmanagedDNSToManagedDNSInternalIngressController got EOFs. Also, kube-apiserver and kube-scheduler were reporting NodeInstallerProgressing.
/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 21, 2022

e2e-aws-operator failed because TestUpdateDefaultIngressController failed.

In addition, kube-apiserver reported NodeInstallerProgressing, image-registry reported NodeCADaemonProgressing, and ingress reported DeploymentRollingOut.

The failed deployment rollout for ingress may be related to the following MalscheduledPod event with timestamp 00:41:23: pod/router-default-5d98b8ddb6-zvm96 pod/router-default-68d8559557-lv24z should be one per node, but all were placed on node/ip-10-0-198-248.us-west-1.compute.internal; evicting pod/router-default-5d98b8ddb6-zvm96. The router-default-5d98b8ddb6-zvm96 pod is not in pods.json, but router-default-68d8559557-lv24z is.

I noticed in the kube-apiserver audit logs there is a single eviction request, by ingress-operator, for router-default-5d98b8ddb6-zvm96, as well as two deletes:

ip-10-0-142-11.us-west-1.compute.internal-audit-2022-10-21T00-52-08.826.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"2283a8ef-9132-4345-9980-2530e8d87ee2","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-ingress/pods/router-default-5d98b8ddb6-zvm96","verb":"delete","user":{"username":"system:node:ip-10-0-198-248.us-west-1.compute.internal","groups":["system:nodes","system:authenticated"]},"sourceIPs":["10.0.167.134"],"userAgent":"kubelet/v1.25.2+495fa99 (linux/amd64) kubernetes/e313ec2","objectRef":{"resource":"pods","namespace":"openshift-ingress","name":"router-default-5d98b8ddb6-zvm96","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2022-10-21T00:42:36.176533Z","stageTimestamp":"2022-10-21T00:42:36.196551Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}
ip-10-0-232-241.us-west-1.compute.internal-audit-2022-10-21T00-44-52.075.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"604ec47b-a1bf-4078-8516-4a631a584f7b","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-ingress/pods/router-default-5d98b8ddb6-zvm96/eviction","verb":"create","user":{"username":"system:serviceaccount:openshift-ingress-operator:ingress-operator","uid":"bc667cf5-757c-432f-8cac-a0d2dd6a7b63","groups":["system:serviceaccounts","system:serviceaccounts:openshift-ingress-operator","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["ingress-operator-545ccb9685-pl2sc"],"authentication.kubernetes.io/pod-uid":["590af6d6-06ee-4823-a35d-4e42fa0db3d1"]}},"sourceIPs":["10.0.142.11"],"userAgent":"ingress-operator/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"pods","namespace":"openshift-ingress","name":"router-default-5d98b8ddb6-zvm96","apiVersion":"v1","subresource":"eviction"},"responseStatus":{"metadata":{},"status":"Success","code":201},"requestReceivedTimestamp":"2022-10-21T00:41:23.091911Z","stageTimestamp":"2022-10-21T00:41:23.166917Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openshift-ingress-operator\" of ClusterRole \"openshift-ingress-operator\" to ServiceAccount \"ingress-operator/openshift-ingress-operator\""}}
ip-10-0-232-241.us-west-1.compute.internal-audit-2022-10-21T00-44-52.075.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"0e1c2602-832f-47dc-a544-0a60be7ba190","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-ingress/pods/router-default-5d98b8ddb6-zvm96","verb":"delete","user":{"username":"system:serviceaccount:kube-system:replicaset-controller","uid":"240367d2-d201-4e96-bcea-c3002b1ba9fd","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"sourceIPs":["10.0.167.134"],"userAgent":"kube-controller-manager/v1.25.0 (linux/amd64) kubernetes/4bd0702/system:serviceaccount:kube-system:replicaset-controller","objectRef":{"resource":"pods","namespace":"openshift-ingress","name":"router-default-5d98b8ddb6-zvm96","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2022-10-21T00:41:23.153632Z","stageTimestamp":"2022-10-21T00:41:23.180243Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:controller:replicaset-controller\" of ClusterRole \"system:controller:replicaset-controller\" to ServiceAccount \"replicaset-controller/kube-system\""}}

Subsequently, machine-config-controller tried and failed over a hundred times to evict the router-default-68d8559557-lv24z pod. For example:

ip-10-0-142-11.us-west-1.compute.internal-audit-2022-10-21T00-52-08.826.log:{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"8634515a-8fcb-40ea-8ac8-f12c3d218f42","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-ingress/pods/router-default-68d8559557-lv24z/eviction","verb":"create","user":{"username":"system:serviceaccount:openshift-machine-config-operator:machine-config-controller","uid":"d94da2fa-39a4-4aea-8d48-e18eee481602","groups":["system:serviceaccounts","system:serviceaccounts:openshift-machine-config-operator","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["machine-config-controller-7cf476fb49-ll984"],"authentication.kubernetes.io/pod-uid":["f25ace7b-9298-47c0-bf09-39a70aee017d"]}},"sourceIPs":["10.129.0.33"],"userAgent":"machine-config-controller/v0.0.0 (linux/amd64) kubernetes/$Format/node-update-controller","objectRef":{"resource":"pods","namespace":"openshift-ingress","name":"router-default-68d8559557-lv24z","apiVersion":"v1","subresource":"eviction"},"responseStatus":{"metadata":{},"status":"Failure","message":"Cannot evict pod as it would violate the pod's disruption budget.","reason":"TooManyRequests","details":{"causes":[{"reason":"DisruptionBudget","message":"The disruption budget router-default needs 1 healthy pods and has 1 currently"}]},"code":429},"requestReceivedTimestamp":"2022-10-21T00:47:02.273260Z","stageTimestamp":"2022-10-21T00:47:02.280643Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"machine-config-controller\" of ClusterRole \"machine-config-controller\" to ServiceAccount \"machine-config-controller/openshift-machine-config-operator\""}}

I wonder whether the ingress operator's eviction broke the subsequent rollout. This may warrant some investigation.

Anyway, that seems to be a one-off failure, so I'll assume it isn't caused by the changes in this PR.
/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 21, 2022

e2e-aws-operator failed because kube-apiserver reported NodeInstallerProgressing and because the TestUnmanagedDNSToManagedDNSInternalIngressController test failed with EOFs.
/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 21, 2022

=== RUN   TestAll/serial/TestRouterCompressionOperation
    router_compression_test.go:178: failed to observe deployment completion: failed to observe deployment rollout complete; deployment specifies 2 replicas and has generation 14; last observed 2 updated, 0 available, 3 total replicas, with observed generation 14

/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Oct 21, 2022

Seems like e2e-aws-operator is consistently failing, for various reasons unrelated to this PR. Since e2e-azure-operator and e2e-gcp-operator passed (#843 (comment)), only a few of the e2e-aws-operator job failures are from failures in our E2E tests, and none of the failures are attributable to changes in this PR, I'm going to override e2e-aws-operator so we can at least eliminate one cause for test failures for other PRs.
/override ci/prow/e2e-aws-operator

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2022

@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator


In response to this:

Seems like e2e-aws-operator is consistently failing, for various reasons unrelated to this PR. Since e2e-azure-operator and e2e-gcp-operator passed (#843 (comment)), only a few of the e2e-aws-operator job failures are from failures in our E2E tests, and none of the failures are attributable to changes in this PR, I'm going to override e2e-aws-operator so we can at least eliminate one cause for test failures for other PRs.
/override ci/prow/e2e-aws-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2022

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 9742afc into openshift:master Oct 21, 2022
@openshift-ci-robot
Contributor

@Miciah: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-2435 has been moved to the MODIFIED state.


In response to this:

Fix a nil-pointer dereference in the getHttpHeaders helper for TestRouterCompressionOperation.

Follow-up to #679.

  • test/e2e/router_compression_test.go (getHttpHeaders): Don't dereference the response value from client.Do if it returned a non-nil error value.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
