Enable 4.4.10 in fast channel(s) #295
|
Reviewing the initial CI updates, failures were:
There were also some 4.4.10 -> 4.5 RC CI failures, but nothing that looked like it was worth pulling candidate edges. And no alarming * -> 4.4.10 Telemetry/Insights either.
Digging into the hung 4.4.5 -> 4.4.10 failure:
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275681394676207616/artifacts/launch/pods/openshift-cluster-version_cluster-version-operator-7847b46597-wks4s_cluster-version-operator.log | grep 'Running sync.*in state\|Result of work'
I0624 08:00:07.749594 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 0
I0624 08:05:52.801425 1 task_graph.go:596] Result of work: [Cluster operator openshift-apiserver is reporting a failure: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable]
I0624 08:06:15.588329 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 1
I0624 08:12:00.639878 1 task_graph.go:596] Result of work: [Cluster operator kube-apiserver is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 10:
I0624 08:12:46.300005 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 2
I0624 08:18:31.351840 1 task_graph.go:596] Result of work: [Cluster operator kube-apiserver is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 10:
I0624 08:19:58.985733 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 3
I0624 08:25:44.037572 1 task_graph.go:596] Result of work: [Cluster operator kube-apiserver is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 10:
I0624 08:28:51.164473 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 4
I0624 08:34:36.216272 1 task_graph.go:596] Result of work: [Cluster operator kube-apiserver is reporting a failure: NodeInstallerDegraded: 1 nodes are failing on revision 10:
I0624 08:37:37.075758 1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:0d1ffca302ae55d32574b38438c148d33c2a8a05c8daf97eeb13e9ab948174f7 (force=true) on generation 2 in state Updating at attempt 5
Drilling into the ClusterOperator:
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275681394676207616/artifacts/launch/clusteroperators.json | jq -r '.items[] | select(.metadata.name == "kube-apiserver").status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")'
2020-06-24T08:02:17Z Degraded=True NodeInstaller_InstallerPodFailed: NodeInstallerDegraded: 1 nodes are failing on revision 10:
NodeInstallerDegraded: pods "installer-10-ip-10-0-145-36.us-west-1.compute.internal" not found
2020-06-24T07:51:55Z Progressing=True NodeInstaller: NodeInstallerProgressing: 1 nodes are at revision 9; 2 nodes are at revision 10
2020-06-24T07:10:33Z Available=True AsExpected: StaticPodsAvailable: 3 nodes are active; 1 nodes are at revision 9; 2 nodes are at revision 10
2020-06-24T07:08:15Z Upgradeable=True AsExpected: -
Indeed, that node-installer pod seems to be missing:
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275681394676207616/artifacts/launch/pods.json | jq -r '.items[] | select(.metadata | ((.name | contains("installer")) and .namespace == "openshift-kube-apiserver")) | .status.phase + " " + .metadata.name'
Succeeded installer-10-ip-10-0-131-183.us-west-1.compute.internal
Succeeded installer-10-ip-10-0-143-5.us-west-1.compute.internal
Node itself seems fine:
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275681394676207616/artifacts/launch/nodes.json | jq -r '.items[] | select(.metadata.name == "ip-10-0-145-36.us-west-1.compute.internal").status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")'
2020-06-24T08:04:39Z MemoryPressure=False KubeletHasSufficientMemory: kubelet has sufficient memory available
2020-06-24T08:04:39Z DiskPressure=False KubeletHasNoDiskPressure: kubelet has no disk pressure
2020-06-24T08:04:39Z PIDPressure=False KubeletHasSufficientPID: kubelet has sufficient PID available
2020-06-24T08:04:39Z Ready=True KubeletReady: kubelet is posting ready status
Checking events:
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275681394676207616/artifacts/launch/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-apiserver" and (.involvedObject.name == "installer-10-ip-10-0-145-36.us-west-1.compute.internal")) | .firstTimestamp + " " + (.count | tostring) + " " + .message'
2020-06-24T07:59:47Z 1 Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fd5dcdb63afcda01fc0cb2eba9a642f20948258c536e516ec1f6bd46256cf33f" already present on machine
2020-06-24T07:59:48Z 1 Created container installer
2020-06-24T07:59:48Z 1 Started container installer
2020-06-24T07:59:54Z 1 Successfully installed revision 10
So everything seems fine with the pod, but then it was deleted by something, and since then the kube-apiserver operator is freaking out and refusing further progress. I'll see if I can find an existing bug around this... |
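As a reading aid for the cluster-version operator log excerpt in the comment above, the following sketch pairs each `Running sync ... at attempt N` line with the `Result of work` line that follows it and prints one summary line per attempt. The function name `summarize_sync_attempts` and the output layout are illustrative assumptions, not part of any OpenShift tooling; the script relies only on the log-line shapes shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenShift): summarize CVO sync attempts.
# Reads log lines on stdin, as produced by the grep filter used above, and
# prints one line per attempt with its start time, result time, and the
# cluster operator named in the result, if any.
summarize_sync_attempts() {
  awk '
    /Running sync .* at attempt [0-9]+$/ {
      # A new attempt begins; report the previous one if it never got a result.
      if (pending) printf "attempt %s: started %s, no result logged\n", attempt, start
      attempt = $NF              # last field is the attempt number
      start = substr($2, 1, 8)   # second field is the timestamp, trimmed to HH:MM:SS
      pending = 1
      next
    }
    pending && /Result of work: / {
      operator = "-"
      # Look for the name in "... operator <name> is reporting ...".
      for (i = 1; i <= NF - 2; i++)
        if ($i == "operator" && $(i + 2) == "is") { operator = $(i + 1); break }
      printf "attempt %s: started %s, result %s, %s\n", attempt, start, substr($2, 1, 8), operator
      pending = 0
    }
    END {
      # The last attempt in the excerpt may still be running.
      if (pending) printf "attempt %s: started %s, no result logged\n", attempt, start
    }
  '
}
```

Appending `| summarize_sync_attempts` to the `curl ... | grep ...` pipeline above should give one line per attempt, which makes it easier to see that attempts 1 through 4 all stall on the same operator.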
|
Bug for /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: openshift-bot, wking |
|
/override ci/prow/publish |
|
@wking: Overrode contexts on behalf of wking: ci/prow/publish |
|
/hold cancel
Errata is public. |
Please merge as soon as https://errata.devel.redhat.com/advisory/56069 is shipped live OR if a Cincinnati-first release is approved.
This should provide adequate soak time for candidate channel PR #294