*: add cluster-etcd-operator to CVO operator payload #53

hexfusion · 2020-01-15T17:15:26Z

This PR adds the cluster-etcd-operator to the CVO operator payload and sets it to the Managed state.

We are also reverting the static-sync controller to use a generic operator informer and removing that informer from WaitForCacheSync. This is only a short-term precaution: in testing we observed a failure case where WaitForCacheSync appeared to block controller actions that are vital to the bootstrapping process.

alaypatel07 · 2020-01-24T13:33:24Z

/test e2e-aws

hexfusion · 2020-01-24T15:30:10Z

Azure seems fine to me. I think we might have a slight delay in rolling out the cluster, resulting in a timeout. This is from a manual run I just did with this CI release image:

level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: downloading update"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 1% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 8% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 13% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 64% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 78% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 83% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 84% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 97% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 98% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 98% complete, waiting on authentication, console, monitoring, openshift-apiserver, openshift-samples"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 99% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete, waiting on authentication, console"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
level=error msg="Cluster operator authentication Degraded is True with RouteHealth_FailedGet: RouteHealthDegraded: failed to GET route: dial tcp: lookup oauth-openshift.apps.sbatsche-ceo-53-1-2020-01-24.catchall.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=info msg="Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 0.0.1-2020-01-24-134140"
level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator insights Disabled is True with Disabled: Health reporting is disabled"
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"
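When triaging installer output like the log above, it can help to pull out just the per-operator condition lines. The following Go sketch is one way to do that, assuming the lines follow exactly the `level=... msg="Cluster operator <name> <type> is <status> with <reason>: <message>"` shape seen in this thread. The `operatorCondition` type and `parseConditions` function are hypothetical helpers, not part of the installer.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// operatorCondition holds the fields of one "Cluster operator ..." log line.
type operatorCondition struct {
	Level    string
	Operator string
	Type     string
	Status   string
	Reason   string
	Message  string
}

// conditionLine matches the line shape seen in this thread. The reason may be
// empty (for example "with : Progressing: ..."), and so may the message.
var conditionLine = regexp.MustCompile(
	`^level=(\w+) msg="Cluster operator (\S+) (\S+) is (\S+) with ([^:]*): (.*)"$`)

// parseConditions scans the log line by line and returns one entry for every
// line that matches conditionLine; all other lines are skipped.
func parseConditions(log string) []operatorCondition {
	var out []operatorCondition
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		m := conditionLine.FindStringSubmatch(strings.TrimSpace(scanner.Text()))
		if m == nil {
			continue
		}
		out = append(out, operatorCondition{
			Level:    m[1],
			Operator: m[2],
			Type:     m[3],
			Status:   m[4],
			Reason:   m[5],
			Message:  m[6],
		})
	}
	return out
}

func main() {
	// Sample lines copied from the log above.
	sample := `level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"`

	for _, c := range parseConditions(sample) {
		fmt.Printf("%s %s=%s reason=%q\n", c.Operator, c.Type, c.Status, c.Reason)
	}
}
```

The sketch deliberately ignores progress lines and the final `level=fatal` summary, so only operator conditions remain for inspection.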

hexfusion · 2020-01-24T15:30:35Z

GCP will need the MCO PR [1].

[1] openshift/machine-config-operator#1408

Update: merged

hexfusion · 2020-01-24T17:03:11Z

cc @smarterclayton @crawford

hexfusion · 2020-01-24T17:04:32Z

/hold

for further testing

hexfusion · 2020-01-24T18:18:52Z

The AWS failure is a known issue.

ref: https://coreos.slack.com/archives/C999USB0D/p1579889171233100

/test e2e-aws

NOTE: this did not appear to be known.

hexfusion · 2020-01-24T19:19:08Z

/test e2e-azure

hexfusion · 2020-01-24T21:10:12Z

/test e2e-aws

alaypatel07 · 2020-01-24T22:34:37Z

level=info msg="Cluster operator kube-scheduler Progressing is True with : Progressing: 3 nodes are at revision 4; 0 nodes have achieved new revision 5"
level=info msg="Cluster operator machine-config Available is False with : Cluster not available for 0.0.1-2020-01-24-192655"
level=error msg="Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Failed to resync 0.0.1-2020-01-24-192655 because: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: false total: 3, ready 2, updated: 3, unavailable: 1)"
level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
level=error msg="Cluster operator monitoring Degraded is True with UpdatingnodeExporterFailed: Failed to rollout

/test e2e-azure

alaypatel07 · 2020-01-25T03:42:43Z

staticsync will have to be refined, but for now, let's get this in!!

/lgtm

openshift-ci-robot · 2020-01-25T03:43:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alaypatel07, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [alaypatel07,hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hexfusion · 2020-01-25T03:43:41Z

/hold

waiting for cluster bot upgrade tests to complete

hexfusion · 2020-01-25T06:47:20Z

/test e2e-gcp-upgrade

hexfusion · 2020-01-25T09:46:04Z

Upgrade testing: passed

Azure:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade/31

AWS:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/15470

GCP:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/253

hexfusion · 2020-01-25T09:47:23Z

Given the above, I am making the call to move forward.

/hold cancel

openshift-ci-robot requested review from deads2k and soltysh January 15, 2020 17:17

hexfusion changed the title ~~wip for testing~~ *: add cluster-etcd-operator to CVO operator payload Jan 24, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress label Jan 24, 2020

hexfusion added 2 commits January 24, 2020 11:32

revert static controllers: use typed informer, start operatorinformer…

64a0035

…factory This reverts commits - 78dbdc9 - bb9b498 for pkg/cmd/staticpodcontroller/staticpodcontroller.go Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

*: add cluster-etcd-operator to CVO operator payload

8f7c7c0

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

hexfusion force-pushed the revert_sync branch from 79f1307 to 8f7c7c0 Compare January 24, 2020 16:36

openshift-ci-robot added the do-not-merge/hold label Jan 24, 2020

openshift-ci-robot assigned alaypatel07 Jan 25, 2020

openshift-ci-robot added the lgtm label Jan 25, 2020

openshift-ci-robot removed the do-not-merge/hold label Jan 25, 2020

openshift-merge-robot merged commit aec838e into openshift:master Jan 25, 2020
